这是indexloc提供的服务,不要输入任何密码
Skip to content

rework queries with variable sets so they use indexes #83

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jun 24, 2024

Conversation

hallettj
Copy link
Collaborator

@hallettj hallettj commented Jun 21, 2024

Changes aggregation pipelines that are produced for query requests that include variable sets. Previously we used a $facet sub-pipelines where each sub-pipeline was a copy of the query pipeline with different variable values interpolated. Variable values were written directly into the pipelines at use sites.

This PR removes the $facet stage, and instead feeds variable sets into the start of the pipeline using a $documents stage which feeds into a $lookup stage that contains the query pipeline as a sub-pipeline. So we only have one copy of the query pipeline. Instead if interpolating variable values into that pipeline, we use MongoDB variables to reference values from the $documents stage.

This change has two big benefits:

  • using $lookup instead of $facet allows the use of indexes
  • the new pipeline is valid even with zero variable sets which fixes a bug we had where connector /explain requests would fail if the connector is the target of a remote join

The downside is that in my testing use of a $documents stage fails in MongoDB 5 with this error:

Kind: Command failed: Error code 5491300 (Location5491300): $documents' is not allowed in user requests, labels: {}

We're investigating this issue to see if it can be fixed. For this PR I have skipped integration tests involving variable sets when testing MongoDB 5. (The tests do run in MongoDB versions 6 & 7.)

A couple of considerations came up while making the changes for this PR. The biggest is that since we are now injecting variables in the first stage of the pipeline we need to know types for those variables up front to apply JSON-to-BSON conversions. To handle that I updated the plan_for_query_request state struct to record types for each use of each variable, and i added a variable_types map to the QueryPlan struct. It is possible for one variable to appear in multiple contexts in the query with different types. Previously we did the BSON conversion at each use site, so we would convert to BSON differently at each site as appropriate for the expected type. Moving variable serialization to a central point requires either an attempt at type unification, or setting separate values for each variable for each distinct type context. I opted for the latter. So the variables written into the aggregation pipeline have names with type suffixes to distinguish different possible BSON serializations.

The other consideration was that the pipeline change results in a different response shape. So I also had to touch response serialization. Thankfully the new response serialization came out a little simpler than it used to be.

MDB-113

@hallettj hallettj requested review from codedmart and dmoverton June 21, 2024 19:21
@hallettj hallettj self-assigned this Jun 21, 2024
Copy link
Contributor

@dmoverton dmoverton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@dmoverton dmoverton merged commit 6a5e208 into main Jun 24, 2024
1 check passed
@dmoverton dmoverton deleted the jesse/rework-variable-sets branch June 24, 2024 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants