这是indexloc提供的服务,不要输入任何密码
Skip to content

add native queries, functions or virtual collections defined by pipelines #45

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
Apr 19, 2024

Conversation

hallettj
Copy link
Collaborator

@hallettj hallettj commented Apr 18, 2024

Describe your changes

Implements native queries. They are defined as aggregation pipelines. If the root target of a query request is a native query then the given pipeline will form the start of the overall query plan pipeline. In this case the query will be executed as a MongoDB aggregate command with no target collection - as opposed to our other queries which are an aggregate command that does have a collection target.

Native queries currently cannot be the target of a relation.

There is a really basic native query in fixtures to test with. If you run services with arion up -d you can see it as a query field called hello.

The changes were going to result in a large-ish amount of very similar, but technically incompatible code involving converting configuration to ndc types for schema responses, and for processing query requests. To avoid that I pushed configuration processing into the configuration crate. This makes it easier to share that logic, pushes a bunch of errors from connector runtime to configuration parsing time, and pushes computation to connector startup time instead of response-handling time. This resulted in a bunch of changes:

  • The MongoConfig type is gone. Instead configuration-related data is kept in Configuration which is now passed directly to mongodb-agent-common functions. Database-connection data is put in a new type, ConnectorState. So now we have properly separated configuration and state types, which is what the ndc-sdk API expects.
  • The configuration crate has a new dependency on ndc-models. We need to keep the ndc-models version matched with ndc-sdk. To make that easier I moved configuration for those dependencies to the workspace Cargo.toml and added a note.

Issue ticket number and link

MDB-103

@hallettj hallettj self-assigned this Apr 18, 2024
/// responses they are separate concepts. So we want a set of [CollectionInfo] values for
/// functions for query processing, and we want it separate from `collections` for the schema
/// response.
pub function_collection_infos: BTreeMap<String, ndc::CollectionInfo>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will functions and function_collection_infos have the same set of keys?
If so, would it be better to represent this as a BTreeMap<String, (ndc::FunctionInfo, ndc::CollectionInfo)>?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does make sense. But I think it's a matter of style. There isn't any case where we want to look up both FunctionInfo and CollectionInfo - the schema handler wants one, the query handler wants the other. We shouldn't be constructing Configuration values without going through validate except in tests so I'm not concerned about accidentally created an invalid state.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I made this change. It does make the validate function simpler. But I'd rather not combine the procedure maps since I like keeping the ndc stuff somewhat separated from the mongodb-specific stuff.


/// Native procedures allow arbitrary MongoDB commands where types of results are
/// specified via user configuration.
#[serde(default, skip_serializing_if = "BTreeMap::is_empty")]
pub native_procedures: BTreeMap<String, NativeProcedure>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, could procedures and native_procedures be represented as a single map BTreeMap<String, (ndc::ProcedureInfo, NativeProcedure)>?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that makes sense. But again I think it's a matter of style since there are no cases where we want both values together.

/// be merged with the definitions in `schema.json`. This allows you to maintain hand-written
/// types for native procedures without having to edit a generated `schema.json` file.
#[serde(default, skip_serializing_if = "BTreeMap::is_empty")]
pub object_types: BTreeMap<String, ObjectType>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether all user-defined object types should be in one place, rather than each native procedure/query having their own. That would make it easier to re-use types between procedures/queries. It would also provide a place where users could override type definitions inferred by the schema inference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think you make a good point. I think the way I have it set up has a convenience advantage because you can define single-use types in the same place they're used. But it might be more confusing. @codedmart already ran into this trying to write a procedure that reuses a collection object type.

I think this is a good topic for discussion. I think it's outside the scope of this PR since I already established this convention in native procedures. But there's still time to change it in another ticket.

@hallettj hallettj merged commit 2067659 into main Apr 19, 2024
@hallettj hallettj deleted the jesse/native-queries-using-pipelines branch April 19, 2024 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants