这是indexloc提供的服务,不要输入任何密码
Skip to content

support more aggregation operators when generating native query configurations #120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

hallettj
Copy link
Collaborator

@hallettj hallettj commented Nov 7, 2024

Adds or refines support for these operators:

Arithmetic Expression Operators

  • $abs
  • $add
  • $divide
  • $multiply
  • $subtract

Array Expression Operators

  • $arrayElemAt

Boolean Expression Operators

  • $and
  • $not
  • $or

Comparison Expression Operators

  • $eq
  • $gt
  • $gte
  • $lt
  • $lte
  • $ne

Set Expression Operators

  • $allElementsTrue
  • $anyElementTrue

String Expression Operators

  • $split

Trigonometry Expression Operators

  • $sin
  • $cos
  • $tan
  • $asin
  • $acos
  • $atan
  • $asinh
  • $acosh
  • $atanh
  • $sinh
  • $cosh
  • $tanh

Accumulators ($group, $bucket, $bucketAuto, $setWindowFields)

  • $avg
  • $count
  • $max
  • $min
  • $push
  • $sum

Also improves type inference to make all of these operators work.

This is work an an in-progress feature that is gated behind a feature flag, native-query-subcommand

@hallettj hallettj requested a review from codedmart November 7, 2024 22:33
@hallettj hallettj self-assigned this Nov 7, 2024
Comment on lines +184 to +251
#[googletest::test]
fn supports_various_aggregation_operators() -> googletest::Result<()> {
let config = mflix_config();

let pipeline = Pipeline::new(vec![
Stage::Match(doc! {
"$expr": {
"$and": [
{ "$eq": ["{{ title }}", "$title"] },
{ "$or": [null, 1] },
{ "$not": "{{ bool_param }}" },
{ "$gt": ["$imdb.votes", "{{ votes }}"] },
]
}
}),
Stage::ReplaceWith(Selection::new(doc! {
"abs": { "$abs": "$year" },
"add": { "$add": ["$tomatoes.viewer.rating", "{{ rating_inc }}"] },
"divide": { "$divide": ["$tomatoes.viewer.rating", "{{ rating_div }}"] },
"multiply": { "$multiply": ["$tomatoes.viewer.rating", "{{ rating_mult }}"] },
"subtract": { "$subtract": ["$tomatoes.viewer.rating", "{{ rating_sub }}"] },
"arrayElemAt": { "$arrayElemAt": ["$genres", "{{ idx }}"] },
"title_words": { "$split": ["$title", " "] }
})),
]);

let native_query =
native_query_from_pipeline(&config, "operators_test", Some("movies".into()), pipeline)?;

expect_eq!(
native_query.arguments,
object_fields([
("title", Type::Scalar(BsonScalarType::String)),
("bool_param", Type::Scalar(BsonScalarType::Bool)),
("votes", Type::Scalar(BsonScalarType::Int)),
("rating_inc", Type::Scalar(BsonScalarType::Double)),
("rating_div", Type::Scalar(BsonScalarType::Double)),
("rating_mult", Type::Scalar(BsonScalarType::Double)),
("rating_sub", Type::Scalar(BsonScalarType::Double)),
("idx", Type::Scalar(BsonScalarType::Int)),
])
);

let result_type = native_query.result_document_type;
expect_eq!(
native_query.object_types[&result_type],
ObjectType {
fields: object_fields([
("abs", Type::Scalar(BsonScalarType::Int)),
("add", Type::Scalar(BsonScalarType::Double)),
("divide", Type::Scalar(BsonScalarType::Double)),
("multiply", Type::Scalar(BsonScalarType::Double)),
("subtract", Type::Scalar(BsonScalarType::Double)),
(
"arrayElemAt",
Type::Nullable(Box::new(Type::Scalar(BsonScalarType::String)))
),
(
"title_words",
Type::ArrayOf(Box::new(Type::Scalar(BsonScalarType::String)))
),
]),
description: None,
}
);

Ok(())
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test demonstrates the new functionality

} else {
Err((C::Scalar(a), C::Scalar(b)))
match variance {
Variance::Covariant => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of tracking the difference between Covariant and Contravariant? Seems like we try the subtyping check both ways regardless?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're right - I want to return the supertype in the covariant case, and the subtype in the contravariant case. But that's not actually how I wrote it is it?

Predicate {
object_type_name: ObjectTypeName,
},

// Complex types
Union(BTreeSet<TypeConstraint>),
OneOf(BTreeSet<TypeConstraint>), // unlike union type should be only one of these, but we don't know which one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide a concrete example here in a comment perhaps? This may be entirely a reflection of my lack of understanding around Mongo's semantics, but I can't get my head around when this would be used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, I'll add more details in a comment.

OneOf is helpful mostly for numeric inputs, where we're expecting a numeric type, but we don't know specifically which one. We probably want to infer a specific numeric type instead of a union of all numeric types, which would present as ExtendedJSON (the top type) in the API which isn't as helpful.

OTOH my thinking is Union is for cases where we have specific evidence a type variable is associated with multiple distinct concrete types. I'm mainly using Union to indicate nullability.

Another way to handle OneOf that could be less handwavy could be to use SubtypeOf(Union(...)). But I'm not sure exactly how the semantics of that would play out.


C::Union(_) => Some(Type::ExtendedJSON),

t @ C::OneOf(_) if t.is_numeric() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not a thing for now, but I wonder how hard it would be to optionally include the user's argument types here to help guide us through ambiguity here. Choosing double seems absolutely acceptable for now though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point! Type annotations will show up as another constraint in the set of constraints for the type variable. So before we get to constraint_to_type we'll see a call to simplify_pair with OneOf(...) and Scalar(_) (or whatever concrete type is given by the annotation) which will simplify to the annotation type if that overlaps with one of the types in the OneOf set. If there is no overlap we'll either report an error, or have some way of signalling that annotations suppress type mismatches.

loop {
let prev_type_variables = type_variables.clone();
let prev_solutions = solutions.clone();
let prev_substitutions = substitutions.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad your simplification was instead to make a clone of the substitutions for backtracking.

operand_type_hint: Option<TypeConstraint>,
operator: &str,
operand: Bson,
) -> Result<TypeConstraint> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, broadly, forgive my massive handwaving, do I understand correctly that TypeConstraint contains either a concrete type or information we have about an unknown type (ie, that it's Numeric)?

And that we use subtyping to move constraints towards types when we combine them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's pretty much it with the addendum that a TypeConstraint may reference a type variable. We have a type variable for the query result type, and another type variable for each parameter, and each variable is associated with a set of TypeConstraints. Since TypeConstraint can reference variables that's a way for variables to indirectly reference each other.

Subtyping is part of it. I'm considering the query result variable to be covariant so multiple constraints converge on the supertype. Parameters are contravariant so those constraints should converge on the subtype, or lead to a mismatch error. But constraints can also relate variables in ways that are not related to subtyping. For example we've got FieldOf which is a way to state that the type of a variable is the same as the type of a given field of a given object type.

@@ -0,0 +1,280 @@
Arithmetic Expression Operators

- [x] $abs - Returns the absolute value of a number.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that Mongo has no deep concept of introspection for these functions, so it's just a case of special casing each one we want to support?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately MongoDB doesn't have much of a concept of typing at all. I don't think there is any way to introspect for signatures for functions like these.

It's possible we could build our own function signature language, and write signatures ourselves. But I thought it would be easier to hard-code this stuff, especially since the query language has a closed, limited set of functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is absolutely fine, it's very clear what is going on which is great.

Copy link
Contributor

@danieljharvey danieljharvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some questions to clarify my understanding, however I do not think they should stop us merging this work.

@hallettj
Copy link
Collaborator Author

I've left some questions to clarify my understanding, however I do not think they should stop us merging this work.

Thanks very much for the review and the feedback!

@hallettj hallettj force-pushed the jessehallett/eng-1269-when-re-generating-native-query-ignore-existing-config-for branch from b05e1ac to cf42e89 Compare November 18, 2024 02:23
Base automatically changed from jessehallett/eng-1269-when-re-generating-native-query-ignore-existing-config-for to main November 19, 2024 17:22
@hallettj hallettj force-pushed the jessehallett/eng-1102-support-more-aggregation-expression-operators-when branch from 4094237 to 0b52183 Compare November 19, 2024 17:26
@hallettj hallettj merged commit 1577927 into main Nov 19, 2024
1 check passed
@hallettj hallettj deleted the jessehallett/eng-1102-support-more-aggregation-expression-operators-when branch November 19, 2024 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants