-
Notifications
You must be signed in to change notification settings - Fork 3
support more aggregation operators when generating native query configurations #120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support more aggregation operators when generating native query configurations #120
Conversation
#[googletest::test] | ||
fn supports_various_aggregation_operators() -> googletest::Result<()> { | ||
let config = mflix_config(); | ||
|
||
let pipeline = Pipeline::new(vec![ | ||
Stage::Match(doc! { | ||
"$expr": { | ||
"$and": [ | ||
{ "$eq": ["{{ title }}", "$title"] }, | ||
{ "$or": [null, 1] }, | ||
{ "$not": "{{ bool_param }}" }, | ||
{ "$gt": ["$imdb.votes", "{{ votes }}"] }, | ||
] | ||
} | ||
}), | ||
Stage::ReplaceWith(Selection::new(doc! { | ||
"abs": { "$abs": "$year" }, | ||
"add": { "$add": ["$tomatoes.viewer.rating", "{{ rating_inc }}"] }, | ||
"divide": { "$divide": ["$tomatoes.viewer.rating", "{{ rating_div }}"] }, | ||
"multiply": { "$multiply": ["$tomatoes.viewer.rating", "{{ rating_mult }}"] }, | ||
"subtract": { "$subtract": ["$tomatoes.viewer.rating", "{{ rating_sub }}"] }, | ||
"arrayElemAt": { "$arrayElemAt": ["$genres", "{{ idx }}"] }, | ||
"title_words": { "$split": ["$title", " "] } | ||
})), | ||
]); | ||
|
||
let native_query = | ||
native_query_from_pipeline(&config, "operators_test", Some("movies".into()), pipeline)?; | ||
|
||
expect_eq!( | ||
native_query.arguments, | ||
object_fields([ | ||
("title", Type::Scalar(BsonScalarType::String)), | ||
("bool_param", Type::Scalar(BsonScalarType::Bool)), | ||
("votes", Type::Scalar(BsonScalarType::Int)), | ||
("rating_inc", Type::Scalar(BsonScalarType::Double)), | ||
("rating_div", Type::Scalar(BsonScalarType::Double)), | ||
("rating_mult", Type::Scalar(BsonScalarType::Double)), | ||
("rating_sub", Type::Scalar(BsonScalarType::Double)), | ||
("idx", Type::Scalar(BsonScalarType::Int)), | ||
]) | ||
); | ||
|
||
let result_type = native_query.result_document_type; | ||
expect_eq!( | ||
native_query.object_types[&result_type], | ||
ObjectType { | ||
fields: object_fields([ | ||
("abs", Type::Scalar(BsonScalarType::Int)), | ||
("add", Type::Scalar(BsonScalarType::Double)), | ||
("divide", Type::Scalar(BsonScalarType::Double)), | ||
("multiply", Type::Scalar(BsonScalarType::Double)), | ||
("subtract", Type::Scalar(BsonScalarType::Double)), | ||
( | ||
"arrayElemAt", | ||
Type::Nullable(Box::new(Type::Scalar(BsonScalarType::String))) | ||
), | ||
( | ||
"title_words", | ||
Type::ArrayOf(Box::new(Type::Scalar(BsonScalarType::String))) | ||
), | ||
]), | ||
description: None, | ||
} | ||
); | ||
|
||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test demonstrates the new functionality
} else { | ||
Err((C::Scalar(a), C::Scalar(b))) | ||
match variance { | ||
Variance::Covariant => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of tracking the difference between Covariant
and Contravariant
? Seems like we try the subtyping check both ways regardless?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you're right - I want to return the supertype in the covariant case, and the subtype in the contravariant case. But that's not actually how I wrote it is it?
Predicate { | ||
object_type_name: ObjectTypeName, | ||
}, | ||
|
||
// Complex types | ||
Union(BTreeSet<TypeConstraint>), | ||
OneOf(BTreeSet<TypeConstraint>), // unlike union type should be only one of these, but we don't know which one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide a concrete example here in a comment perhaps? This may be entirely a reflection of my lack of understanding around Mongo's semantics, but I can't get my head around when this would be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea, I'll add more details in a comment.
OneOf is helpful mostly for numeric inputs, where we're expecting a numeric type, but we don't know specifically which one. We probably want to infer a specific numeric type instead of a union of all numeric types, which would present as ExtendedJSON
(the top type) in the API which isn't as helpful.
OTOH my thinking is Union is for cases where we have specific evidence a type variable is associated with multiple distinct concrete types. I'm mainly using Union to indicate nullability.
Another way to handle OneOf that could be less handwavy could be to use SubtypeOf(Union(...))
. But I'm not sure exactly how the semantics of that would play out.
|
||
C::Union(_) => Some(Type::ExtendedJSON), | ||
|
||
t @ C::OneOf(_) if t.is_numeric() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not a thing for now, but I wonder how hard it would be to optionally include the user's argument types here to help guide us through ambiguity here. Choosing double seems absolutely acceptable for now though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point! Type annotations will show up as another constraint in the set of constraints for the type variable. So before we get to constraint_to_type
we'll see a call to simplify_pair
with OneOf(...)
and Scalar(_)
(or whatever concrete type is given by the annotation) which will simplify to the annotation type if that overlaps with one of the types in the OneOf
set. If there is no overlap we'll either report an error, or have some way of signalling that annotations suppress type mismatches.
loop { | ||
let prev_type_variables = type_variables.clone(); | ||
let prev_solutions = solutions.clone(); | ||
let prev_substitutions = substitutions.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad your simplification was instead to make a clone of the substitutions for backtracking.
operand_type_hint: Option<TypeConstraint>, | ||
operator: &str, | ||
operand: Bson, | ||
) -> Result<TypeConstraint> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, broadly, forgive my massive handwaving, do I understand correctly that TypeConstraint
contains either a concrete type or information we have about an unknown type (ie, that it's Numeric
)?
And that we use subtyping to move constraints towards types when we combine them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's pretty much it with the addendum that a TypeConstraint
may reference a type variable. We have a type variable for the query result type, and another type variable for each parameter, and each variable is associated with a set of TypeConstraint
s. Since TypeConstraint
can reference variables that's a way for variables to indirectly reference each other.
Subtyping is part of it. I'm considering the query result variable to be covariant so multiple constraints converge on the supertype. Parameters are contravariant so those constraints should converge on the subtype, or lead to a mismatch error. But constraints can also relate variables in ways that are not related to subtyping. For example we've got FieldOf
which is a way to state that the type of a variable is the same as the type of a given field of a given object type.
@@ -0,0 +1,280 @@ | |||
Arithmetic Expression Operators | |||
|
|||
- [x] $abs - Returns the absolute value of a number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that Mongo has no deep concept of introspection for these functions, so it's just a case of special casing each one we want to support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately MongoDB doesn't have much of a concept of typing at all. I don't think there is any way to introspect for signatures for functions like these.
It's possible we could build our own function signature language, and write signatures ourselves. But I thought it would be easier to hard-code this stuff, especially since the query language has a closed, limited set of functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach is absolutely fine, it's very clear what is going on which is great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some questions to clarify my understanding, however I do not think they should stop us merging this work.
Thanks very much for the review and the feedback! |
b05e1ac
to
cf42e89
Compare
4094237
to
0b52183
Compare
Adds or refines support for these operators:
Arithmetic Expression Operators
$abs
$add
$divide
$multiply
$subtract
Array Expression Operators
$arrayElemAt
Boolean Expression Operators
$and
$not
$or
Comparison Expression Operators
$eq
$gt
$gte
$lt
$lte
$ne
Set Expression Operators
$allElementsTrue
$anyElementTrue
String Expression Operators
$split
Trigonometry Expression Operators
$sin
$cos
$tan
$asin
$acos
$atan
$asinh
$acosh
$atanh
$sinh
$cosh
$tanh
Accumulators (
$group
,$bucket
,$bucketAuto
,$setWindowFields
)$avg
$count
$max
$min
$push
$sum
Also improves type inference to make all of these operators work.
This is work an an in-progress feature that is gated behind a feature flag,
native-query-subcommand