implement group by for single-column aggregates #144

Merged 44 commits on Mar 1, 2025
Commits
c3ebb0e
split plan types into modules matching structure in ndc-models
hallettj Jan 22, 2025
b98a093
implement plan versions of grouping types
hallettj Jan 22, 2025
f1afb3c
wip: plan_for_grouping
hallettj Jan 24, 2025
48a4b4c
wip: plan_for_grouping
hallettj Jan 24, 2025
92fa5e0
plan_for_grouping
hallettj Jan 24, 2025
5d314f0
implement Hash and Eq for plan types
hallettj Jan 27, 2025
1ffb772
update relation plan to handle groups
hallettj Jan 28, 2025
9f5495b
mongodb sub-pipeline for single-column aggregates
hallettj Jan 28, 2025
b203968
test a group by query
hallettj Jan 29, 2025
000904a
building valid queries for basic group by, need to handle response
hallettj Jan 29, 2025
e89614e
serialize groups in query response
hallettj Jan 29, 2025
0a7df71
remove unused imports
hallettj Jan 29, 2025
7581767
remove the macro that I deprecated
hallettj Jan 30, 2025
f13675a
unpack groups from faceted response
hallettj Feb 3, 2025
6d4524c
add groups to faceted pipelines
hallettj Feb 3, 2025
24226e6
pare down on magic strings; add groups to deserialization type inference
hallettj Feb 3, 2025
0d98234
propagate groups to relationship plans
hallettj Feb 3, 2025
36de643
fixing things, further pare down on magic strings
hallettj Feb 3, 2025
4c5429b
select groups from foreach response
hallettj Feb 3, 2025
3100114
wip
hallettj Feb 4, 2025
2b3b1f2
select group results through relationship field reference
hallettj Feb 5, 2025
740c241
integration tests
hallettj Feb 5, 2025
a01fa85
partial support for ordering groups
hallettj Feb 5, 2025
0487d73
update tests to get consistent ordering
hallettj Feb 5, 2025
80b9256
aggregate all numeric values while skipping strings
hallettj Feb 5, 2025
145de1a
tests
hallettj Feb 5, 2025
eda1fa5
add display impls for types to get reasonable error output
hallettj Feb 5, 2025
0b9d8d7
move limit to shared pipeline stages instead of duplicating in rows a…
hallettj Feb 6, 2025
eca6ab5
fix computed type for response row set with groups
hallettj Feb 6, 2025
0634014
convert groups aggregate results to required type
hallettj Feb 6, 2025
2b881ab
apply the same type conversion to root aggregate results
hallettj Feb 6, 2025
3cda346
reduce size of some integration test snapshots
hallettj Feb 6, 2025
9e93d48
test getting both fields and groups through a relationship
hallettj Feb 6, 2025
dc49a72
add ticket number to todo
hallettj Feb 6, 2025
6878e8b
update changelog
hallettj Feb 6, 2025
521c1b2
update openssl and system dep due to vulnerability report
hallettj Feb 6, 2025
20f08d4
consistent ordering for integration test snapshots
hallettj Feb 6, 2025
015dfd0
had the relationship keys mis-capitalized again
hallettj Feb 6, 2025
fa0131a
add tuple type to express type of dimensions in response
hallettj Feb 7, 2025
78dc016
type for dimension value from relationship should be an array
hallettj Feb 8, 2025
92c49ca
update integration tests
hallettj Feb 8, 2025
5788933
update unit test
hallettj Feb 8, 2025
cdc5e2a
remove outdated todo comment
hallettj Feb 28, 2025
0d64911
Merge branch 'main' into jessehallett/eng-1486-mongodb-implement-grou…
hallettj Mar 1, 2025
22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -4,9 +4,15 @@ This changelog documents the changes between release versions.

## [Unreleased v2]

### Added

- You can now group documents for aggregation according to multiple grouping criteria ([#144](https://github.com/hasura/ndc-mongodb/pull/144))

### Changed

- **BREAKING:** Update to ndc-spec v0.2 ([#139](https://github.com/hasura/ndc-mongodb/pull/139))
- **BREAKING:** Remove custom count aggregation - use standard count instead ([#144](https://github.com/hasura/ndc-mongodb/pull/144))
- Results for `avg` and `sum` aggregations are coerced to consistent result types ([#144](https://github.com/hasura/ndc-mongodb/pull/144))

#### ndc-spec v0.2

@@ -26,7 +32,23 @@ changelog](https://hasura.github.io/ndc-spec/specification/changelog.html#020).
Use of the new spec requires a version of GraphQL Engine that supports ndc-spec
v0.2, and there are required metadata changes.

#### Removed custom count aggregation

Previously there were two options for getting document counts named `count` and
`_count`. These did the same thing. `count` has been removed - use `_count`
instead.

#### Results for `avg` and `sum` aggregations are coerced to consistent result types

This change is required for compliance with ndc-spec.

Results for `avg` are always coerced to `double`.

Results for `sum` are coerced to `double` if the summed inputs use a fractional
numeric type, or to `long` if inputs use an integral numeric type.
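
The coercion rule described in this changelog entry can be sketched as a small type-level function. This is only an illustration of the stated rule, not the connector's implementation: `NumericType`, `ResultType`, `avg_result_type`, and `sum_result_type` are hypothetical names, and treating `Decimal` as a fractional type that maps to `double` is an assumption read from the wording above.

```rust
/// Hypothetical stand-in for the numeric input types (not the connector's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NumericType {
    Int,
    Long,
    Double,
    Decimal,
}

/// The two result types named in the changelog entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResultType {
    Long,
    Double,
}

/// Assumption: `Double` and `Decimal` count as fractional, `Int` and `Long` as integral.
fn is_fractional(input: NumericType) -> bool {
    matches!(input, NumericType::Double | NumericType::Decimal)
}

/// `avg` results are always coerced to `double`, whatever the input type.
fn avg_result_type(_input: NumericType) -> ResultType {
    ResultType::Double
}

/// `sum` results are `double` for fractional inputs and `long` for integral inputs.
fn sum_result_type(input: NumericType) -> ResultType {
    if is_fractional(input) {
        ResultType::Double
    } else {
        ResultType::Long
    }
}
```

Under these assumptions, `sum_result_type(NumericType::Int)` gives `ResultType::Long`, while `avg_result_type(NumericType::Int)` gives `ResultType::Double`.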

## [Unreleased v1]

### Added

- Add uuid scalar type ([#148](https://github.com/hasura/ndc-mongodb/pull/148))
15 changes: 14 additions & 1 deletion crates/configuration/src/mongo_scalar_type.rs
@@ -1,7 +1,9 @@
+use std::fmt::Display;
+
 use mongodb_support::{BsonScalarType, EXTENDED_JSON_TYPE_NAME};
 use ndc_query_plan::QueryPlanError;

-#[derive(Debug, Clone, PartialEq, Eq)]
+#[derive(Debug, Clone, Hash, PartialEq, Eq)]
 pub enum MongoScalarType {
     /// One of the predefined BSON scalar types
     Bson(BsonScalarType),
@@ -40,3 +42,14 @@ impl TryFrom<&ndc_models::ScalarTypeName> for MongoScalarType {
         }
     }
 }
+
+impl Display for MongoScalarType {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            MongoScalarType::ExtendedJSON => write!(f, "extendedJSON"),
+            MongoScalarType::Bson(bson_scalar_type) => {
+                write!(f, "{}", bson_scalar_type.bson_name())
+            }
+        }
+    }
+}
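
The commit message for this change says the `Display` impls exist "to get reasonable error output". A minimal, self-contained sketch of that use follows. The `BsonScalarType` enum here is a hypothetical three-variant stand-in for the real type in `mongodb_support`, and `type_mismatch_message` is an invented helper, so only the shape of the `Display` impl mirrors the diff.

```rust
use std::fmt::{self, Display};

/// Hypothetical, reduced stand-in for `mongodb_support::BsonScalarType`.
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
enum BsonScalarType {
    Int,
    Double,
    Date,
}

impl BsonScalarType {
    /// Name of the BSON type; the real method covers many more variants.
    fn bson_name(&self) -> &'static str {
        match self {
            BsonScalarType::Int => "int",
            BsonScalarType::Double => "double",
            BsonScalarType::Date => "date",
        }
    }
}

#[derive(Debug, Clone, Hash, PartialEq, Eq)]
enum MongoScalarType {
    Bson(BsonScalarType),
    ExtendedJSON,
}

// Same structure as the impl added in the diff above.
impl Display for MongoScalarType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MongoScalarType::ExtendedJSON => write!(f, "extendedJSON"),
            MongoScalarType::Bson(bson_scalar_type) => {
                write!(f, "{}", bson_scalar_type.bson_name())
            }
        }
    }
}

/// Hypothetical helper: builds an error message using the `Display` output
/// instead of the noisier `Debug` output.
fn type_mismatch_message(expected: &MongoScalarType, actual: &MongoScalarType) -> String {
    format!("expected type {expected}, but got {actual}")
}
```

With this sketch, a mismatch between `Bson(BsonScalarType::Int)` and `ExtendedJSON` reads as plain type names rather than nested enum syntax.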
3 changes: 1 addition & 2 deletions crates/integration-tests/src/tests/aggregation.rs
@@ -131,7 +131,7 @@ async fn returns_zero_when_counting_empty_result_set() -> anyhow::Result<()> {
 moviesAggregate(filter_input: {where: {title: {_eq: "no such movie"}}}) {
   _count
   title {
-    count
+    _count
   }
 }
 }
@@ -152,7 +152,6 @@ async fn returns_zero_when_counting_nested_fields_in_empty_result_set() -> anyho
 moviesAggregate(filter_input: {where: {title: {_eq: "no such movie"}}}) {
   awards {
     nominations {
-      count
       _count
     }
   }
134 changes: 134 additions & 0 deletions crates/integration-tests/src/tests/grouping.rs
@@ -0,0 +1,134 @@
use insta::assert_yaml_snapshot;
use ndc_test_helpers::{
    asc, binop, column_aggregate, dimension_column, field, grouping, or, ordered_dimensions, query,
    query_request, target, value,
};

use crate::{connector::Connector, run_connector_query};

#[tokio::test]
async fn runs_single_column_aggregate_on_groups() -> anyhow::Result<()> {
    assert_yaml_snapshot!(
        run_connector_query(
            Connector::SampleMflix,
            query_request().collection("movies").query(
                query()
                    // The predicate avoids an error when encountering documents where `year` is
                    // a string instead of a number.
                    .predicate(or([
                        binop("_gt", target!("year"), value!(0)),
                        binop("_lte", target!("year"), value!(0)),
                    ]))
                    .order_by([asc!("_id")])
                    .limit(10)
                    .groups(
                        grouping()
                            .dimensions([dimension_column("year")])
                            .aggregates([
                                (
                                    "average_viewer_rating",
                                    column_aggregate("tomatoes.viewer.rating", "avg"),
                                ),
                                ("max_runtime", column_aggregate("runtime", "max")),
                            ])
                            .order_by(ordered_dimensions()),
                    ),
            ),
        )
        .await?
    );
    Ok(())
}
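
The predicate in the test above (`year > 0` or `year <= 0`) looks like a tautology, but per its comment it exists to skip documents where `year` is a string. The sketch below models why that can work, under the assumption that `_gt` and `_lte` only match values of the same type as the operand, as MongoDB query comparisons do for numbers versus strings. `Value`, `gt`, `lte`, and `is_numeric_year` are hypothetical names used for illustration only.

```rust
/// Hypothetical model of a document field that may hold a number or a string.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(f64),
    Text(String),
}

/// Greater-than that never matches a non-numeric field
/// (assumption: comparisons do not cross type boundaries).
fn gt(field: &Value, operand: f64) -> bool {
    matches!(field, Value::Number(n) if *n > operand)
}

/// Less-than-or-equal with the same restriction to numeric fields.
fn lte(field: &Value, operand: f64) -> bool {
    matches!(field, Value::Number(n) if *n <= operand)
}

/// Mirrors `or([_gt year 0, _lte year 0])`: true for every ordinary number,
/// false for every string, so the predicate acts as a type filter.
fn is_numeric_year(year: &Value) -> bool {
    gt(year, 0.0) || lte(year, 0.0)
}
```

In this model, `Value::Number(1999.0)` passes the filter and `Value::Text("1999".to_string())` does not, which is the behavior the test comment relies on.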

#[tokio::test]
async fn groups_by_multiple_dimensions() -> anyhow::Result<()> {
    assert_yaml_snapshot!(
        run_connector_query(
            Connector::SampleMflix,
            query_request().collection("movies").query(
                query()
                    .predicate(binop("_lt", target!("year"), value!(1950)))
                    .order_by([asc!("_id")])
                    .limit(10)
                    .groups(
                        grouping()
                            .dimensions([
                                dimension_column("year"),
                                dimension_column("languages"),
                                dimension_column("rated"),
                            ])
                            .aggregates([(
                                "average_viewer_rating",
                                column_aggregate("tomatoes.viewer.rating", "avg"),
                            )])
                            .order_by(ordered_dimensions()),
                    ),
            ),
        )
        .await?
    );
    Ok(())
}

#[tokio::test]
async fn combines_aggregates_and_groups_in_one_query() -> anyhow::Result<()> {
    assert_yaml_snapshot!(
        run_connector_query(
            Connector::SampleMflix,
            query_request().collection("movies").query(
                query()
                    .predicate(binop("_gte", target!("year"), value!(2000)))
                    .order_by([asc!("_id")])
                    .limit(10)
                    .aggregates([(
                        "average_viewer_rating",
                        column_aggregate("tomatoes.viewer.rating", "avg")
                    )])
                    .groups(
                        grouping()
                            .dimensions([dimension_column("year"),])
                            .aggregates([(
                                "average_viewer_rating_by_year",
                                column_aggregate("tomatoes.viewer.rating", "avg"),
                            )])
                            .order_by(ordered_dimensions()),
                    ),
            ),
        )
        .await?
    );
    Ok(())
}

#[tokio::test]
async fn combines_fields_and_groups_in_one_query() -> anyhow::Result<()> {
    assert_yaml_snapshot!(
        run_connector_query(
            Connector::SampleMflix,
            query_request().collection("movies").query(
                query()
                    // The predicate avoids an error when encountering documents where `year` is
                    // a string instead of a number.
                    .predicate(or([
                        binop("_gt", target!("year"), value!(0)),
                        binop("_lte", target!("year"), value!(0)),
                    ]))
                    .order_by([asc!("_id")])
                    .limit(3)
                    .fields([field!("title"), field!("year")])
                    .order_by([asc!("_id")])
                    .groups(
                        grouping()
                            .dimensions([dimension_column("year")])
                            .aggregates([(
                                "average_viewer_rating_by_year",
                                column_aggregate("tomatoes.viewer.rating", "avg"),
                            )])
                            .order_by(ordered_dimensions()),
                    )
            ),
        )
        .await?
    );
    Ok(())
}
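
For readers unfamiliar with the semantics these tests exercise, here is a small in-memory sketch of grouping by more than one dimension and computing an `avg` aggregate per group. It only illustrates the concept; it says nothing about how the connector builds its MongoDB pipeline. `Movie`, `Group`, and `group_by_year_and_rated` are hypothetical names, and using a `BTreeMap` to return groups sorted by their dimension values is an assumption about what `ordered_dimensions()` asks for.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced movie document.
#[derive(Debug, Clone)]
struct Movie {
    year: i32,
    rated: String,
    viewer_rating: Option<f64>,
}

/// One output group: the dimension values plus the aggregate computed for them.
#[derive(Debug, PartialEq)]
struct Group {
    dimensions: (i32, String),
    average_viewer_rating: Option<f64>,
}

/// Groups movies by the pair (year, rated) and averages the viewer rating
/// within each group. Missing ratings are skipped; a group with no ratings
/// gets `None` rather than a division by zero.
fn group_by_year_and_rated(movies: &[Movie]) -> Vec<Group> {
    // Key: dimension tuple. Value: running (sum, count) of ratings.
    let mut acc: BTreeMap<(i32, String), (f64, u32)> = BTreeMap::new();
    for movie in movies {
        let entry = acc
            .entry((movie.year, movie.rated.clone()))
            .or_insert((0.0, 0));
        if let Some(rating) = movie.viewer_rating {
            entry.0 += rating;
            entry.1 += 1;
        }
    }
    // BTreeMap iterates in key order, so groups come out sorted by dimensions.
    acc.into_iter()
        .map(|(dimensions, (sum, count))| Group {
            dimensions,
            average_viewer_rating: if count > 0 {
                Some(sum / f64::from(count))
            } else {
                None
            },
        })
        .collect()
}
```

Each distinct combination of dimension values produces exactly one group, which is why adding a dimension (as in `groups_by_multiple_dimensions`) can only keep or increase the number of groups.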
119 changes: 117 additions & 2 deletions crates/integration-tests/src/tests/local_relationship.rs
@@ -1,9 +1,10 @@
 use crate::{connector::Connector, graphql_query, run_connector_query};
 use insta::assert_yaml_snapshot;
 use ndc_test_helpers::{
-    asc, binop, exists, field, query, query_request, related, relation_field,
-    relationship, target, value,
+    asc, binop, column, column_aggregate, dimension_column, exists, field, grouping, is_in,
+    ordered_dimensions, query, query_request, related, relation_field, relationship, target, value,
 };
+use serde_json::json;

 #[tokio::test]
 async fn joins_local_relationships() -> anyhow::Result<()> {
@@ -243,3 +244,117 @@ async fn joins_relationships_on_nested_key() -> anyhow::Result<()> {
    );
    Ok(())
}

#[tokio::test]
async fn groups_by_related_field() -> anyhow::Result<()> {
    assert_yaml_snapshot!(
        run_connector_query(
            Connector::Chinook,
            query_request()
                .collection("Track")
                .query(
                    query()
                        // avoid albums that are modified in mutation tests
                        .predicate(is_in(
                            target!("AlbumId"),
                            [json!(15), json!(91), json!(227)]
                        ))
                        .groups(
                            grouping()
                                .dimensions([dimension_column(
                                    column("Name").from_relationship("track_genre")
                                )])
                                .aggregates([(
                                    "average_price",
                                    column_aggregate("UnitPrice", "avg")
                                )])
                                .order_by(ordered_dimensions())
                        )
                )
                .relationships([(
                    "track_genre",
                    relationship("Genre", [("GenreId", &["GenreId"])]).object_type()
                )])
        )
        .await?
    );
    Ok(())
}
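
The test above groups tracks by a column that lives on a related collection (`Name` on `Genre`, reached through `track_genre`). Conceptually, each track's dimension value is resolved through the relationship before grouping. The following in-memory sketch shows that idea only; `Track`, `average_price_by_genre`, and the use of `Option<String>` for tracks without a matching genre are assumptions made for illustration, not the connector's behavior.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, reduced track document.
#[derive(Debug, Clone)]
struct Track {
    genre_id: u32,
    unit_price: f64,
}

/// Groups tracks by the name of their related genre and averages the price.
/// `genres` maps `GenreId` to `Name`, standing in for the object relationship.
/// Tracks whose genre is missing are grouped under `None` (an assumption).
fn average_price_by_genre(
    tracks: &[Track],
    genres: &HashMap<u32, String>,
) -> Vec<(Option<String>, f64)> {
    // Key: genre name looked up through the relationship.
    // Value: running (sum, count) of unit prices.
    let mut acc: BTreeMap<Option<String>, (f64, u32)> = BTreeMap::new();
    for track in tracks {
        let genre_name = genres.get(&track.genre_id).cloned();
        let entry = acc.entry(genre_name).or_insert((0.0, 0));
        entry.0 += track.unit_price;
        entry.1 += 1;
    }
    // Every entry was created by at least one track, so count is never zero.
    acc.into_iter()
        .map(|(name, (sum, count))| (name, sum / f64::from(count)))
        .collect()
}
```

The groups come back sorted by genre name because `BTreeMap` iterates in key order, loosely mirroring the `order_by(ordered_dimensions())` call in the test.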

#[tokio::test]
async fn gets_groups_through_relationship() -> anyhow::Result<()> {
    assert_yaml_snapshot!(
        run_connector_query(
            Connector::Chinook,
            query_request()
                .collection("Album")
                .query(
                    query()
                        // avoid albums that are modified in mutation tests
                        .predicate(is_in(target!("AlbumId"), [json!(15), json!(91), json!(227)]))
                        .order_by([asc!("_id")])
                        .fields([field!("AlbumId"), relation_field!("tracks" => "album_tracks", query()
                            .groups(grouping()
                                .dimensions([dimension_column(column("Name").from_relationship("track_genre"))])
                                .aggregates([
                                    ("AlbumId", column_aggregate("AlbumId", "avg")),
                                    ("average_price", column_aggregate("UnitPrice", "avg")),
                                ])
                                .order_by(ordered_dimensions()),
                            )
                        )])
                )
                .relationships([
                    (
                        "album_tracks",
                        relationship("Track", [("AlbumId", &["AlbumId"])])
                    ),
                    (
                        "track_genre",
                        relationship("Genre", [("GenreId", &["GenreId"])]).object_type()
                    )
                ])
        )
        .await?
    );
    Ok(())
}

#[tokio::test]
async fn gets_fields_and_groups_through_relationship() -> anyhow::Result<()> {
    assert_yaml_snapshot!(
        run_connector_query(
            Connector::Chinook,
            query_request()
                .collection("Album")
                .query(
                    query()
                        .predicate(is_in(target!("AlbumId"), [json!(15), json!(91), json!(227)]))
                        .order_by([asc!("_id")])
                        .fields([field!("AlbumId"), relation_field!("tracks" => "album_tracks", query()
                            .order_by([asc!("_id")])
                            .fields([field!("AlbumId"), field!("Name"), field!("UnitPrice")])
                            .groups(grouping()
                                .dimensions([dimension_column(column("Name").from_relationship("track_genre"))])
                                .aggregates([(
                                    "average_price", column_aggregate("UnitPrice", "avg")
                                )])
                                .order_by(ordered_dimensions()),
                            )
                        )])
                )
                .relationships([
                    (
                        "album_tracks",
                        relationship("Track", [("AlbumId", &["AlbumId"])])
                    ),
                    (
                        "track_genre",
                        relationship("Genre", [("GenreId", &["GenreId"])]).object_type()
                    )
                ])
        )
        .await?
    );
    Ok(())
}
1 change: 1 addition & 0 deletions crates/integration-tests/src/tests/mod.rs
@@ -11,6 +11,7 @@ mod aggregation;
 mod basic;
 mod expressions;
 mod filtering;
+mod grouping;
 mod local_relationship;
 mod native_mutation;
 mod native_query;