add statistical aggregate operations and count on columns (close #1028) #1029
Review app available at: https://hge-ci-pull-1029.herokuapp.com
@rakeshkky Can you also add relevant docs?
@rakeshkky Can you also add,
Updated #991 PR
Review app https://hge-ci-pull-1029.herokuapp.com is deleted |
### What
- Adds datafusion row metrics to our NDC query and aggregate nodes, for
  explain output
- Aggregates all datafusion metrics in the trace attributes:
  - `rows_processed`, i.e. total number of rows considered over all
    execution plan nodes
  - `elapsed_compute`, i.e. CPU time spent in _processing_ data (not
    fetching it)
- Adds the explain output to the `create_logical_plan` span.
E.g. a query we don't push down to NDC:
```sql
SELECT
COUNT(42 * invoiceId) AS odd_count
FROM
InvoiceLine;
```
Attributes:
```text
rows_processed: 2242
total_rows: 1
elapsed_compute: 417
logical_plan: Projection: count(Int64(42) * InvoiceLine.invoiceId) AS odd_count
  Aggregate: groupBy=[[]], aggr=[[count(Int64(42) * InvoiceLine.invoiceId)]]
    TableScan: InvoiceLine
```
The metrics clearly indicate that the cost in terms of rows processed
per row returned (2242 / 1) is very high in this case. The logical plan
makes it clear why this was the case: we failed to push down the
aggregate node.
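
To make the arithmetic concrete, here is a minimal sketch of summing per-node metrics over a plan tree and deriving the rows-processed-per-row-returned ratio. It is not the PR's implementation: `PlanNode`, `total_metrics` and the per-node numbers are hypothetical, chosen only so that the totals match the attributes shown above.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical stand-in for one execution plan node and its metrics."""
    name: str
    output_rows: int       # rows this node produced
    elapsed_compute: int   # compute time spent in this node
    children: list["PlanNode"] = field(default_factory=list)


def total_metrics(node: PlanNode) -> tuple[int, int]:
    """Sum output rows and compute time over a node and all its descendants."""
    rows = node.output_rows
    compute = node.elapsed_compute
    for child in node.children:
        child_rows, child_compute = total_metrics(child)
        rows += child_rows
        compute += child_compute
    return rows, compute


# Assumed per-node values (not from the PR), shaped like the logical plan above.
scan = PlanNode("TableScan", output_rows=2240, elapsed_compute=100)
aggregate = PlanNode("Aggregate", output_rows=1, elapsed_compute=300, children=[scan])
projection = PlanNode("Projection", output_rows=1, elapsed_compute=17, children=[aggregate])

rows_processed, elapsed_compute = total_metrics(projection)
total_rows = projection.output_rows  # rows returned to the caller

# A high ratio suggests work that could have been pushed down.
rows_per_row_returned = rows_processed / total_rows
```

With these assumed inputs the totals come out to the `rows_processed` and `elapsed_compute` values in the attributes, and the ratio is 2242 rows processed per row returned.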
### How
V3_GIT_ORIGIN_REV_ID: c26cce9adab9d0feb0a7d2873a3eea38542564a0
Description
What component does this PR affect?
Requires changes from other components? If yes, please mark the components:
Related Issue
close #1028
Solution and Design
- `stddev`, `stddev_pop`, `variance` and `var_pop`
- `COUNT` on columns (also using `DISTINCT`) via arguments to `count` field

Ex:-
`count(columns: [id, author_id], distinct: true)`

Type
Checklist:
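
As an illustration of the operations named under Solution and Design, the sketch below mirrors their semantics in plain Python. It assumes the aggregates map to the PostgreSQL functions of the same name, where `stddev` and `variance` are the sample forms and `stddev_pop` and `var_pop` are the population forms. It also assumes that `count` with `columns` and `distinct: true` counts distinct tuples of the listed columns, and it ignores NULL handling. The `rows` data and the `rating` column are hypothetical.

```python
import statistics

# Hypothetical rows standing in for a table; not taken from the PR.
rows = [
    {"id": 1, "author_id": 10, "rating": 2},
    {"id": 2, "author_id": 10, "rating": 4},
    {"id": 3, "author_id": 11, "rating": 4},
    {"id": 4, "author_id": 11, "rating": 6},
]


def aggregate(values):
    """Sample vs. population statistics, keyed by the aggregate field names."""
    return {
        "stddev": statistics.stdev(values),        # sample standard deviation
        "stddev_pop": statistics.pstdev(values),   # population standard deviation
        "variance": statistics.variance(values),   # sample variance (divides by n - 1)
        "var_pop": statistics.pvariance(values),   # population variance (divides by n)
    }


def count(rows, columns=None, distinct=False):
    """Count rows, or (optionally distinct) tuples of the given columns."""
    if columns is None:
        return len(rows)
    tuples = [tuple(row[c] for c in columns) for row in rows]
    return len(set(tuples)) if distinct else len(tuples)


stats = aggregate([row["rating"] for row in rows])
distinct_authors = count(rows, columns=["author_id"], distinct=True)
distinct_pairs = count(rows, columns=["id", "author_id"], distinct=True)
```

The sample and population variants differ only in the divisor, so they diverge most on small groups, which is a reason to expose both.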