这是indexloc提供的服务,不要输入任何密码
Skip to content

Enhanced variables support #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 62 commits into from
Jan 29, 2025
Merged

Enhanced variables support #12

merged 62 commits into from
Jan 29, 2025

Conversation

codingkarthik
Copy link
Collaborator

@codingkarthik codingkarthik commented Jan 20, 2025

Connector Query Logic Refactoring

This PR refactors the query generation and execution logic in the Calcite connector to improve maintainability and handle variables more robustly.

Key changes:

  • Simplified query execution flow by separating query plan generation from execution
  • Improved variables handling with proper CTE (Common Table Expression) support
  • Removed relationship handling code as it's currently not supported
  • Added comprehensive unit tests for aggregate query generation
  • Enhanced error handling with more specific error types
  • Fixed bugs around JSON object support and column qualifiers

Technical Details

  • Introduced QueryPlan struct to separate query generation from execution
  • Added proper support for variables using CTEs instead of direct substitution
  • Created dedicated error types for better error handling
  • Improved aggregate query generation with proper column qualification
  • Added unit tests for SQL generation focusing on aggregates

Variables Handling Deep Dive

Previous Approach

Previously, if there were n variable sets in a request, then n separate database queries were made. For example, if you wanted to fetch orders where the total was greater than [100, 200, 300], the connector would make three separate queries:

SELECT * FROM orders WHERE total > 100;
SELECT * FROM orders WHERE total > 200;
SELECT * FROM orders WHERE total > 300;

This led to:

  • Multiple database round trips
  • Poor performance on requests with many variable sets
  • Database having to parse and plan each query separately

New Approach

The new implementation batches all variable sets into a single query using Common Table Expressions (CTEs). It works like this:

  1. First, we create a CTE containing all variable sets and their values:
WITH hasura_cte_vars AS (
    SELECT 0 as __var_set_index, 100 as min_total  -- First variable set
    UNION ALL 
    SELECT 1 as __var_set_index, 200 as min_total  -- Second variable set
    UNION ALL
    SELECT 2 as __var_set_index, 300 as min_total  -- Third variable set
)
  1. Then we join this CTE with our main query, tracking which result belongs to which variable set:
SELECT 
    hasura_cte_vars.__var_set_index,  -- Track which result belongs to which variable set
    orders.*
FROM orders
CROSS JOIN hasura_cte_vars 
WHERE orders.total > hasura_cte_vars.min_total;
  1. For aggregates, we automatically group by the variable set index:
WITH "vars" AS (
  SELECT 0 as "__var_set_index", 1 as "AlbumId"
  UNION ALL
  SELECT 1 as "__var_set_index", 2 as "AlbumId"
)
SELECT
  "t"."__var_set_index",
  COUNT("t".*) AS "how_many_albums",
  COUNT("t"."ArtistId") AS "how_many_artist_ids",
  COUNT(DISTINCT "t"."ArtistId") AS "how_many_distinct_artist_ids",
  MIN("t"."ArtistId") AS "min_artist_id",
  MAX("t"."ArtistId") AS "max_artist_id",
  AVG("t"."ArtistId") AS "avg_artist_id"
FROM (
  SELECT "a"."ArtistId", "vars"."__var_set_index"
  FROM "albums" "a"
  CROSS JOIN "vars"
  WHERE "a"."AlbumId" > "vars"."AlbumId"
) "t"
GROUP BY "t"."__var_set_index"

Benefits:

  • Single database round trip regardless of variable set count
  • Better database performance through batch processing
  • Database can potentially optimize the query plan across all variable sets
  • Simpler client-side processing of results
  • Clean separation of results between variable sets

Changes to Explain API

The explain output structure has been updated to provide more detailed insights into how queries with variable sets are executed. The explain response now includes:

  1. Generated SQL queries:

    • row_query: The main query with the variables CTE
    • aggregate_query: The separate aggregates query (if aggregates were requested)
  2. Query execution plans:

    • rows_explain: Detailed execution plan for the main rows query
    • aggregates_explain: Execution plan for the aggregates query

For example, given a query that searches albums by title with multiple search patterns and computes aggregates, you'll see:

-- The variables CTE used in both queries
WITH "hasura_cte_vars" AS (
    SELECT 0 AS "__var_set_index", '%Quest%' AS "search"
    UNION ALL 
    SELECT 1 AS "__var_set_index", 'Amazing' AS "search"
    UNION ALL
    SELECT 2 AS "__var_set_index", '%Rio%' AS "search"
    -- ... more variable sets
)

-- Main row query
SELECT 
    hasura_cte_vars.__var_set_index,
    "TEST"."album"."Title" AS "Title",
    "TEST"."album"."ArtistId" AS "ArtistId" 
FROM "TEST"."album" 
CROSS JOIN "hasura_cte_vars" 
WHERE ("Title" LIKE "hasura_cte_vars"."search") 
ORDER BY "AlbumId" ASC

-- Aggregates query
SELECT 
    COUNT(*) AS "how_many_albums",
    COUNT("aggregates_subquery"."ArtistId") AS "how_many_artist_ids",
    COUNT(DISTINCT "aggregates_subquery"."ArtistId") AS "how_many_distinct_artist_ids",
    -- ... more aggregates
    "aggregates_subquery"."__var_set_index"
FROM (...) AS aggregates_subquery 
GROUP BY "aggregates_subquery"."__var_set_index"

Breaking Changes

  • Removed support for local relationships, relationships can be done in the engine using "remote relationships".

@codingkarthik
Copy link
Collaborator Author

Tests passing

ndc-test-local replay --endpoint http://localhost:8080  --snapshots-dir adapters/databricks/test-cases

├ Schema ... OK
├ Query ...
│ ├ aggregate_and_rows_with_offset_and_limit ... OK
│ ├ ordering_by_multiple_fields ... OK
│ ├ select_by_pk ... OK
│ ├ select_deeply_nested_predicate ... OK
│ ├ select_int_and_string ... OK
│ ├ select_predicate_eq_text_field ... OK
│ ├ select_simple_predicate_with_order_by ... OK
│ ├ select_where_album_id_equals_self ... OK
│ ├ select_where_album_id_greater_than_or_equal_to ... OK
│ ├ select_where_album_id_less_than ... OK
│ ├ select_where_album_id_less_than_or_equal_to ... OK
│ ├ select_where_text_field_in ... OK
│ ├ select_where_text_in_empty_array ... OK
│ ├ select_where_text_like ... OK
│ ├ select_where_text_not_equal_to ... OK
│ ├ select_where_text_not_in ... OK
│ ├ select_where_text_not_like ... OK
│ ├ select_where_variable ... OK
│ ├ select_where_variable_int_with_null_variable_value ... OK
│ ├ select_where_with_no_variable_values ... OK
│ ├ select_with_nested_and_predicate ... OK
│ ├ simple_aggregate_count ... OK
│ ├ simple_select_orderby_limit_offset ... OK

@gneeri gneeri self-requested a review January 27, 2025 15:57
@codingkarthik codingkarthik merged commit 3a8ccc2 into main Jan 29, 2025
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants