Enhanced variables support #12

codingkarthik · 2025-01-20T06:38:05Z

Connector Query Logic Refactoring

This PR refactors the query generation and execution logic in the Calcite connector to improve maintainability and handle variables more robustly.

Key changes:

Simplified query execution flow by separating query plan generation from execution
Improved variables handling with proper CTE (Common Table Expression) support
Removed relationship handling code as it's currently not supported
Added comprehensive unit tests for aggregate query generation
Enhanced error handling with more specific error types
Fixed bugs around JSON object support and column qualifiers

Technical Details

Introduced QueryPlan struct to separate query generation from execution
Added proper support for variables using CTEs instead of direct substitution
Created dedicated error types for better error handling
Improved aggregate query generation with proper column qualification
Added unit tests for SQL generation focusing on aggregates

Variables Handling Deep Dive

Previous Approach

Previously, if a request contained n variable sets, the connector made n separate database queries. For example, to fetch orders whose total was greater than 100, 200, and 300 respectively (one threshold per variable set), the connector would make three separate queries:

SELECT * FROM orders WHERE total > 100;
SELECT * FROM orders WHERE total > 200;
SELECT * FROM orders WHERE total > 300;

This led to:

Multiple database round trips
Poor performance on requests with many variable sets
The database having to parse and plan each query separately
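The cost is easier to see in code. Below is a minimal sketch of the previous pattern. It uses Python's built-in sqlite3 module as a stand-in for the connector's database connection. The `orders` table, its `id`/`total` columns, and the `fetch_per_variable_set` helper are hypothetical. The only point is that the loop issues one query per variable set.

```python
import sqlite3

def fetch_per_variable_set(conn, min_totals):
    """Previous approach (hypothetical sketch): one query per variable set."""
    results = []
    for min_total in min_totals:
        # Each iteration issues a separate query; against a remote database,
        # each one is its own round trip.
        rows = conn.execute(
            "SELECT id, total FROM orders WHERE total > ? ORDER BY id", (min_total,)
        ).fetchall()
        results.append(rows)
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany("INSERT INTO orders (id, total) VALUES (?, ?)",
                 [(1, 50), (2, 150), (3, 250), (4, 350)])

results = fetch_per_variable_set(conn, [100, 200, 300])  # three separate queries
print(results)
```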

New Approach

The new implementation batches all variable sets into a single query using Common Table Expressions (CTEs). It works like this:

First, we create a CTE containing all variable sets and their values:

WITH hasura_cte_vars AS (
    SELECT 0 as __var_set_index, 100 as min_total  -- First variable set
    UNION ALL 
    SELECT 1 as __var_set_index, 200 as min_total  -- Second variable set
    UNION ALL
    SELECT 2 as __var_set_index, 300 as min_total  -- Third variable set
)
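Here is a minimal Python sketch of how such a CTE could be rendered from a list of variable sets. The names `build_vars_cte` and `quote_ident` are hypothetical and are not the connector's code. The sketch makes two assumptions of its own. First, it binds values as `?` parameters instead of inlining literals as the example above does. Second, a variable missing from a set becomes NULL.

```python
def quote_ident(name: str) -> str:
    # Double-quote an identifier, doubling any embedded quotes (SQL standard rule).
    return '"' + name.replace('"', '""') + '"'

def build_vars_cte(variable_sets, cte_name="hasura_cte_vars"):
    """Hypothetical sketch: render variable sets as a UNION ALL CTE.

    Returns (sql, params); values are bound as parameters, not inlined.
    """
    if not variable_sets:
        raise ValueError("at least one variable set is required")
    # Use every variable name seen in any set; a missing value is bound as NULL.
    names = sorted(set().union(*variable_sets))
    selects, params = [], []
    for index, var_set in enumerate(variable_sets):
        columns = [f"{index} AS {quote_ident('__var_set_index')}"]
        for name in names:
            columns.append(f"? AS {quote_ident(name)}")
            params.append(var_set.get(name))
        selects.append("SELECT " + ", ".join(columns))
    body = "\n    UNION ALL\n    ".join(selects)
    return f"WITH {quote_ident(cte_name)} AS (\n    {body}\n)", params

sql, params = build_vars_cte([{"min_total": 100}, {"min_total": 200}, {"min_total": 300}])
print(sql)
print(params)  # [100, 200, 300]
```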

Then we join this CTE with our main query, tracking which result belongs to which variable set:

SELECT 
    hasura_cte_vars.__var_set_index,  -- Track which result belongs to which variable set
    orders.*
FROM orders
CROSS JOIN hasura_cte_vars 
WHERE orders.total > hasura_cte_vars.min_total;
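The sketch below runs the whole batched query once against an in-memory SQLite database, which stands in for the connector's backend. It then splits the flat result back into one row list per variable set. The PR's commit history mentions a `group_by_rows` function for this step. The loop below only illustrates the idea and is not that function. The table data is made up, and so is the fourth threshold, 400. That threshold shows that a variable set matching nothing should still get an (empty) result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany("INSERT INTO orders (id, total) VALUES (?, ?)",
                 [(1, 50), (2, 150), (3, 250), (4, 350)])

min_totals = [100, 200, 300, 400]  # one entry per variable set

# A single statement: the CTE holds every variable set, tagged with its index.
vars_cte = " UNION ALL ".join(
    f"SELECT {i} AS __var_set_index, ? AS min_total" for i in range(len(min_totals))
)
sql = f"""
WITH hasura_cte_vars AS ({vars_cte})
SELECT hasura_cte_vars.__var_set_index, orders.id, orders.total
FROM orders
CROSS JOIN hasura_cte_vars
WHERE orders.total > hasura_cte_vars.min_total
ORDER BY hasura_cte_vars.__var_set_index, orders.id
"""
flat_rows = conn.execute(sql, min_totals).fetchall()

# Split the flat result into one row list per variable set. Pre-seeding every
# index keeps variable sets that matched no rows as empty lists.
grouped = {i: [] for i in range(len(min_totals))}
for var_set_index, order_id, total in flat_rows:
    grouped[var_set_index].append((order_id, total))
per_variable_set = [grouped[i] for i in range(len(min_totals))]
print(per_variable_set)
```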

For aggregates, we automatically group by the variable set index:

WITH "vars" AS (
  SELECT 0 as "__var_set_index", 1 as "AlbumId"
  UNION ALL
  SELECT 1 as "__var_set_index", 2 as "AlbumId"
)
SELECT
  "t"."__var_set_index",
  COUNT(*) AS "how_many_albums",
  COUNT("t"."ArtistId") AS "how_many_artist_ids",
  COUNT(DISTINCT "t"."ArtistId") AS "how_many_distinct_artist_ids",
  MIN("t"."ArtistId") AS "min_artist_id",
  MAX("t"."ArtistId") AS "max_artist_id",
  AVG("t"."ArtistId") AS "avg_artist_id"
FROM (
  SELECT "a"."ArtistId", "vars"."__var_set_index"
  FROM "albums" "a"
  CROSS JOIN "vars"
  WHERE "a"."AlbumId" > "vars"."AlbumId"
) "t"
GROUP BY "t"."__var_set_index"
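Grouping by the variable set index has one subtlety. A variable set whose filter matches no rows produces no group at all. It is therefore missing from the result, rather than reported with a count of 0. The sketch below runs a trimmed version of the aggregate query above in SQLite and fills such gaps on the client side. This PR does not say whether the connector handles the case the same way. The data and the fill-in defaults are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE albums ("AlbumId" INTEGER PRIMARY KEY, "ArtistId" INTEGER)')
conn.executemany("INSERT INTO albums VALUES (?, ?)", [(1, 1), (2, 2), (3, 2), (4, 3)])

album_ids = [1, 2, 4]  # three variable sets; the last one matches no rows
vars_cte = " UNION ALL ".join(
    f'SELECT {i} AS "__var_set_index", ? AS "AlbumId"' for i in range(len(album_ids))
)
sql = f'''
WITH "vars" AS ({vars_cte})
SELECT "t"."__var_set_index",
       COUNT(*) AS "how_many_albums",
       COUNT(DISTINCT "t"."ArtistId") AS "how_many_distinct_artist_ids",
       MIN("t"."ArtistId") AS "min_artist_id",
       MAX("t"."ArtistId") AS "max_artist_id"
FROM (
  SELECT "a"."ArtistId", "vars"."__var_set_index"
  FROM "albums" "a"
  CROSS JOIN "vars"
  WHERE "a"."AlbumId" > "vars"."AlbumId"
) "t"
GROUP BY "t"."__var_set_index"
'''
rows = {row[0]: row[1:] for row in conn.execute(sql, album_ids)}

# GROUP BY emits no row for a variable set with no matching rows, so fill those
# in with what an aggregate over zero rows returns: counts 0, MIN/MAX NULL.
empty = (0, 0, None, None)
aggregates = [rows.get(i, empty) for i in range(len(album_ids))]
print(aggregates)
```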

Benefits:

Single database round trip regardless of variable set count
Less per-query overhead, since the database parses and plans one statement instead of n
The database can potentially optimize the query plan across all variable sets
Simpler client-side processing of results
Clean separation of results between variable sets

Changes to Explain API

The explain output structure has been updated to provide more detailed insights into how queries with variable sets are executed. The explain response now includes:

Generated SQL queries:
- row_query: The main query with the variables CTE
- aggregate_query: The separate aggregates query (if aggregates were requested)
Query execution plans:
- rows_explain: Detailed execution plan for the main rows query
- aggregates_explain: Execution plan for the aggregates query
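This sketch assumes the four fields are returned through the NDC explain response, whose `details` field is a map from string to string. Under that assumption, a hypothetical Python container for them could look like this. `ExplainDetails` and `to_details` are illustrative names. The aggregate fields are optional because they only apply when aggregates were requested.

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class ExplainDetails:
    """Hypothetical container for the explain fields listed above."""
    row_query: str                            # main query, including the variables CTE
    rows_explain: str                         # execution plan for the row query
    aggregate_query: Optional[str] = None     # present only if aggregates were requested
    aggregates_explain: Optional[str] = None  # execution plan for the aggregates query

    def to_details(self) -> dict:
        # Flatten into a string-to-string map, leaving out absent aggregate fields.
        return {key: value for key, value in asdict(self).items() if value is not None}

details = ExplainDetails(row_query="WITH ... SELECT ...", rows_explain="<plan>").to_details()
print(sorted(details))  # ['row_query', 'rows_explain']
```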

For example, given a query that searches albums by title with multiple search patterns and computes aggregates, you'll see:

-- The variables CTE used in both queries
WITH "hasura_cte_vars" AS (
    SELECT 0 AS "__var_set_index", '%Quest%' AS "search"
    UNION ALL 
    SELECT 1 AS "__var_set_index", 'Amazing' AS "search"
    UNION ALL
    SELECT 2 AS "__var_set_index", '%Rio%' AS "search"
    -- ... more variable sets
)

-- Main row query
SELECT 
    hasura_cte_vars.__var_set_index,
    "TEST"."album"."Title" AS "Title",
    "TEST"."album"."ArtistId" AS "ArtistId" 
FROM "TEST"."album" 
CROSS JOIN "hasura_cte_vars" 
WHERE ("Title" LIKE "hasura_cte_vars"."search") 
ORDER BY "AlbumId" ASC

-- Aggregates query
SELECT 
    COUNT(*) AS "how_many_albums",
    COUNT("aggregates_subquery"."ArtistId") AS "how_many_artist_ids",
    COUNT(DISTINCT "aggregates_subquery"."ArtistId") AS "how_many_distinct_artist_ids",
    -- ... more aggregates
    "aggregates_subquery"."__var_set_index"
FROM (...) AS aggregates_subquery 
GROUP BY "aggregates_subquery"."__var_set_index"
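Both statements start with the same variables CTE. They can therefore be built from one CTE string and explained independently. The SQLite sketch below does this and collects the results under the four explain keys listed above. SQLite's `EXPLAIN QUERY PLAN` stands in for the JDBC execution plan that the connector actually parses, and the album rows are invented. The row query here orders by `__var_set_index` first so that each variable set's rows are contiguous. That ordering is the sketch's own choice and differs from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE album ("AlbumId" INTEGER PRIMARY KEY, "Title" TEXT, "ArtistId" INTEGER)')
conn.executemany("INSERT INTO album VALUES (?, ?, ?)", [
    (1, "Quest For Fire", 10), (2, "Amazing", 20),
    (3, "Rio Bossa", 30), (4, "Live In Rio", 30), (5, "Other", 40),
])

searches = ["%Quest%", "Amazing", "%Rio%"]  # one search pattern per variable set
# One CTE string, prefixed to both the row query and the aggregates query.
cte = 'WITH "hasura_cte_vars" AS (' + " UNION ALL ".join(
    f'SELECT {i} AS "__var_set_index", ? AS "search"' for i in range(len(searches))
) + ")"

row_query = cte + '''
SELECT "hasura_cte_vars"."__var_set_index", "album"."Title", "album"."ArtistId"
FROM "album" CROSS JOIN "hasura_cte_vars"
WHERE "album"."Title" LIKE "hasura_cte_vars"."search"
ORDER BY "hasura_cte_vars"."__var_set_index", "album"."AlbumId"'''

aggregate_query = cte + '''
SELECT COUNT(*) AS "how_many_albums",
       COUNT(DISTINCT "aggregates_subquery"."ArtistId") AS "how_many_distinct_artist_ids",
       "aggregates_subquery"."__var_set_index"
FROM (
  SELECT "album"."ArtistId", "hasura_cte_vars"."__var_set_index"
  FROM "album" CROSS JOIN "hasura_cte_vars"
  WHERE "album"."Title" LIKE "hasura_cte_vars"."search"
) AS "aggregates_subquery"
GROUP BY "aggregates_subquery"."__var_set_index"'''

def plan_text(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return "\n".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, searches))

explain = {
    "row_query": row_query,
    "aggregate_query": aggregate_query,
    "rows_explain": plan_text(row_query),
    "aggregates_explain": plan_text(aggregate_query),
}
rows = conn.execute(row_query, searches).fetchall()
aggregates = conn.execute(aggregate_query, searches).fetchall()
```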

Breaking Changes

Removed support for local relationships. Relationships can instead be defined in the engine using "remote relationships".

codingkarthik · 2025-01-27T15:41:49Z

Tests passing

ndc-test-local replay --endpoint http://localhost:8080  --snapshots-dir adapters/databricks/test-cases

├ Schema ... OK
├ Query ...
│ ├ aggregate_and_rows_with_offset_and_limit ... OK
│ ├ ordering_by_multiple_fields ... OK
│ ├ select_by_pk ... OK
│ ├ select_deeply_nested_predicate ... OK
│ ├ select_int_and_string ... OK
│ ├ select_predicate_eq_text_field ... OK
│ ├ select_simple_predicate_with_order_by ... OK
│ ├ select_where_album_id_equals_self ... OK
│ ├ select_where_album_id_greater_than_or_equal_to ... OK
│ ├ select_where_album_id_less_than ... OK
│ ├ select_where_album_id_less_than_or_equal_to ... OK
│ ├ select_where_text_field_in ... OK
│ ├ select_where_text_in_empty_array ... OK
│ ├ select_where_text_like ... OK
│ ├ select_where_text_not_equal_to ... OK
│ ├ select_where_text_not_in ... OK
│ ├ select_where_text_not_like ... OK
│ ├ select_where_variable ... OK
│ ├ select_where_variable_int_with_null_variable_value ... OK
│ ├ select_where_with_no_variable_values ... OK
│ ├ select_with_nested_and_predicate ... OK
│ ├ simple_aggregate_count ... OK
│ ├ simple_select_orderby_limit_offset ... OK

codingkarthik added 30 commits January 6, 2025 13:45

Fix the Dockerfile

97428a7

Initial commit of calcite sql cmd

8a9ea2d

use calcite's sqlline

a2e4f9b

use calcite version 1.38.0

0b5e7d7

WIP commit

989defa

Generate a single SQL query for queries with variables

c8481b3

remove other stuff

864eb49

add support for grouping the flat rows by variables

7fdd8a3

Report errors from calcite elegantly

c217dac

no-op refactors

44fc0bc

create a new type QualifiedTable

ab6cc6b

more no-op refactors

baa022d

remove redundant arg to select

b39316d

refactor supports_json_object

31a030d

initialize SQL_OPERATIONS once!

b01dd09

Aggregate queries work with variables!!!

c986961

Merge branch 'main' into kc/variables-with-sqlline

2be9d1d

unstage calcite-cli-util

054634d

remove debug statements

eb2c08b

Handle missing variables correctly

2d6836c

minor improvements

38db606

correctly handle missing variable values

6493755

Add tests for group_by_rows function

5492d38

remove redundant functiosn

2433448

obviate orchastrate_query

123e838

fix *MOST* of the warnings

9ada862

Have an awesome explain response

d79a361

Parse the JDBC execution plan separately in the explain response

30fe284

added preliminary support for unrelated collection exists

ab5cbf3

remove relationships as capabilities

f549517

codingkarthik added 4 commits January 27, 2025 14:38

support the _neq operator

22d70d3

minor fixes

c55df9e

use proper error enum

7efb291

fix empty array valriable values

6987681

fix warnings

7705367

gneeri self-requested a review January 27, 2025 15:57

gneeri approved these changes Jan 27, 2025


codingkarthik added 21 commits January 28, 2025 12:26

add a workflow to push docker images

4f04e70

rename the workflow file

934346b

use array yaml notation

67aab32

Merge branch 'main' into kc/variables-with-sqlline

9ba6665

modify the dockerfile

2c5060a

remove faulty copy step

8acfa70

more modification

bf6aca0

Dockerfile without referencing build.sh

ca1aa93

make gradlew executable

715fd10

more changes

b9de22c

use mvn clean install

9fcb197

modify the dockefile

a99982a

use build.sh

48b5801

recursive submodules checkout

0aa5bdd

install pkg-config

5d3a19e

separate mount flags

a153eec

mkdir the app/target

845ee5f

one more mkdir

f936b91

remove cahcing bits

009045f

Include the filter, pagination in the aggregation request

587da4b

add explicit permisssions

c412158

codingkarthik merged commit 3a8ccc2 into main Jan 29, 2025
1 check failed
