The generated SQL does not uses indexes and performs sub-optimally

We are experiencing a performance issue related to the fact that Hasura generates a sub-optional SQL query.

Simplified, let's say that we have `order` table which has `id`, `supplierA`, and `supplierB` fields. We also have `supplier` table with `a`, `b`, and `name` fields.

The relation is the following:

```
order . ( supplierA, supplierB ) → supplier . ( a, b )
```

We issue a query like this:

```
query MyQuery {
  order(
      order_by: {supplier: {name: asc}},
      limit: 10) {
    id
  }
}
```

The idea here is to get `order`s which are ordered by `name`s of corresponding suppliers. Here is the SQL code that Hasura generates:

```sql
SELECT
  coalesce(
    json_agg(
      "root"
      ORDER BY
        "root.or.supplier.pg.name" ASC NULLS LAST
    ),
    '[]'
  ) AS "root"
FROM
  (
    SELECT
      "_2_root.or.supplier"."root.or.supplier.pg.name" AS "root.or.supplier.pg.name",
      row_to_json(
        (
          SELECT
            "_3_e"
          FROM
            (
              SELECT
                "_0_root.base"."id" AS "id"
            ) AS "_3_e"
        )
      ) AS "root"
    FROM
      (
        SELECT
          *
        FROM
          "public"."order"
        WHERE
          ('true')
      ) AS "_0_root.base"
      LEFT OUTER JOIN LATERAL (
        SELECT
          "_1_root.or.supplier.base"."name" AS "root.or.supplier.pg.name"
        FROM
          (
            SELECT
              *
            FROM
              "public"."supplier"
            WHERE
              (
                (
                  ("_0_root.base"."supplierB") = ("b")
                )
                AND (
                  ("_0_root.base"."supplierA") = ("a")
                )
              )
          ) AS "_1_root.or.supplier.base"
      ) AS "_2_root.or.supplier" ON ('true')
    ORDER BY
      "root.or.supplier.pg.name" ASC NULLS LAST
    LIMIT
      10
  ) AS "_4_root"
```

If we run `EXPLAIN ANALYZE` on it, we can see the following output:

```
Aggregate  (cost=186286.83..186286.84 rows=1 width=32) (actual time=2246.618..2246.618 rows=1 loops=1)
  ->  Limit  (cost=186286.68..186286.70 rows=10 width=52) (actual time=2246.584..2246.586 rows=10 loops=1)
        ->  Sort  (cost=186286.68..189446.60 rows=1263970 width=52) (actual time=2246.582..2246.583 rows=10 loops=1)
              Sort Key: supplier.name
              Sort Method: top-N heapsort  Memory: 26kB
              ->  Hash Left Join  (cost=645.57..158972.74 rows=1263970 width=52) (actual time=4.202..1961.412 rows=1269435 loops=1)
                    Hash Cond: (("order"."supplierB" = supplier."b") AND ("order"."supplierA" = supplier."a"))
                    ->  Seq Scan on "order"  (cost=0.00..135891.70 rows=1263970 width=19) (actual time=0.033..734.237 rows=1269435 loops=1)
                    ->  Hash  (cost=521.03..521.03 rows=8303 width=31) (actual time=4.115..4.116 rows=8303 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 662kB
                          ->  Seq Scan on supplier  (cost=0.00..521.03 rows=8303 width=31) (actual time=0.010..2.370 rows=8303 loops=1)
                    SubPlan 1
                      ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=1269435)
Planning Time: 1.139 ms
Execution Time: 2246.722 ms
```

An important detail here is that there exist indexes:

```sql
CREATE INDEX "order_supplierKeys_idx" ON public."order" ("supplierB", "supplierA")
CREATE UNIQUE INDEX "supplier_keys" ON public.supplier ("a", "b")
CREATE INDEX supplier_name_idx ON public.supplier ("name")
```

Yet they seem to be unused in this case.

However, if we re-write the query manually:

```sql
select o.id
from "order" as o
join supplier as s on o."supplierB" = s."b" AND o."supplierA" = s."a"
order by s."name" asc
limit 10
```

The indexes are used and `EXPLAIN ANALYZE` returns the following result:

```
Limit  (cost=0.71..880.38 rows=10 width=28) (actual time=0.038..0.060 rows=10 loops=1)
  ->  Nested Loop  (cost=0.71..56299.45 rows=640 width=28) (actual time=0.037..0.058 rows=10 loops=1)
        ->  Index Scan using supplier_name_idx on supplier s  (cost=0.29..2244.82 rows=8303 width=31) (actual time=0.015..0.016 rows=1 loops=1)
        ->  Index Scan using "order_supplierKeys_idx" on "order" o  (cost=0.43..6.50 rows=1 width=19) (actual time=0.018..0.038 rows=10 loops=1)
              Index Cond: (("supplierB" = s."b") AND ("supplierA" = s."a"))
Planning Time: 1.317 ms
Execution Time: 0.106 ms
```

Would it be possible to improve performance of the generated SQL in the cases like this?


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

The generated SQL does not uses indexes and performs sub-optimally #5949

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

The generated SQL does not uses indexes and performs sub-optimally #5949

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions