这是indexloc提供的服务,不要输入任何密码
Skip to content

The generated SQL does not uses indexes and performs sub-optimally #5949

@mrkkrp

Description

@mrkkrp

We are experiencing a performance issue related to the fact that Hasura generates a sub-optional SQL query.

Simplified, let's say that we have order table which has id, supplierA, and supplierB fields. We also have supplier table with a, b, and name fields.

The relation is the following:

order . ( supplierA, supplierB ) → supplier . ( a, b )

We issue a query like this:

query MyQuery {
  order(
      order_by: {supplier: {name: asc}},
      limit: 10) {
    id
  }
}

The idea here is to get orders which are ordered by names of corresponding suppliers. Here is the SQL code that Hasura generates:

SELECT
  coalesce(
    json_agg(
      "root"
      ORDER BY
        "root.or.supplier.pg.name" ASC NULLS LAST
    ),
    '[]'
  ) AS "root"
FROM
  (
    SELECT
      "_2_root.or.supplier"."root.or.supplier.pg.name" AS "root.or.supplier.pg.name",
      row_to_json(
        (
          SELECT
            "_3_e"
          FROM
            (
              SELECT
                "_0_root.base"."id" AS "id"
            ) AS "_3_e"
        )
      ) AS "root"
    FROM
      (
        SELECT
          *
        FROM
          "public"."order"
        WHERE
          ('true')
      ) AS "_0_root.base"
      LEFT OUTER JOIN LATERAL (
        SELECT
          "_1_root.or.supplier.base"."name" AS "root.or.supplier.pg.name"
        FROM
          (
            SELECT
              *
            FROM
              "public"."supplier"
            WHERE
              (
                (
                  ("_0_root.base"."supplierB") = ("b")
                )
                AND (
                  ("_0_root.base"."supplierA") = ("a")
                )
              )
          ) AS "_1_root.or.supplier.base"
      ) AS "_2_root.or.supplier" ON ('true')
    ORDER BY
      "root.or.supplier.pg.name" ASC NULLS LAST
    LIMIT
      10
  ) AS "_4_root"

If we run EXPLAIN ANALYZE on it, we can see the following output:

Aggregate  (cost=186286.83..186286.84 rows=1 width=32) (actual time=2246.618..2246.618 rows=1 loops=1)
  ->  Limit  (cost=186286.68..186286.70 rows=10 width=52) (actual time=2246.584..2246.586 rows=10 loops=1)
        ->  Sort  (cost=186286.68..189446.60 rows=1263970 width=52) (actual time=2246.582..2246.583 rows=10 loops=1)
              Sort Key: supplier.name
              Sort Method: top-N heapsort  Memory: 26kB
              ->  Hash Left Join  (cost=645.57..158972.74 rows=1263970 width=52) (actual time=4.202..1961.412 rows=1269435 loops=1)
                    Hash Cond: (("order"."supplierB" = supplier."b") AND ("order"."supplierA" = supplier."a"))
                    ->  Seq Scan on "order"  (cost=0.00..135891.70 rows=1263970 width=19) (actual time=0.033..734.237 rows=1269435 loops=1)
                    ->  Hash  (cost=521.03..521.03 rows=8303 width=31) (actual time=4.115..4.116 rows=8303 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 662kB
                          ->  Seq Scan on supplier  (cost=0.00..521.03 rows=8303 width=31) (actual time=0.010..2.370 rows=8303 loops=1)
                    SubPlan 1
                      ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=1269435)
Planning Time: 1.139 ms
Execution Time: 2246.722 ms

An important detail here is that there exist indexes:

CREATE INDEX "order_supplierKeys_idx" ON public."order" ("supplierB", "supplierA")
CREATE UNIQUE INDEX "supplier_keys" ON public.supplier ("a", "b")
CREATE INDEX supplier_name_idx ON public.supplier ("name")

Yet they seem to be unused in this case.

However, if we re-write the query manually:

select o.id
from "order" as o
join supplier as s on o."supplierB" = s."b" AND o."supplierA" = s."a"
order by s."name" asc
limit 10

The indexes are used and EXPLAIN ANALYZE returns the following result:

Limit  (cost=0.71..880.38 rows=10 width=28) (actual time=0.038..0.060 rows=10 loops=1)
  ->  Nested Loop  (cost=0.71..56299.45 rows=640 width=28) (actual time=0.037..0.058 rows=10 loops=1)
        ->  Index Scan using supplier_name_idx on supplier s  (cost=0.29..2244.82 rows=8303 width=31) (actual time=0.015..0.016 rows=1 loops=1)
        ->  Index Scan using "order_supplierKeys_idx" on "order" o  (cost=0.43..6.50 rows=1 width=19) (actual time=0.018..0.038 rows=10 loops=1)
              Index Cond: (("supplierB" = s."b") AND ("supplierA" = s."a"))
Planning Time: 1.317 ms
Execution Time: 0.106 ms

Would it be possible to improve performance of the generated SQL in the cases like this?

Metadata

Metadata

Assignees

Labels

a/performancec/serverRelated to serverk/enhancementNew feature or improve an existing featurep/highcandidate for being included in the upcoming sprint

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions