-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Description
We are experiencing a performance issue related to the fact that Hasura generates a sub-optional SQL query.
Simplified, let's say that we have order
table which has id
, supplierA
, and supplierB
fields. We also have supplier
table with a
, b
, and name
fields.
The relation is the following:
order . ( supplierA, supplierB ) → supplier . ( a, b )
We issue a query like this:
query MyQuery {
order(
order_by: {supplier: {name: asc}},
limit: 10) {
id
}
}
The idea here is to get order
s which are ordered by name
s of corresponding suppliers. Here is the SQL code that Hasura generates:
SELECT
coalesce(
json_agg(
"root"
ORDER BY
"root.or.supplier.pg.name" ASC NULLS LAST
),
'[]'
) AS "root"
FROM
(
SELECT
"_2_root.or.supplier"."root.or.supplier.pg.name" AS "root.or.supplier.pg.name",
row_to_json(
(
SELECT
"_3_e"
FROM
(
SELECT
"_0_root.base"."id" AS "id"
) AS "_3_e"
)
) AS "root"
FROM
(
SELECT
*
FROM
"public"."order"
WHERE
('true')
) AS "_0_root.base"
LEFT OUTER JOIN LATERAL (
SELECT
"_1_root.or.supplier.base"."name" AS "root.or.supplier.pg.name"
FROM
(
SELECT
*
FROM
"public"."supplier"
WHERE
(
(
("_0_root.base"."supplierB") = ("b")
)
AND (
("_0_root.base"."supplierA") = ("a")
)
)
) AS "_1_root.or.supplier.base"
) AS "_2_root.or.supplier" ON ('true')
ORDER BY
"root.or.supplier.pg.name" ASC NULLS LAST
LIMIT
10
) AS "_4_root"
If we run EXPLAIN ANALYZE
on it, we can see the following output:
Aggregate (cost=186286.83..186286.84 rows=1 width=32) (actual time=2246.618..2246.618 rows=1 loops=1)
-> Limit (cost=186286.68..186286.70 rows=10 width=52) (actual time=2246.584..2246.586 rows=10 loops=1)
-> Sort (cost=186286.68..189446.60 rows=1263970 width=52) (actual time=2246.582..2246.583 rows=10 loops=1)
Sort Key: supplier.name
Sort Method: top-N heapsort Memory: 26kB
-> Hash Left Join (cost=645.57..158972.74 rows=1263970 width=52) (actual time=4.202..1961.412 rows=1269435 loops=1)
Hash Cond: (("order"."supplierB" = supplier."b") AND ("order"."supplierA" = supplier."a"))
-> Seq Scan on "order" (cost=0.00..135891.70 rows=1263970 width=19) (actual time=0.033..734.237 rows=1269435 loops=1)
-> Hash (cost=521.03..521.03 rows=8303 width=31) (actual time=4.115..4.116 rows=8303 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 662kB
-> Seq Scan on supplier (cost=0.00..521.03 rows=8303 width=31) (actual time=0.010..2.370 rows=8303 loops=1)
SubPlan 1
-> Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=1269435)
Planning Time: 1.139 ms
Execution Time: 2246.722 ms
An important detail here is that there exist indexes:
CREATE INDEX "order_supplierKeys_idx" ON public."order" ("supplierB", "supplierA")
CREATE UNIQUE INDEX "supplier_keys" ON public.supplier ("a", "b")
CREATE INDEX supplier_name_idx ON public.supplier ("name")
Yet they seem to be unused in this case.
However, if we re-write the query manually:
select o.id
from "order" as o
join supplier as s on o."supplierB" = s."b" AND o."supplierA" = s."a"
order by s."name" asc
limit 10
The indexes are used and EXPLAIN ANALYZE
returns the following result:
Limit (cost=0.71..880.38 rows=10 width=28) (actual time=0.038..0.060 rows=10 loops=1)
-> Nested Loop (cost=0.71..56299.45 rows=640 width=28) (actual time=0.037..0.058 rows=10 loops=1)
-> Index Scan using supplier_name_idx on supplier s (cost=0.29..2244.82 rows=8303 width=31) (actual time=0.015..0.016 rows=1 loops=1)
-> Index Scan using "order_supplierKeys_idx" on "order" o (cost=0.43..6.50 rows=1 width=19) (actual time=0.018..0.038 rows=10 loops=1)
Index Cond: (("supplierB" = s."b") AND ("supplierA" = s."a"))
Planning Time: 1.317 ms
Execution Time: 0.106 ms
Would it be possible to improve performance of the generated SQL in the cases like this?