Support incremental builds of the schema cache (fix #3354) #3394
Conversation
Deploy preview for hasura-docs ready! Built with commit e47a8c4
Review app for commit 6709317 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Review app for commit c378d92 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Review app for commit 4aa756a deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Review app for commit 17b1de3 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
wip: fix error codes in remote schema tests
This should hopefully improve compile times by avoiding the need to specialize everything at once.
Also, use the view in Schema.Diff to share some more logic.
This is significantly more performant, even without specialization, which dramatically improves compile times.
We mostly want to do this to make queries against information_schema tables work, which the console cares about. information_schema tables use types like sql_identifier, which have no corresponding array types defined! Therefore, in order to generate valid queries for _in and _nin conditions, we need to treat them as their base types, instead.
This changes TableCoreCacheT to internally record dependencies at a per-table level. In practice, this dramatically improves the performance of building permissions: it makes it far, far less likely for permissions to be needlessly rebuilt because some unrelated table changed.
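As an illustration of the per-table idea, here is a minimal sketch with made-up types and a hypothetical helper name (this is not the real TableCoreCacheT implementation): derived results are keyed by table, so only tables whose metadata actually changed get rebuilt.

```haskell
module PerTableSketch where

import qualified Data.HashMap.Strict as HM

-- Hypothetical stand-ins for the real metadata and permission types.
type TableName   = String
type TableInfo   = String
type Permissions = String

-- Rebuild permissions only for tables whose metadata actually changed,
-- reusing the previously built result for every other table.
rebuildChanged
  :: (TableInfo -> Permissions)         -- build one table's permissions
  -> HM.HashMap TableName TableInfo     -- old metadata
  -> HM.HashMap TableName Permissions   -- previously built results
  -> HM.HashMap TableName TableInfo     -- new metadata
  -> HM.HashMap TableName Permissions
rebuildChanged build oldTables oldResults newTables =
  HM.mapWithKey rebuildOne newTables
  where
    rebuildOne name info =
      case (HM.lookup name oldTables, HM.lookup name oldResults) of
        (Just oldInfo, Just cached)
          | oldInfo == info -> cached   -- unchanged table: reuse the old result
        _ -> build info                 -- new or changed table: rebuild
```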
As explained in the note included in the diff, this can lead to dramatically better query plans, and it seems to be especially important for versions of Postgres <12.
Review app for commit b26b7d0 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
@rakeshkky Do you think there's an advantage to keeping those things in a separate package rather than just in a separate directory in the same package?
- Move MonadBase/MonadBaseControl instances for TxE into pg-client-hs
- Set the -qn2 RTS option by default to limit the parallel GC to 2 threads
- Remove eventlog instrumentation
- Don't rebuild the schema cache again after running a query that needs it to be rebuilt, since we do that explicitly now.
- Remove some redundant checks, and relocate a couple others.
Review app https://hge-ci-pull-3394.herokuapp.com is deleted
Description
As of #2379, we drop and recreate all system metadata using a bulk query during schema migrations. This is satisfyingly decisive, but it causes the schema cache to be rebuilt many times within a single transaction, which creates the performance problem described in #3354.
A simple fix would be to tweak the migrations to avoid rebuilding the schema cache so many times, but I’ve been wanting to avoid doing that. Currently, we go to a lot of effort to avoid rebuilding the schema cache in places, but this duplicates a lot of logic, and as that logic becomes increasingly complicated, it becomes extremely difficult to keep in sync. Ideally, we’d like to have a single source of truth for how to build the schema cache and what rules its contents must follow, but we need to do it without incurring such a huge performance cost.
The idea I’ve been thinking about for a while is to incrementalize the process of building the schema cache, in the incremental compilation sense. Essentially, the idea is to add dependency tracking to the build process, so if some part of the metadata doesn’t change, we can avoid rebuilding it. Tracking all those dependencies by hand would be tricky and probably about as error-prone as our existing solution, so this PR adds some scaffolding for automatic incrementalization that does a lot of the work automatically. More details below.
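To make the caching idea concrete, here is a minimal sketch assuming a made-up Build type (this is not the actual Hasura.Incremental interface): a build step remembers its last input and result, and recomputes only when it is given a different input.

```haskell
module IncrementalSketch where

-- Running a build step yields a result together with an updated step to use
-- for the next rebuild, so the step can carry its cache forward.
newtype Build a b = Build { runBuild :: a -> (b, Build a b) }

-- Turn a pure function into a build step that reuses its previous result as
-- long as it keeps receiving an equal input.
cached :: Eq a => (a -> b) -> Build a b
cached f = go Nothing
  where
    go previous = Build $ \a ->
      case previous of
        Just (a', b) | a' == a -> (b, go previous)  -- input unchanged: reuse
        _ ->
          let b = f a                               -- first run or changed input: rebuild
          in (b, go (Just (a, b)))
```

The scaffolding described above goes further than this sketch by tracking dependencies automatically, so sub-parts of the metadata can be cached and invalidated independently rather than comparing one monolithic input.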
Affected components
Related Issues
#2379, #3354
Solution and Design
The good news about this change is that it actually removes a lot of logic, so the resulting implementation is simpler. However, the bad news is that it is a fairly invasive refactor, and it touches all parts of the code that modify the catalog or schema cache. There’s also a lot of new supporting code to help with the incremental build process, especially since arrows have almost no standard library to speak of.
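As a concrete (and hedged) example of the sort of helper that has to be written by hand: base's Control.Arrow provides composition and choice but nothing like mapping an arrow over a list, so a combinator along these lines ends up in the supporting code. The definition below is the standard textbook one, not necessarily the exact version in this PR.

```haskell
{-# LANGUAGE Arrows #-}
module MapArrowSketch where

import Control.Arrow

-- Run an arrow over every element of a list, collecting the results.
-- ArrowChoice is needed because the computation branches on the list shape.
mapA :: ArrowChoice arr => arr a b -> arr [a] [b]
mapA f = proc xs -> case xs of
  []     -> returnA -< []
  y : ys -> do
    z  <- f      -< y
    zs <- mapA f -< ys
    returnA -< (z : zs)
```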
For some more details, see the “For reviewers” section below.
TODO
- buildSchemaCache to remove imperative updates
- buildSchemaCache code path
- SchemaDependency tracking
- cache in the right places
- WriterT implementation
- Improve performance of bulk queries (performance is good enough, so punting this to future work)
- Track error codes in inconsistent metadata objects (decided not to do this)
Server checklist
Catalog upgrade
Does this PR change Hasura Catalog version?
Metadata
Does this PR add a new Metadata feature?
GraphQL
Breaking changes
No Breaking changes
There are breaking changes:
Metadata API
Existing query types:
- args payload which is not backward compatible
- JSON schema
Logging
- JSON schema has changed
- type names have changed
For reviewers
This is a big change, so here are some tips on where to start and what to focus on if you’d like to review it:
First, familiarize yourself very briefly with arrows.
- A good introduction to the high-level concepts is Understanding arrows from the Haskell wikibook, but it doesn't cover proc notation.
- For a primer on proc notation, the Arrow syntax page on the Haskell website includes some basic information, and the relevant section of the GHC User's Guide has more details.
- If you want something really thorough, you can read Paterson's paper introducing arrow notation, A New Notation for Arrows.
- You don't need to thoroughly understand arrows to understand most of the code in this pull request, and in fact, if you squint, proc notation looks quite a bit like plain do notation (see the short sketch after this list).

Having done that, take a quick look at Hasura.Incremental and read the documentation comments. You don't need to worry about the implementation to understand the rest of the code. You can also look at Hasura.IncrementalSpec for a couple of small examples.

Now you can start reading through the bulk of the changes:
- Start with Hasura.RQL.DDL.Schema.Cache, focusing on buildSchemaCacheRule, which defines how to build a SchemaCache from catalog metadata.
- Look at Hasura.RQL.DDL.Schema.Cache.Dependencies, which implements dependency resolution to ensure the integrity of the schema cache.
- Read Hasura.RQL.Types.SchemaCache.Build, which defines the basic interface for triggering cache rebuilds.

The rest of the changes are mostly just adjustments to the rest of the code to accommodate the modified schema cache construction process.
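If proc notation is new to you, here is a tiny standalone sketch (made-up names, nothing from this PR) of why it reads so much like do notation:

```haskell
{-# LANGUAGE Arrows #-}
module ProcVsDo where

import Control.Arrow

-- proc notation: bind the input, feed values through arrows with -<, and name
-- each intermediate result, much as do notation names each monadic result.
addThenDouble :: Arrow arr => arr Int Int -> arr Int Int -> arr Int Int
addThenDouble addOne double = proc n -> do
  m <- addOne -< n
  double -< m

-- The same pipeline written with plain arrow composition, for comparison.
addThenDouble' :: Arrow arr => arr Int Int -> arr Int Int -> arr Int Int
addThenDouble' addOne double = addOne >>> double

-- With the function arrow (->), addThenDouble (+ 1) (* 2) 5 evaluates to 12.
```

The first definition names intermediate results the way a do block does; for a simple pipeline like this, it is equivalent to ordinary arrow composition.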