Support incremental builds of the schema cache (fix #3354) #3394
Conversation
Deploy preview for hasura-docs ready! Built with commit e47a8c4
Review app for commit 6709317 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Review app for commit c378d92 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Review app for commit 4aa756a deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Review app for commit 17b1de3 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
wip: fix error codes in remote schema tests
This should hopefully improve compile times by avoiding the need to specialize everything at once.
Also, use the view in Schema.Diff to share some more logic.
This is significantly more performant, even without specialization, which dramatically improves compile times.
We mostly want to do this to make queries against information_schema tables work, which the console cares about. information_schema tables use types like sql_identifier, which have no corresponding array types defined! Therefore, in order to generate valid queries for _in and _nin conditions, we need to treat them as their base types, instead.
This changes TableCoreCacheT to internally record dependencies at a per-table level. In practice, this dramatically improves the performance of building permissions: it makes it far, far less likely for permissions to be needlessly rebuilt because some unrelated table changed.
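As an illustration of the per-table idea, here is a minimal sketch with made-up types and a hypothetical helper name (this is not the real TableCoreCacheT implementation): derived results are keyed by table, so only tables whose metadata actually changed get rebuilt.

```haskell
module PerTableSketch where

import qualified Data.HashMap.Strict as HM

-- Hypothetical stand-ins for the real metadata and permission types.
type TableName   = String
type TableInfo   = String
type Permissions = String

-- Rebuild permissions only for tables whose metadata actually changed,
-- reusing the previously built result for every other table.
rebuildChanged
  :: (TableInfo -> Permissions)         -- build one table's permissions
  -> HM.HashMap TableName TableInfo     -- old metadata
  -> HM.HashMap TableName Permissions   -- previously built results
  -> HM.HashMap TableName TableInfo     -- new metadata
  -> HM.HashMap TableName Permissions
rebuildChanged build oldTables oldResults newTables =
  HM.mapWithKey rebuildOne newTables
  where
    rebuildOne name info =
      case (HM.lookup name oldTables, HM.lookup name oldResults) of
        (Just oldInfo, Just cached)
          | oldInfo == info -> cached   -- unchanged table: reuse the old result
        _ -> build info                 -- new or changed table: rebuild
```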
As explained in the note included in the diff, this can lead to dramatically better query plans, and it seems to be especially important for versions of Postgres <12.
Review app for commit b26b7d0 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
@rakeshkky Do you think there's an advantage to keeping those things in a separate package rather than just in a separate directory in the same package?
- Move MonadBase/MonadBaseControl instances for TxE into pg-client-hs
- Set the -qn2 RTS option by default to limit the parallel GC to 2 threads
- Remove eventlog instrumentation
- Don't rebuild the schema cache again after running a query that needs it to be rebuilt, since we do that explicitly now.
- Remove some redundant checks, and relocate a couple others.
Review app https://hge-ci-pull-3394.herokuapp.com is deleted
Description
As of #2379, we drop and recreate all system metadata using a bulk query during schema migrations. This is satisfyingly decisive, but it causes the schema cache to be rebuilt many times within a single transaction, which creates the performance problem described in #3354.
A simple fix would be to tweak the migrations to avoid rebuilding the schema cache so many times, but I’ve been wanting to avoid doing that. Currently, we go to a lot of effort to avoid rebuilding the schema cache in places, but this duplicates a lot of logic, and as that logic becomes increasingly complicated, it becomes extremely difficult to keep in sync. Ideally, we’d like to have a single source of truth for how to build the schema cache and what rules its contents must follow, but we need to do it without incurring such a huge performance cost.
The idea I’ve been thinking about for a while is to incrementalize the process of building the schema cache, in the incremental compilation sense. Essentially, the idea is to add dependency tracking to the build process, so if some part of the metadata doesn’t change, we can avoid rebuilding it. Tracking all those dependencies by hand would be tricky and probably about as error-prone as our existing solution, so this PR adds some scaffolding for automatic incrementalization that does a lot of the work automatically. More details below.
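To make the caching idea concrete, here is a minimal sketch assuming a made-up Build type (this is not the actual Hasura.Incremental interface): a build step remembers its last input and result, and recomputes only when it is given a different input.

```haskell
module IncrementalSketch where

-- Running a build step yields a result together with an updated step to use
-- for the next rebuild, so the step can carry its cache forward.
newtype Build a b = Build { runBuild :: a -> (b, Build a b) }

-- Turn a pure function into a build step that reuses its previous result as
-- long as it keeps receiving an equal input.
cached :: Eq a => (a -> b) -> Build a b
cached f = go Nothing
  where
    go previous = Build $ \a ->
      case previous of
        Just (a', b) | a' == a -> (b, go previous)  -- input unchanged: reuse
        _ ->
          let b = f a                               -- first run or changed input: rebuild
          in (b, go (Just (a, b)))
```

The scaffolding described above goes further than this sketch by tracking dependencies automatically, so sub-parts of the metadata can be cached and invalidated independently rather than comparing one monolithic input.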
Affected components
Related Issues
#2379, #3354
Solution and Design
The good news about this change is that it actually removes a lot of logic, so the resulting implementation is simpler. However, the bad news is that it is a fairly invasive refactor, and it touches all parts of the code that modify the catalog or schema cache. There’s also a lot of new supporting code to help with the incremental build process, especially since arrows have almost no standard library to speak of.
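As a concrete (and hedged) example of the sort of helper that has to be written by hand: base's Control.Arrow provides composition and choice but nothing like mapping an arrow over a list, so a combinator along these lines ends up in the supporting code. The definition below is the standard textbook one, not necessarily the exact version in this PR.

```haskell
{-# LANGUAGE Arrows #-}
module MapArrowSketch where

import Control.Arrow

-- Run an arrow over every element of a list, collecting the results.
-- ArrowChoice is needed because the computation branches on the list shape.
mapA :: ArrowChoice arr => arr a b -> arr [a] [b]
mapA f = proc xs -> case xs of
  []     -> returnA -< []
  y : ys -> do
    z  <- f      -< y
    zs <- mapA f -< ys
    returnA -< (z : zs)
```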
For some more details, see the “For reviewers” section below.
TODO
- buildSchemaCache to remove imperative updates
- buildSchemaCache code path
- SchemaDependency tracking
- cache in the right places
- WriterT implementation
- Improve performance of bulk queries (performance is good enough, so punting this to future work)
- Track error codes in inconsistent metadata objects (decided not to do this)
Server checklist
Catalog upgrade
Does this PR change Hasura Catalog version?
Metadata
Does this PR add a new Metadata feature?
GraphQL
Breaking changes
No Breaking changes
There are breaking changes:
Metadata API
Existing query types:
- args payload which is not backward compatible
- JSON schema
Logging
- JSON schema has changed
- type names have changed
For reviewers
This is a big change, so here are some tips on where to start and what to focus on if you’d like to review it:
First, familiarize yourself very briefly with arrows.
- A good introduction to the high-level concepts is Understanding arrows from the Haskell wikibook, but it doesn't cover proc notation.
- For a primer on proc notation, the Arrow syntax page on the Haskell website includes some basic information, and the relevant section of the GHC User's Guide has more details.
- If you want something really thorough, you can read Paterson's paper introducing arrow notation, A New Notation for Arrows.
- You don't need to thoroughly understand arrows to understand most of the code in this pull request, and in fact, if you squint, proc notation looks quite a bit like plain do notation (see the short sketch after this list).

Having done that, take a quick look at Hasura.Incremental and read the documentation comments. You don't need to worry about the implementation to understand the rest of the code. You can also look at Hasura.IncrementalSpec for a couple of small examples.

Now you can start reading through the bulk of the changes:
- Start with Hasura.RQL.DDL.Schema.Cache, focusing on buildSchemaCacheRule, which defines how to build a SchemaCache from catalog metadata.
- Look at Hasura.RQL.DDL.Schema.Cache.Dependencies, which implements dependency resolution to ensure the integrity of the schema cache.
- Read Hasura.RQL.Types.SchemaCache.Build, which defines the basic interface for triggering cache rebuilds.

The rest of the changes are mostly just adjustments to the rest of the code to accommodate the modified schema cache construction process.
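If proc notation is new to you, here is a tiny standalone sketch (made-up names, nothing from this PR) of why it reads so much like do notation:

```haskell
{-# LANGUAGE Arrows #-}
module ProcVsDo where

import Control.Arrow

-- proc notation: bind the input, feed values through arrows with -<, and name
-- each intermediate result, much as do notation names each monadic result.
addThenDouble :: Arrow arr => arr Int Int -> arr Int Int -> arr Int Int
addThenDouble addOne double = proc n -> do
  m <- addOne -< n
  double -< m

-- The same pipeline written with plain arrow composition, for comparison.
addThenDouble' :: Arrow arr => arr Int Int -> arr Int Int -> arr Int Int
addThenDouble' addOne double = addOne >>> double

-- With the function arrow (->), addThenDouble (+ 1) (* 2) 5 evaluates to 12.
```

The first definition names intermediate results the way a do block does; for a simple pipeline like this, it is equivalent to ordinary arrow composition.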