这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@lexi-lambda
Copy link
Contributor

@lexi-lambda lexi-lambda commented Nov 21, 2019

Description

As of #2379, we drop and recreate all system metadata using a bulk query during schema migrations. This is satisfyingly decisive, but it can lead to a performance problem because it can cause the schema cache to be rebuilt many times in a single transaction, which created the problem described in #3354.

A simple fix would be to tweak the migrations to avoid rebuilding the schema cache so many times, but I’ve been wanting to avoid doing that. Currently, we go to a lot of effort to avoid rebuilding the schema cache in places, but this duplicates a lot of logic, and as that logic becomes increasingly complicated, it becomes extremely difficult to keep in sync. Ideally, we’d like to have a single source of truth for how to build the schema cache and what rules its contents must follow, but we need to do it without incurring such a huge performance cost.

The idea I’ve been thinking about for a while is to incrementalize the process of building the schema cache, in the incremental compilation sense. Essentially, the idea is to add dependency tracking to the build process, so if some part of the metadata doesn’t change, we can avoid rebuilding it. Tracking all those dependencies by hand would be tricky and probably about as error-prone as our existing solution, so this PR adds some scaffolding for automatic incrementalization that does a lot of the work automatically. More details below.

Affected components

  • Server

Related Issues

#2379, #3354

Solution and Design

The good news about this change is that it actually removes a lot of logic, so the resulting implementation is simpler. However, the bad news is that it is a fairly invasive refactor, and it touches all parts of the code that modify the catalog or schema cache. There’s also a lot of new supporting code to help with the incremental build process, especially since arrows have almost no standard library to speak of.

For some more details, see the “For reviewers” section below.

TODO

  • Implement incrementalization abstraction
  • Refactor buildSchemaCache to remove imperative updates
  • Update metadata APIs to use buildSchemaCache code path
    • tables
    • relationships
    • computed fields
    • permissions
    • event triggers
    • remote schemas
  • Reintegrate proper SchemaDependency tracking
  • Update Diff.hs logic
  • Fix broken tests
  • Insert calls to cache in the right places
  • Use a non-leaky WriterT implementation
  • Improve performance of catalog fetch query
  • Figure out how to handle foreign key dependencies properly
  • Improve performance of bulk queries Performance is good enough, so punting this to future work
  • Handle conflicting GraphQL field names
  • Track dependencies from permissions to tables in a more fine-grained way
  • Track error codes in inconsistent metadata objects Decided not to do this.
  • Final cleanup of loose ends

Server checklist

Catalog upgrade

Does this PR change Hasura Catalog version?

  • Yes
    • Updated docs with SQL for downgrading the catalog

Metadata

Does this PR add a new Metadata feature?

  • No

GraphQL

  • No new GraphQL schema is generated

Breaking changes

  • No Breaking changes

  • There are breaking changes:

    1. Metadata API

      Existing query types:

      • Modify args payload which is not backward compatible
      • Behavioural change of the API
      • Change in response JSON schema
      • Change in error code
    2. Logging

      • Log JSON schema has changed
      • Log type names have changed

For reviewers

This is a big change, so here are some tips on where to start and what to focus on if you’d like to review it:

  1. First, familiarize yourself very briefly with arrows.

    1. A good introduction to the high-level concepts is Understanding arrows from the Haskell wikibook, but it doesn’t cover proc notation.

    2. For a primer on proc notation, the Arrow syntax page on the Haskell website includes some basic information, and the relevant section of the GHC User’s Guide has more details.

    3. If you want something really thorough, you can read Paterson’s paper introducing arrow notation, A New Notation for Arrows.

    You don’t need to thoroughly understand arrows to understand most of the code in this pull request, and in fact if you squint proc notation looks quite a bit like plain do notation.

  2. Having done that, take a quick look at Hasura.Incremental and read the documentation comments. You don’t need to worry about the implementation to understand the rest of the code. You can also look at Hasura.IncrementalSpec for a couple small examples.

  3. Now you can start reading through the bulk of the changes. Start with Hasura.RQL.DDL.Schema.Cache, focusing on buildSchemaCacheRule, which defines how to build a SchemaCache from catalog metadata.

  4. Look at Hasura.RQL.DDL.Schema.Cache.Dependencies, which implements dependency resolution to ensure integrity of the schema cache.

  5. Read Hasura.RQL.Types.SchemaCache.Build, which defines the basic interface for triggering cache rebuilds.

  6. The rest of the changes are mostly just adjustments to the rest of the code to accommodate the modified schema cache construction process.

@lexi-lambda lexi-lambda added the c/server Related to server label Nov 21, 2019
@netlify
Copy link

netlify bot commented Nov 21, 2019

Deploy preview for hasura-docs ready!

Built with commit e47a8c4

https://deploy-preview-3394--hasura-docs.netlify.com

@hasura hasura deleted a comment from hasura-bot Nov 25, 2019
@lexi-lambda lexi-lambda added the s/wip Status: This issue is a work in progress label Dec 2, 2019
@lexi-lambda lexi-lambda force-pushed the 3354-faster-metadata-migrations branch 2 times, most recently from 8130638 to 0340e38 Compare December 12, 2019 22:00
@hasura hasura deleted a comment from hasura-bot Dec 12, 2019
@hasura hasura deleted a comment from hasura-bot Dec 12, 2019
@hasura hasura deleted a comment from hasura-bot Dec 12, 2019
@hasura hasura deleted a comment from hasura-bot Dec 12, 2019
@hasura hasura deleted a comment from hasura-bot Dec 12, 2019
@lexi-lambda lexi-lambda force-pushed the 3354-faster-metadata-migrations branch from 33b4225 to 9f429ec Compare December 16, 2019 16:33
@hasura hasura deleted a comment from hasura-bot Dec 16, 2019
@hasura hasura deleted a comment from hasura-bot Dec 16, 2019
@hasura hasura deleted a comment from hasura-bot Dec 16, 2019
@hasura hasura deleted a comment from hasura-bot Dec 16, 2019
@hasura hasura deleted a comment from hasura-bot Dec 16, 2019
@lexi-lambda lexi-lambda force-pushed the 3354-faster-metadata-migrations branch from 00601a4 to 87c0be7 Compare December 16, 2019 17:12
@hasura-bot
Copy link
Contributor

Review app for commit 6709317 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3394-67093178

@lexi-lambda lexi-lambda force-pushed the 3354-faster-metadata-migrations branch 2 times, most recently from 13d669e to c378d92 Compare December 17, 2019 15:23
@hasura hasura deleted a comment from hasura-bot Dec 17, 2019
@hasura hasura deleted a comment from hasura-bot Dec 17, 2019
@lexi-lambda lexi-lambda marked this pull request as ready for review December 17, 2019 15:28
@lexi-lambda lexi-lambda requested a review from rikinsk as a code owner December 17, 2019 15:28
@hasura-bot
Copy link
Contributor

Review app for commit c378d92 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3394-c378d921

@lexi-lambda lexi-lambda force-pushed the 3354-faster-metadata-migrations branch from c378d92 to 4aa756a Compare January 2, 2020 14:48
@hasura-bot
Copy link
Contributor

Review app for commit 4aa756a deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3394-4aa756a1

@lexi-lambda lexi-lambda dismissed stale reviews from rakeshkky and 0x777 via 17b1de3 January 7, 2020 20:33
@lexi-lambda lexi-lambda force-pushed the 3354-faster-metadata-migrations branch from 4aa756a to 17b1de3 Compare January 7, 2020 20:33
@hasura-bot
Copy link
Contributor

Review app for commit 17b1de3 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3394-17b1de33

wip: fix error codes in remote schema tests
This should hopefully improve compile times by avoiding the need to
specialize everything at once.
Also, use the view in Schema.Diff to share some more logic.
This is significantly more performance, even without specialization,
which dramatically improves compile times.
We mostly want to do this to make queries against information_schema
tables work, which the console cares about. information_schema tables
use types like sql_identifier, which have no corresponding array types
defined! Therefore, in order to generate valid queries for _in and _nin
conditions, we need to treat them as their base types, instead.
This changes TableCoreCacheT to internally record dependencies at a
per-table level. In practice, this dramatically improves the performance
of building permissions: it makes it far, far less likely for
permissions to be needlessly rebuilt because some unrelated table
changed.
As explained in the note included in the diff, this can lead to
dramatically better query plans, and it seems to be especially important
for versions of Postgres <12.
@lexi-lambda lexi-lambda force-pushed the 3354-faster-metadata-migrations branch from 17b1de3 to b26b7d0 Compare January 8, 2020 22:46
@hasura-bot
Copy link
Contributor

Review app for commit b26b7d0 deployed to Heroku: https://hge-ci-pull-3394.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3394-b26b7d05

@lexi-lambda lexi-lambda force-pushed the 3354-faster-metadata-migrations branch from b26b7d0 to 66901d9 Compare January 8, 2020 23:09
@lexi-lambda
Copy link
Contributor Author

I think It'd be good if we keep the internal code for Incremental (and few similar modules) as a separate package which is not bound to change frequently. Probably in a subdirectory in server folder which will have a make rule to build and host haddock documentation at localhost. Interested ones may refer to the documentation (to know the magic!).

@rakeshkky Do you think there’s an advantage to keeping those things in a separate package rather than just in a separate directory in the same package?

- Move MonadBase/MonadBaseControl instances for TxE into pg-client-hs
- Set the -qn2 RTS option by default to limit the parallel GC to 2
  threads
- Remove eventlog instrumentation
- Don’t rebuild the schema cache again after running a query that needs
  it to be rebuilt, since we do that explicitly now.
- Remove some redundant checks, and relocate a couple others.
@lexi-lambda lexi-lambda force-pushed the 3354-faster-metadata-migrations branch from 66901d9 to e47a8c4 Compare January 8, 2020 23:19
@lexi-lambda lexi-lambda merged commit 8053108 into hasura:master Jan 9, 2020
@hasura-bot
Copy link
Contributor

Review app https://hge-ci-pull-3394.herokuapp.com is deleted

@lexi-lambda lexi-lambda deleted the 3354-faster-metadata-migrations branch January 9, 2020 00:13
@lexi-lambda lexi-lambda removed the s/wip Status: This issue is a work in progress label Jan 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c/server Related to server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants