allow identical fields in custom column names configuration (fix #3137) & improve root fields validation #3154
Conversation
Deploy preview for hasura-docs ready! Built with commit 87784ef

Review app for commit f362624 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit b04e5fe deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 0d40588 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit f361a2c deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit b4c3ede deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
```haskell
  buildSchemaCacheFor (MOTable tableName)
  return successMsg
  where
    validateWithExistingRelationships fields = do
```
I don’t think I totally understand all the subtleties of this issue. Could you explain why `validateWithExistingRelationships` is different from/not a part of `validateTableConfig`?
Sure. `validateTableConfig` happens when initialising the `TableInfo` of a table. At that point, no relationships have been added yet, so we cannot validate against relationships.

The `set_table_custom_fields` API applies only to already-tracked tables, where we should validate custom column names against existing relationships. However, `validateTableConfig` is only reached here via the `buildSchemaCacheFor` function.
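To illustrate the idea, here is a minimal sketch of such a check with simplified stand-in types (the real code works on the schema-cache types, so the signature and names here are hypothetical):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.HashMap.Strict as M
import qualified Data.HashSet        as S
import           Data.Text           (Text)
import qualified Data.Text           as T

-- | Fail if any custom column name collides with an existing relationship
-- name of the same table.
validateWithExistingRelationships
  :: S.HashSet Text       -- ^ names of the table's existing relationships
  -> M.HashMap Text Text  -- ^ custom_column_names: column -> custom name
  -> Either Text ()
validateWithExistingRelationships relNames customNames =
  case filter (`S.member` relNames) (M.elems customNames) of
    []        -> Right ()
    conflicts -> Left $
      "custom column name(s) conflict with existing relationships: "
        <> T.intercalate ", " conflicts
```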
Well, a thought struck me while writing the above reply: what about validating new relationships being added against existing custom column names? We should definitely do that. I'll push a commit with that change soon.
Putting all my thoughts below.

I have a simple schema:

```sql
CREATE TABLE "user"(
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  age INTEGER
);
```

I tracked the above table with the following custom column names configuration:

```json
{
  "id": "userId",
  "name": "userName"
}
```

What could be the best way to validate a relationship with name X that is being defined? Currently, we only check against existing column names and relationship names. We should definitely also validate against the custom column names (`userId` and `userName`).
Also, I'm thinking about validating names for the new classes of fields coming soon (computed fields and remote joins). I think we should improve the logic of the `checkForFieldConflict` function and rename `validateWithExistingRelationships` to something like `validateWithFieldsOtherThanColumns`. I'll address all of these in the next commit.
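For illustration, a hedged sketch of what such a generalized check could look like, again with simplified types rather than the real `checkForFieldConflict` signature:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.HashMap.Strict as M
import qualified Data.HashSet        as S
import           Data.Text           (Text)

-- | When defining a new relationship, reject its name if it collides with a
-- column, an existing relationship, or a custom column name such as
-- "userId"/"userName" above.
checkNewRelationshipName
  :: Text                 -- ^ name of the relationship being defined
  -> S.HashSet Text       -- ^ column names
  -> S.HashSet Text       -- ^ existing relationship names
  -> M.HashMap Text Text  -- ^ custom_column_names: column -> custom name
  -> Either Text ()
checkNewRelationshipName relName columns rels customNames
  | relName `S.member` columns         = conflict "column"
  | relName `S.member` rels            = conflict "relationship"
  | relName `elem` M.elems customNames = conflict "custom column name"
  | otherwise                          = Right ()
  where
    conflict what =
      Left $ "relationship name \"" <> relName <> "\" conflicts with a " <> what
```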
> Sure. `validateTableConfig` happens when initialising the `TableInfo` of a table. At that point, no relationships have been added yet, so we cannot validate against relationships.
Got it, that makes sense! As a follow-up question: do you think we could consolidate this logic somehow? I know that each code path has slightly different inputs, and you have a firmer grasp on the details than I do, but it would be nice if we could share the logic between `validateTableConfig`, `validateWithNonColumnFields`, and `checkForFieldConflict` somehow so that it can’t get out of sync.
Prior to this commit, the server used to build the `TableInfo` on a `track_table` query and add it to the cache later. Now `buildSchemaCacheFor` is used to construct the `TableInfo` from scratch and update it with other table metadata objects. Thus we don't have the luxury of a `TableInfo` against which to validate the input custom fields configuration in `trackExistingTableOrViewP2` and `runSetTableCustomFields`. This forces the validation of the custom fields configuration against existing column names to happen only in the `buildSchemaCacheFor` code path.

In the `set_table_custom_fields` query API, `buildSchemaCacheFor` is used to rebuild the `TableConfig`. Here, the validation of custom fields occurs in two steps:

- With the existing, not-yet-updated `TableInfo`, validate the custom column fields against the existing non-column fields (relationships and computed fields, as of now).
- We could also validate the input custom column fields against existing columns here, but that doesn't make sense since the same check happens in the `buildSchemaCacheFor` code path.

After a bit of thinking, I'm planning to do the following (see the sketch after this list):

- Scrap the `validateTableConfig` function and break it into two isolated pieces of logic: `validateCustomRootFields` and `validateWithColumns`.
- Use `validateCustomRootFields` directly in `trackExistingTableOrViewP2` and `runSetTableCustomFields`.
- Use `validateWithColumns` only in `buildTableCache`.

EDIT 1: The changes I mentioned above are live in the PR.
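As a rough sketch of the `validateWithColumns` half, with simplified stand-in types (the real function operates on the schema-cache structures, so the names and signature here are illustrative): the key point for #3137 is that a custom name identical to its own column is allowed, while a collision with a *different* column is still an error.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad       (when)
import           Data.Foldable       (for_)
import qualified Data.HashMap.Strict as M
import qualified Data.HashSet        as S
import           Data.Text           (Text)

validateWithColumns
  :: S.HashSet Text       -- ^ actual column names of the table
  -> M.HashMap Text Text  -- ^ custom_column_names: column -> custom name
  -> Either Text ()
validateWithColumns columnNames customNames =
  for_ (M.toList customNames) $ \(column, customName) ->
    -- identical to its own column: allowed (the fix);
    -- identical to some *other* column: still a conflict
    when (customName /= column && customName `S.member` columnNames) $
      Left $ "field \"" <> customName <> "\" of the table already exists"
```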
Review app for commit d1cb173 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Resolve Conflicts: server/src-lib/Hasura/RQL/Types/SchemaCache.hs server/tests-py/test_v1_queries.py
Review app for commit 3737651 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 55d7083 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 44e2fb0 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 6479d5f deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
```haskell
  where
    duplicateNames = duplicates $ M.elems colFlds
```

```haskell
  => GC.TableCustomRootFields -> m ()
validateCustomRootFields customRootFields = do
```
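For context, the `duplicates` helper referenced in the snippet above could plausibly be implemented along these lines (a sketch, not necessarily the actual definition in the codebase):

```haskell
import qualified Data.HashMap.Strict as M
import           Data.Hashable       (Hashable)

-- | Return the elements that occur more than once in the input list.
duplicates :: (Eq a, Hashable a) => [a] -> [a]
duplicates =
  M.keys . M.filter (> 1) . M.fromListWith (+) . map (\x -> (x, 1 :: Int))
```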
In the vein of “parse, don’t validate,” is there some way we could make this return something other than `()`? 🙂 Maybe it could build a map from root field names to something? I don’t know if that makes sense, but just throwing it out there.
There is no type that `TableCustomRootFields` resolves to, so this function cannot return anything other than `()`. But we can partly achieve "parse, don't validate" :). This function is performing two tasks:

- Check for duplicate custom root field names within the `TableCustomRootFields` type. Happily, we can move this part into the `FromJSON` deriving of the type.
- Check for any conflicting root field in the remote schema. This is something we can't avoid right now.
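A hedged sketch of the first point, moving the duplicate-name check into the `FromJSON` instance; the field set below is simplified relative to the real `TableCustomRootFields`:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad (unless)
import           Data.Aeson
import           Data.List     (group, sort)
import           Data.Maybe    (catMaybes)
import           Data.Text     (Text)
import qualified Data.Text     as T

-- Simplified version of the type; the real one has more root fields.
data TableCustomRootFields = TableCustomRootFields
  { tcrfSelect :: !(Maybe Text)
  , tcrfInsert :: !(Maybe Text)
  , tcrfUpdate :: !(Maybe Text)
  , tcrfDelete :: !(Maybe Text)
  }

instance FromJSON TableCustomRootFields where
  parseJSON = withObject "TableCustomRootFields" $ \o -> do
    fields <- TableCustomRootFields
                <$> o .:? "select"
                <*> o .:? "insert"
                <*> o .:? "update"
                <*> o .:? "delete"
    -- reject duplicate custom root field names at parse time
    let names = catMaybes
          [tcrfSelect fields, tcrfInsert fields, tcrfUpdate fields, tcrfDelete fields]
        dups  = concatMap (take 1) $ filter ((> 1) . length) $ group $ sort names
    unless (null dups) $
      fail $ "duplicate root field names: " <> T.unpack (T.intercalate ", " dups)
    pure fields
```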
Sorry for taking so long to respond to this; I’ve had a few too many things going on, and in order to understand all the details here, I needed to actually take the time to check out this branch and look at the code.

I’ve now done that, and I see what you mean about it not being able to return anything more meaningful. However, that’s actually kind of why I added the `buildSchemaCacheFor` function in the first place: my idea was that `buildSchemaCache` could be where all the “canonical” parsing/validation logic happens, and we could just optimistically assume the data is correct, update the catalog, and run `buildSchemaCache` to see if it builds successfully. If it doesn’t, the process will fail, and the database transaction will get rolled back.

I don’t love that solution, since it’s fairly indirect, and it duplicates a lot of work. But it does have some things going for it. Here was my reasoning:

- The idea of “parse, don’t validate” is that we want to use the type system to ensure the necessary parsing happens as a necessary consequence of the data-processing pipeline.
- The problem is that part of our processing pipeline goes out into Postgres, then back into graphql-engine, which is tricky, since obviously the Haskell type system cannot guarantee anything about what goes on inside Postgres.
- However, at least with Postgres we get database transactions, so we can always muck up the state of the catalog as much as we want, as long as we always ensure it is consistent before we eventually commit.
- Since `buildSchemaCache` fundamentally has to check the integrity of the catalog (since it has to build a `SchemaCache` value), it makes sense as a canonical code path for ensuring invariants. Therefore, if we can successfully build the schema cache, the catalog must be well-formed, right?
There are, however, some problems with this approach, as you’ve probably seen for yourself. For one, it’s slow. For another, it can make error reporting tricky, since the error that gets generated might not have anything to do with the thing we’re changing. The latter issue was what `buildSchemaCacheFor` was supposed to be about: it’s a way of specifying that the schema cache is being rebuilt with the intent of checking the integrity of a particular object.

The former issue is more complicated to deal with (though I don’t think it’s impossible), but it’s not actually a huge issue for this change, since `trackExistingTableOrViewP2` already calls `buildSchemaCacheFor` anyway. Therefore, we can’t be slowing it down any more by taking that code path, right?

Therefore, what if we skipped the call to `validateCustomRootFields` entirely and did the integrity-checking via the canonical `buildSchemaCacheFor` code path? Is that viable, do you think?
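To make that idea concrete, here is a heavily reduced sketch of the pattern, with stand-in types instead of the real graphql-engine internals (every name below is hypothetical):

```haskell
import Data.Text (Text)

-- Stand-ins for the real catalog and schema cache.
data Catalog     = Catalog
data SchemaCache = SchemaCache

-- The canonical integrity check: building the cache either succeeds or
-- fails with an error describing the inconsistency.
buildSchemaCache :: Catalog -> Either Text SchemaCache
buildSchemaCache _ = Right SchemaCache  -- stub

-- Optimistically apply a metadata change, then rebuild the schema cache to
-- check the result; a Left here stands in for rolling back the Postgres
-- transaction that wrote the change.
applyMetadataChange
  :: (Catalog -> Catalog) -> Catalog -> Either Text (Catalog, SchemaCache)
applyMetadataChange change catalog = do
  let catalog' = change catalog
  cache <- buildSchemaCache catalog'
  pure (catalog', cache)
```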
I want to give you a bit of history :). Initially, the GraphQL Engine had only Postgres tables with relationships and permissions. Later, over many iterations, it evolved into what it is now, with remote schemas, event triggers, computed fields, etc. If we look closely, the `buildSchemaCache` function is quite sequential, and the sequence (except for computed fields) mirrors the evolution of the GraphQL Engine (interesting!). The `TableInfo` for all tables is built first, relationships are added next, then computed fields, permissions, etc. The validation checks are inter-dependent: for example, there shouldn't be a computed field with a relationship's name, or vice versa, and permissions are defined only after all "fields" of a table are resolved. Any change in the sequence may result in permanent inconsistency (a potential bug). Now, `buildSchemaCacheFor` is useful in `trackExistingTableOrViewP2`, since the server is defining the metadata of the table for the first time. But the validation of the table's custom root fields configuration depends on the existing remote schemas, and remote schemas are added much later in `buildSchemaCache` than the `TableInfo` is built. Hence, it is hard to perform `validateCustomRootFields` inside `buildSchemaCacheFor`, so I'm calling it well before `buildSchemaCacheFor` to validate against the existing remote schemas, optimistically assuming the remote schemas are not being altered in the current code path.

The same happens with the `set_table_custom_fields` metadata API. We have to validate the custom column names against the existing fields other than columns, which are added to the `TableInfo` later in `buildSchemaCache`. Also, we should definitely run `validateCustomRootFields` before `buildSchemaCacheFor`.

Apart from the two problems you mentioned with `buildSchemaCacheFor`, this third problem (the peculiar sequential dependence of `buildSchemaCache`) described in the two paragraphs above makes `buildSchemaCacheFor` not a one-stop solution for achieving "parse, don't validate". We definitely need `validate :: m ()`-style checks in the metadata APIs that rely on validating against existing metadata objects which are added later in `buildSchemaCache`.
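The sequential shape I'm describing could be sketched with stub actions standing in for the real build steps (all names here are illustrative, not the actual internals):

```haskell
-- Each step can only validate against what earlier steps have already built,
-- which is why root-field validation against remote schemas cannot happen
-- while TableInfo is being constructed.
buildSchemaCacheSketch :: IO ()
buildSchemaCacheSketch = do
  step "build TableInfo for all tables"
  step "add relationships"
  step "add computed fields"
  step "add permissions (all fields must be resolved first)"
  step "add event triggers"
  step "add remote schemas (added much later than tables)"
  where
    step = putStrLn  -- stub for the real cache-building actions
```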
> The validation checks are inter-dependent: for example, there shouldn't be a computed field with a relationship's name, or vice versa, and permissions are defined only after all "fields" of a table are resolved. Any change in the sequence may result in permanent inconsistency (a potential bug).
This makes sense, but surely `buildSchemaCache` must validate everything (including custom column names!), or else `buildSchemaCache` is buggy, right? If the metadata were to end up in a state where the custom column name configuration is invalid, then `buildSchemaCache` must fail. To me, this implies that, regardless of the order in which it does the validation, eventually it will get validated.

Perhaps what you’re telling me, however, is that this isn’t the case? If I were to, with this PR, start up an instance of graphql-engine against a database where a custom column name configuration is somehow invalid, what would happen? Would it report an error on startup about an invalid custom column name configuration, or would it report some arcane internal error message about something else entirely (or worse, not report an error at all until something else went wrong during execution)?

I do understand that some validation (in the `m ()` sense) must happen somewhere, and it’s highly unlikely that we can encode all of our invariants into the type system, but I don’t understand why that validation can’t happen despite the sequential dependencies inside `buildSchemaCache`. In the absolute worst-case scenario, we add some validation checks at the end of `buildSchemaCache`, after the whole schema cache has been built and we have all the information we need.

(As a related aside, I started working on a fix for #3354 today, which involves improving the performance of `buildSchemaCache`, so I’ve been implementing the “automatic incrementalization” idea I’ve had for a little while now. That should (eventually) make `buildSchemaCache` much faster, since it can avoid rebuilding everything every time it is called. I don’t think the performance problem of `buildSchemaCache` is fundamental, and I think there are significant advantages to consolidating the logic around the catalog!)
> In the absolute worst-case scenario, we add some validation checks at the end of `buildSchemaCache`, after the whole schema cache has been built and we have all the information we need.
This seems good to me. For now, I'll move the validation of the tables' custom configuration that depends on objects added later to the end of `buildSchemaCache`.

Please add me as a reviewer on #3354 :). I would love to see it.
Review app for commit 16c6933 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 5e69174 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 76c00ef deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 8b9556a deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
lexi-lambda left a comment:
I’m much happier with this change—thank you very much for your patience on this! 🙂
Review app for commit a16fd0b deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 87784ef deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app https://hge-ci-pull-3154.herokuapp.com is deleted
allow identical fields in custom column names configuration (fix hasura#3137) & improve root fields validation (hasura#3154)

* allow identical column fields in 'custom_column_names'
* improve validation of custom column names
* improve 'checkForFieldConflicts' & validate custom column names with non column fields
* split `validateTableConfig` into two isolated validation logic
* improve validation of root fields
* improve validating custom root fields and duplicate root fields
* move 'validateCustomRootFields' to 'buildSchemaCache'
…cated data types. PR-URL: hasura/graphql-engine-mono#3264 GitOrigin-RevId: 923ea2794bef9850f1aa5714b610890d02165882
Description
Do not throw an error if a custom column name is identical to the Postgres name of the column.
The `track_table` v2 API throws an error for such a request:

Error: `column/relationship "id" of table "users" already exists`

This PR improves the logic of validating custom column names, and thus fixes the above. It also improves the check for duplicate root fields generated for all tables.
Affected components
Related Issues
Fixes a part of #3137.
Solution and Design
- Improve the validation logic in the `set_table_custom_fields` query type.
- Generate the root fields of `GCtx` as lists instead of hashmaps, and check for duplicate root field names using the lists.

Steps to test and verify

Try defining `custom_column_names` with identical fields. The server shouldn't throw an error.

Limitations, known bugs & workarounds