
Conversation

@rakeshkky
Member

@rakeshkky rakeshkky commented Oct 16, 2019

Description

Do not throw an error if a custom column name is identical to the Postgres name of the column.

The track_table v2 API throws an error if the request is:

{
  "type": "track_table",
  "version": 2,
  "args": {
    "table": "users",
    "configuration": {
        "custom_column_names": {
          "created_at": "createdAt",
          "id": "id",
          "email": "email",
          "verified_at": "verifiedAt",
        }
    }
  }
}

Error: "column/relationship "id" of table "users" already exists"

This PR improves the logic for validating custom column names and thus fixes the above. It also improves the check for duplicate root fields generated across all tables.

Affected components

  • Server
  • Tests

Related Issues

Fixes part of #3137

Solution and Design

  • Modify the logic for validating custom column names: build a unified hashmap of column names and custom names and check it for duplicate custom names (see the sketch after this list).
  • Validate custom column names against existing non-column fields for the set_table_custom_fields query type.
  • Store root fields in GCtx as lists instead of hashmaps, and check for duplicate root field names using the lists.
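
A minimal sketch of the unified-map idea from the first point above — not the PR's actual code; names and types here are illustrative:

import qualified Data.HashMap.Strict as M
import Data.List (group, sort)

-- Report every name that occurs more than once.
duplicates :: Ord a => [a] -> [a]
duplicates = map head . filter ((> 1) . length) . group . sort

-- Map each column to its effective GraphQL name: the custom name if one is
-- configured, otherwise the Postgres column name itself. A column mapped to
-- its own name ("id" -> "id") yields a single map entry, so it is no longer
-- reported as a duplicate -- which is exactly the bug this PR fixes.
validateCustomColumnNames
  :: [String]                 -- Postgres column names
  -> M.HashMap String String  -- custom_column_names configuration
  -> Either String (M.HashMap String String)
validateCustomColumnNames columns customNames =
  let unified = M.fromList
        [ (col, M.lookupDefault col col customNames) | col <- columns ]
      dups = duplicates (M.elems unified)
  in if null dups
       then Right unified
       else Left ("duplicate custom column names: " <> show dups)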

Steps to test and verify

Define custom_column_names where a custom name is identical to the column's Postgres name. The server shouldn't throw an error.
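
For instance, a set_table_custom_fields request along these lines (payload sketched from the v2 metadata API; the identical mappings such as "id": "id" are the case under test) should now return success:

{
  "type": "set_table_custom_fields",
  "version": 2,
  "args": {
    "table": "users",
    "custom_column_names": {
      "id": "id",
      "created_at": "createdAt"
    }
  }
}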

Limitations, known bugs & workarounds

@rakeshkky rakeshkky added the c/server Related to server label Oct 16, 2019
@rakeshkky rakeshkky self-assigned this Oct 16, 2019
@netlify

netlify bot commented Oct 16, 2019

Deploy preview for hasura-docs ready!

Built with commit 87784ef

https://deploy-preview-3154--hasura-docs.netlify.com

@hasura-bot
Contributor

Review app for commit f362624 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-f3626247

@hasura-bot
Contributor

Review app for commit b04e5fe deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-b04e5fe0

@hasura-bot
Contributor

Review app for commit 0d40588 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-0d40588c

@hasura-bot
Contributor

Review app for commit f361a2c deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-f361a2cf

@hasura-bot
Contributor

Review app for commit b4c3ede deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-b4c3edea

buildSchemaCacheFor (MOTable tableName)
return successMsg
where
validateWithExistingRelationships fields = do
Contributor


I don’t think I totally understand all the subtleties of this issue. Could you explain why validateWithExistingRelationships is different from/not a part of validateTableConfig?

Member Author

@rakeshkky rakeshkky Oct 18, 2019


Sure. validateTableConfig happens when initialising the TableInfo of a table. At that point, no relationships have been added yet. Hence, we cannot validate against relationships.

The set_table_custom_fields API applies only to tracked tables, so we should validate custom column names against existing relationships. However, validateTableConfig is used here via the buildSchemaCacheFor function.

Member Author

@rakeshkky rakeshkky Oct 18, 2019


Well, a thought struck me while writing the above reply: what about validating new relationships being added against existing custom column names? We should definitely do that. I'll push a commit with that change soon.

Member Author

@rakeshkky rakeshkky Oct 18, 2019


Putting all my thoughts below:

I have a simple schema

CREATE TABLE "user"(
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  age INTEGER
)

I tracked the above table with the following custom column names configuration.

{
    "id": "userId",
    "name": "userName"
}

What's the best way to validate a relationship with name X that is being defined? Currently, we only check against existing column names and relationships. We should definitely also validate against the custom column names (userId and userName here).
Also, I'm thinking about validating names for the new classes of fields coming soon (computed fields and remote joins). I think we should improve the logic of the checkForFieldConflict function and rename validateWithExistingRelationships to something like validateWithFieldsOtherThanColumns.

I'll answer all these in the next commit.
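
A hedged sketch of the direction described in this comment — checking a new relationship name against columns, custom column names, and existing relationships. All names and types here are illustrative, not the PR's:

import qualified Data.HashSet as S

data ConflictKind = ColumnConflict | CustomNameConflict | RelationshipConflict
  deriving (Show)

checkNewFieldName
  :: S.HashSet String  -- column names, e.g. id, name, age
  -> S.HashSet String  -- custom column names, e.g. userId, userName
  -> S.HashSet String  -- existing relationship names
  -> String            -- the new relationship name X
  -> Either ConflictKind ()
checkNewFieldName cols customCols rels new
  | new `S.member` cols       = Left ColumnConflict
  | new `S.member` customCols = Left CustomNameConflict
  | new `S.member` rels       = Left RelationshipConflict
  | otherwise                 = Right ()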

Contributor


Sure. validateTableConfig happens when initialising the TableInfo of a table. At that point, no relationships have been added yet. Hence, we cannot validate against relationships.

Got it, that makes sense! As a follow-up question: do you think we could consolidate this logic somehow? I know that each code path has slightly different inputs, and you have a firmer grasp on the details than I do, but it would be nice if we could share the logic between validateTableConfig, validateWithNonColumnFields, and checkForFieldConflict somehow so that it can’t get out of sync.

Member Author

@rakeshkky rakeshkky Oct 22, 2019


Prior to this commit, the server used to build the TableInfo on a track_table query and add it to the cache later. Now buildSchemaCacheFor is used to construct the TableInfo from scratch and update it with the other table metadata objects. Thus we don't have a TableInfo available in trackExistingTableOrViewP2 and runSetTableCustomFields against which to validate the input custom fields configuration. This forces the validation of the custom fields configuration against existing column names to occur only in the buildSchemaCacheFor code path.

In the set_table_custom_fields query API, buildSchemaCacheFor is used to rebuild the TableConfig. Here, the validation of custom fields occurs in two steps:

  1. With the existing, not-yet-updated TableInfo, validate the custom column fields against existing non-column fields (relationships and computed fields, as of now).
  2. We could also validate the input custom column fields against existing columns here, but that doesn't make sense since the same check happens in the buildSchemaCacheFor code path.

After a bit of thinking, I'm planning to do the following:

  1. Scrap the validateTableConfig function and break it into two isolated pieces of logic: validateCustomRootFields and validateWithColumns.
  2. Use validateCustomRootFields directly in trackExistingTableOrViewP2 and runSetTableCustomFields.
  3. Use validateWithColumns only in buildTableCache.

EDIT 1: The changes I mentioned above are live in the PR.

@hasura-bot
Contributor

Review app for commit d1cb173 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-d1cb1738

Resolve Conflicts:
	server/src-lib/Hasura/RQL/Types/SchemaCache.hs
	server/tests-py/test_v1_queries.py
@hasura-bot
Contributor

Review app for commit 3737651 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-37376514

@hasura-bot
Contributor

Review app for commit 55d7083 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-55d70830

@hasura-bot
Contributor

Review app for commit 44e2fb0 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-44e2fb04

@rakeshkky rakeshkky requested a review from lexi-lambda October 22, 2019 12:52
@rakeshkky rakeshkky changed the title from "allow identical fields in custom column names configuration (fix #3137)" to "allow identical fields in custom column names configuration (fix #3137) & improve root fields validation" Nov 4, 2019
@hasura-bot
Contributor

Review app for commit 6479d5f deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-6479d5fb

where
duplicateNames = duplicates $ M.elems colFlds
=> GC.TableCustomRootFields -> m ()
validateCustomRootFields customRootFields = do
Contributor


In the vein of “parse, don’t validate,” is there some way we could make this return something other than ()? 🙂 Maybe it could build a map from root field names to something? I don’t know if that makes sense, but just throwing it out there.

Member Author


There is no type that TableCustomRootFields resolves to. Hence, this function cannot return anything other than (). But we can partly achieve "parse, don't validate" :). This function performs two tasks:

  1. Check for duplicate custom root field names within the TableCustomRootFields type.
    Happily, we can move this part into the FromJSON instance of the type (sketched after this list).
  2. Check for any conflicting root field in the remote schema.
    This is something we can't avoid right now.
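
A rough sketch of moving task 1 into parsing, as suggested. The record below is a simplified subset, not the engine's exact type:

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Data.List (group, sort)
import Data.Maybe (catMaybes)
import qualified Data.Text as T

data TableCustomRootFields = TableCustomRootFields
  { trfSelect :: Maybe T.Text
  , trfInsert :: Maybe T.Text
  , trfUpdate :: Maybe T.Text
  , trfDelete :: Maybe T.Text
  } deriving (Show)

-- Rejecting duplicates at parse time means a TableCustomRootFields value
-- can never hold duplicate names: task 1 becomes "parse, don't validate".
instance FromJSON TableCustomRootFields where
  parseJSON = withObject "TableCustomRootFields" $ \o -> do
    trf <- TableCustomRootFields
             <$> o .:? "select"
             <*> o .:? "insert"
             <*> o .:? "update"
             <*> o .:? "delete"
    let names = catMaybes [trfSelect trf, trfInsert trf, trfUpdate trf, trfDelete trf]
        dups  = map head . filter ((> 1) . length) . group . sort $ names
    if null dups
      then pure trf
      else fail ("duplicate root field names: " <> show dups)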

Contributor


Sorry for taking so long to respond to this—I’ve had a few too many things going on, and in order to understand all the details here, I needed to actually take the time to check out this branch and look at the code.

I’ve now done that, and I see what you mean about it not being able to return anything more meaningful. However, that’s actually kind of why I added the buildSchemaCacheFor function in the first place—my idea was that buildSchemaCache could be where all the “canonical” parsing/validation logic happens, and we could just optimistically assume the data is correct, update the catalog, and run buildSchemaCache to see if it builds successfully. If it doesn’t, the process will fail, and the database transaction will get rolled back.

I don’t love that solution, since it’s fairly indirect, and it duplicates a lot of work. But it does have some things going for it. Here was my reasoning:

  • The idea of “parse, don’t validate” is that we want to use the type system to ensure the necessary parsing happens as a necessary consequence of the data processing pipeline.

  • The problem is that part of our processing pipeline goes out into Postgres, then back into graphql-engine, which is tricky, since obviously the Haskell type system cannot guarantee anything about what goes on inside Postgres.

  • However, at least with Postgres we get database transactions, so we can always muck up the state of the catalog as much as we want as long as we always ensure it is consistent before we eventually commit.

  • Since buildSchemaCache fundamentally has to check integrity of the catalog (since it has to build a SchemaCache value), it makes sense as a canonical code path for ensuring invariants. Therefore, if we can successfully build the schema cache, the catalog must be well-formed, right?

There are, however, some problems with this approach, as you’ve probably seen for yourself. For one, it’s slow. For another, it can make doing error reporting tricky, since the error that gets generated might not have anything to do with the thing we’re changing. The latter issue was what buildSchemaCacheFor was supposed to be about—it’s a way of specifying that the schema cache is being rebuilt with the intent to check the integrity of a particular object.

The former issue is more complicated to deal with (though I don’t think it’s impossible), but it’s not actually a huge issue for this change, since trackExistingTableOrViewP2 already calls buildSchemaCacheFor, anyway. Therefore, we can’t be slowing it down any more by taking that code path, right?

Therefore, what if we skipped the call to validateCustomRootFields entirely and did the integrity-checking via the canonical buildSchemaCacheFor code path? Is that viable, do you think?
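
A toy model of the optimistic-update-then-rebuild pattern described above, with Either standing in for the database transaction. All names are illustrative, not the engine's:

-- The catalog is just a value; a failed rebuild "rolls back" by keeping
-- the old catalog, mirroring how a failed transaction is discarded.
type Catalog = [(String, String)]

applyMetadataChange
  :: (Catalog -> Catalog)           -- optimistic catalog update
  -> (Catalog -> Either String ())  -- buildSchemaCache-style integrity check
  -> Catalog
  -> (Either String (), Catalog)
applyMetadataChange update rebuild catalog =
  let catalog' = update catalog
  in case rebuild catalog' of
       Left err -> (Left err, catalog)   -- rollback: keep the old state
       Right () -> (Right (), catalog')  -- commit the new state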

Member Author

@rakeshkky rakeshkky Nov 18, 2019


I want to give you a bit of history :). Initially, the GraphQL Engine had only Postgres tables with relationships and permissions. Later, over many iterations, it evolved into what it is now, with remote schemas, event triggers, computed fields etc. If we look closely, the buildSchemaCache function is quite sequential, and the sequence (except for computed fields) mirrors the evolution of the GraphQL Engine (interesting!). The TableInfo for all tables is built first, relationships are added next, then computed fields, permissions etc. The validation checks are interdependent. For example, there shouldn't be a computed field with the same name as a relationship, or vice versa. Permissions are defined only after all "fields" of a table are resolved. Any change in the sequence may result in permanent inconsistency (a potential bug). Now, buildSchemaCacheFor is useful in trackExistingTableOrViewP2 since the server is defining the metadata of the table for the first time. But the validation of its custom root fields configuration depends on the existing remote schemas, and building the TableInfo happens well before remote schemas are added, much later in buildSchemaCache. Hence, it is hard to perform validateCustomRootFields inside buildSchemaCacheFor. So I'm calling it well before buildSchemaCacheFor, validating against the existing remote schemas and optimistically assuming the remote schemas are not altered in the current code path.

The same happens with the set_table_custom_fields metadata API. We have to validate the custom column names against existing fields other than columns, which are added to the TableInfo later in buildSchemaCache. Also, we should definitely run validateCustomRootFields before buildSchemaCacheFor.

Apart from the two problems you mentioned with buildSchemaCacheFor, the third problem described in the two paragraphs above (the peculiar sequential dependence of buildSchemaCache) keeps buildSchemaCacheFor from being a one-stop solution for achieving "parse, don't validate". We definitely have to perform validation (in the m () sense) in metadata APIs that rely on validating against existing metadata objects which are added later in buildSchemaCache.

Contributor

@lexi-lambda lexi-lambda Nov 18, 2019


The validation checks are interdependent. For example, there shouldn't be a computed field with the same name as a relationship, or vice versa. Permissions are defined only after all "fields" of a table are resolved. Any change in the sequence may result in permanent inconsistency (a potential bug).

This makes sense, but surely buildSchemaCache must validate everything (including custom column names!), or else buildSchemaCache is buggy, right? If the metadata were to end up in a state where the custom column name configuration is invalid, then buildSchemaCache must fail. To me, this implies that, regardless of the order in which it does the validation, eventually it will get validated.

Perhaps what you’re telling me, however, is that isn’t the case? If I were to, with this PR, start up an instance of graphql-engine against a database where a custom column name configuration is somehow invalid, what would happen? Would it report an error on startup about an invalid custom column name configuration, or would it report some arcane internal error message about something else entirely (or worse, not report an error at all until something else went wrong during execution)?

I do understand that some validation (in the m () sense) must happen somewhere, and it’s highly unlikely that we can encode all of our invariants into the type system, but I don’t understand why that validation can’t happen despite the sequential dependencies inside buildSchemaCache. In the absolute worst-case scenario, we add some validation checks at the end of buildSchemaCache, after the whole schema cache has been built, and we have all the information we need.

(As a related aside, I have started working on a fix for #3354 today, which involves making the performance of buildSchemaCache better, so I’ve been implementing the “automatic incrementalization” idea I’ve had for a little while now. That should (eventually) make the performance of buildSchemaCache much better, since it can avoid rebuilding everything every time it is called. I don’t think the performance problem of buildSchemaCache is fundamental, and I think there are significant advantages to consolidating the logic around the catalog!)

Member Author

@rakeshkky rakeshkky Nov 19, 2019


In the absolute worst-case scenario, we add some validation checks at the end of buildSchemaCache, after the whole schema cache has been built, and we have all the information we need.

This seems good to me. For now, I'll move the validation of custom table configuration that depends on objects added later to the end of buildSchemaCache.

Please add me as a reviewer for #3354 :). I would love to see it.
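
To make the agreed approach concrete, a toy sketch of building in the fixed stage order and validating once everything is available. Every name below is an assumption for illustration, not the engine's real API:

data SchemaCache = SchemaCache
  { scColumns       :: [String]  -- stage 1: tables and columns
  , scRelationships :: [String]  -- stage 2: relationships
  , scRemoteSchemas :: [String]  -- late stage: remote schemas
  } deriving (Show)

buildSchemaCache :: Either String SchemaCache
buildSchemaCache = do
  let cache = SchemaCache ["id", "name", "age"] ["posts"] ["weather_api"]
  validateAtEnd cache  -- only here is every stage's output available
  pure cache

-- A cross-stage check, e.g. a relationship must not shadow a column; checks
-- like this can run last regardless of the stages' sequential dependencies.
validateAtEnd :: SchemaCache -> Either String ()
validateAtEnd cache =
  case filter (`elem` scColumns cache) (scRelationships cache) of
    []        -> Right ()
    conflicts -> Left ("field conflicts: " <> show conflicts)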

@hasura-bot
Contributor

Review app for commit 16c6933 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-16c69330

@hasura-bot
Contributor

Review app for commit 5e69174 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-5e69174f

@hasura-bot
Contributor

Review app for commit 76c00ef deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-76c00ef8

@hasura-bot
Contributor

Review app for commit 8b9556a deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-8b9556af

Contributor

@lexi-lambda lexi-lambda left a comment


I’m much happier with this change—thank you very much for your patience on this! 🙂

@hasura-bot
Contributor

Review app for commit a16fd0b deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-a16fd0b2

@hasura-bot
Contributor

Review app for commit 87784ef deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3154-87784ef2

@lexi-lambda lexi-lambda merged commit c4c5dd8 into hasura:master Nov 20, 2019
@hasura-bot
Contributor

Review app https://hge-ci-pull-3154.herokuapp.com is deleted

polRk pushed a commit to polRk/graphql-engine that referenced this pull request Feb 12, 2020
…ra#3137) & improve root fields validation (hasura#3154)

* allow identical column fields in 'custom_column_names'
* improve validation of custom column names
* improve 'checkForFieldConflicts' & validate custom column names with non column fields
* split `validateTableConfig` into two isolated validation logic
* improve validation of root fields
* improve validating custom root fields and duplicate root fields
* move 'validateCustomRootFields' to 'buildSchemaCache'
hasura-bot pushed a commit that referenced this pull request Jan 5, 2022
…cated data types.

PR-URL: hasura/graphql-engine-mono#3264
GitOrigin-RevId: 923ea2794bef9850f1aa5714b610890d02165882