allow identical fields in custom column names configuration (fix #3137) & improve root fields validation #3154
Conversation
Deploy preview for hasura-docs ready! Built with commit 87784ef

Review app for commit f362624 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit b04e5fe deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 0d40588 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit f361a2c deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit b4c3ede deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
```haskell
  buildSchemaCacheFor (MOTable tableName)
  return successMsg
  where
    validateWithExistingRelationships fields = do
```
I don’t think I totally understand all the subtleties of this issue. Could you explain why `validateWithExistingRelationships` is different from/not a part of `validateTableConfig`?
Sure. `validateTableConfig` happens when initialising the `TableInfo` of a table. At that point, no relationships have been added yet, so we cannot validate against relationships.

The `set_table_custom_fields` API applies only to already-tracked tables, where we should validate custom column names against existing relationships. However, `validateTableConfig` is only reached here via the `buildSchemaCacheFor` function.
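To illustrate the idea, here is a minimal sketch of such a check with simplified stand-in types (the real code works on the schema-cache types, so the signature and names here are hypothetical):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.HashMap.Strict as M
import qualified Data.HashSet        as S
import           Data.Text           (Text)
import qualified Data.Text           as T

-- | Fail if any custom column name collides with an existing relationship
-- name of the same table.
validateWithExistingRelationships
  :: S.HashSet Text       -- ^ names of the table's existing relationships
  -> M.HashMap Text Text  -- ^ custom_column_names: column -> custom name
  -> Either Text ()
validateWithExistingRelationships relNames customNames =
  case filter (`S.member` relNames) (M.elems customNames) of
    []        -> Right ()
    conflicts -> Left $
      "custom column name(s) conflict with existing relationships: "
        <> T.intercalate ", " conflicts
```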
Well, a thought struck me while writing the above reply: what about validating new relationships being added against existing custom column names? We should definitely do that. I'll push a commit with that change soon.
Putting all my thoughts below.

I have a simple schema:

```sql
CREATE TABLE "user"(
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  age INTEGER
);
```

I tracked the above table with the following custom column names configuration:

```json
{
  "id": "userId",
  "name": "userName"
}
```

What could be the best way to validate a relationship with name X that is being defined? Currently, we only check against existing column names and relationship names. We should definitely also validate against the custom column names (`userId` and `userName`).
Also, I'm thinking about validating names for the new classes of fields coming soon (computed fields and remote joins). I think we should improve the logic of the `checkForFieldConflict` function and rename `validateWithExistingRelationships` to something like `validateWithFieldsOtherThanColumns`. I'll address all of these in the next commit.
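For illustration, a hedged sketch of what such a generalized check could look like, again with simplified types rather than the real `checkForFieldConflict` signature:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.HashMap.Strict as M
import qualified Data.HashSet        as S
import           Data.Text           (Text)

-- | When defining a new relationship, reject its name if it collides with a
-- column, an existing relationship, or a custom column name such as
-- "userId"/"userName" above.
checkNewRelationshipName
  :: Text                 -- ^ name of the relationship being defined
  -> S.HashSet Text       -- ^ column names
  -> S.HashSet Text       -- ^ existing relationship names
  -> M.HashMap Text Text  -- ^ custom_column_names: column -> custom name
  -> Either Text ()
checkNewRelationshipName relName columns rels customNames
  | relName `S.member` columns         = conflict "column"
  | relName `S.member` rels            = conflict "relationship"
  | relName `elem` M.elems customNames = conflict "custom column name"
  | otherwise                          = Right ()
  where
    conflict what =
      Left $ "relationship name \"" <> relName <> "\" conflicts with a " <> what
```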
> Sure. `validateTableConfig` happens when initialising the `TableInfo` of a table. At that point, no relationships have been added yet, so we cannot validate against relationships.
Got it, that makes sense! As a follow-up question: do you think we could consolidate this logic somehow? I know that each code path has slightly different inputs, and you have a firmer grasp on the details than I do, but it would be nice if we could share the logic between `validateTableConfig`, `validateWithNonColumnFields`, and `checkForFieldConflict` somehow so that it can’t get out of sync.
Prior to this commit, the server used to build the `TableInfo` on a `track_table` query and add it to the cache later. Now `buildSchemaCacheFor` is used to construct the `TableInfo` from scratch and update it with other table metadata objects. Thus we don't have the luxury of a `TableInfo` against which to validate the input custom fields configuration in `trackExistingTableOrViewP2` and `runSetTableCustomFields`. This forces the validation of the custom fields configuration against existing column names to happen only in the `buildSchemaCacheFor` code path.

In the `set_table_custom_fields` query API, `buildSchemaCacheFor` is used to rebuild the `TableConfig`. Here, the validation of custom fields occurs in two steps:

- With the existing, not-yet-updated `TableInfo`, validate the custom column fields against the existing non-column fields (relationships and computed fields, as of now).
- We could also validate the input custom column fields against existing columns here, but that doesn't make sense since the same check happens in the `buildSchemaCacheFor` code path.

After a bit of thinking, I'm planning to do the following (see the sketch after this list):

- Scrap the `validateTableConfig` function and break it into two isolated pieces of logic: `validateCustomRootFields` and `validateWithColumns`.
- Use `validateCustomRootFields` directly in `trackExistingTableOrViewP2` and `runSetTableCustomFields`.
- Use `validateWithColumns` only in `buildTableCache`.

EDIT 1: The changes I mentioned above are live in the PR.
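As a rough sketch of the `validateWithColumns` half, with simplified stand-in types (the real function operates on the schema-cache structures, so the names and signature here are illustrative): the key point for #3137 is that a custom name identical to its own column is allowed, while a collision with a *different* column is still an error.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad       (when)
import           Data.Foldable       (for_)
import qualified Data.HashMap.Strict as M
import qualified Data.HashSet        as S
import           Data.Text           (Text)

validateWithColumns
  :: S.HashSet Text       -- ^ actual column names of the table
  -> M.HashMap Text Text  -- ^ custom_column_names: column -> custom name
  -> Either Text ()
validateWithColumns columnNames customNames =
  for_ (M.toList customNames) $ \(column, customName) ->
    -- identical to its own column: allowed (the fix);
    -- identical to some *other* column: still a conflict
    when (customName /= column && customName `S.member` columnNames) $
      Left $ "field \"" <> customName <> "\" of the table already exists"
```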
Review app for commit d1cb173 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
Resolve Conflicts: server/src-lib/Hasura/RQL/Types/SchemaCache.hs server/tests-py/test_v1_queries.py
Review app for commit 3737651 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 55d7083 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 44e2fb0 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 6479d5f deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
```haskell
  where
    duplicateNames = duplicates $ M.elems colFlds
```

```haskell
  => GC.TableCustomRootFields -> m ()
validateCustomRootFields customRootFields = do
```
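For context, the `duplicates` helper referenced in the snippet above could plausibly be implemented along these lines (a sketch, not necessarily the actual definition in the codebase):

```haskell
import qualified Data.HashMap.Strict as M
import           Data.Hashable       (Hashable)

-- | Return the elements that occur more than once in the input list.
duplicates :: (Eq a, Hashable a) => [a] -> [a]
duplicates =
  M.keys . M.filter (> 1) . M.fromListWith (+) . map (\x -> (x, 1 :: Int))
```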
In the vein of “parse, don’t validate,” is there some way we could make this return something other than `()`? 🙂 Maybe it could build a map from root field names to something? I don’t know if that makes sense, but just throwing it out there.
There is no type that `TableCustomRootFields` resolves to, so this function cannot return anything other than `()`. But we can partly achieve "parse, don't validate" :). This function is performing two tasks:

- Check for duplicate custom root field names within the `TableCustomRootFields` type. Happily, we can move this part into the `FromJSON` deriving of the type.
- Check for any conflicting root field in the remote schema. This is something we can't avoid right now.
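A hedged sketch of the first point, moving the duplicate-name check into the `FromJSON` instance; the field set below is simplified relative to the real `TableCustomRootFields`:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad (unless)
import           Data.Aeson
import           Data.List     (group, sort)
import           Data.Maybe    (catMaybes)
import           Data.Text     (Text)
import qualified Data.Text     as T

-- Simplified version of the type; the real one has more root fields.
data TableCustomRootFields = TableCustomRootFields
  { tcrfSelect :: !(Maybe Text)
  , tcrfInsert :: !(Maybe Text)
  , tcrfUpdate :: !(Maybe Text)
  , tcrfDelete :: !(Maybe Text)
  }

instance FromJSON TableCustomRootFields where
  parseJSON = withObject "TableCustomRootFields" $ \o -> do
    fields <- TableCustomRootFields
                <$> o .:? "select"
                <*> o .:? "insert"
                <*> o .:? "update"
                <*> o .:? "delete"
    -- reject duplicate custom root field names at parse time
    let names = catMaybes
          [tcrfSelect fields, tcrfInsert fields, tcrfUpdate fields, tcrfDelete fields]
        dups  = concatMap (take 1) $ filter ((> 1) . length) $ group $ sort names
    unless (null dups) $
      fail $ "duplicate root field names: " <> T.unpack (T.intercalate ", " dups)
    pure fields
```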
Sorry for taking so long to respond to this; I’ve had a few too many things going on, and in order to understand all the details here, I needed to actually take the time to check out this branch and look at the code.

I’ve now done that, and I see what you mean about it not being able to return anything more meaningful. However, that’s actually kind of why I added the `buildSchemaCacheFor` function in the first place: my idea was that `buildSchemaCache` could be where all the “canonical” parsing/validation logic happens, and we could just optimistically assume the data is correct, update the catalog, and run `buildSchemaCache` to see if it builds successfully. If it doesn’t, the process will fail, and the database transaction will get rolled back.

I don’t love that solution, since it’s fairly indirect, and it duplicates a lot of work. But it does have some things going for it. Here was my reasoning:

- The idea of “parse, don’t validate” is that we want to use the type system to ensure the necessary parsing happens as a necessary consequence of the data-processing pipeline.
- The problem is that part of our processing pipeline goes out into Postgres, then back into graphql-engine, which is tricky, since obviously the Haskell type system cannot guarantee anything about what goes on inside Postgres.
- However, at least with Postgres we get database transactions, so we can always muck up the state of the catalog as much as we want, as long as we always ensure it is consistent before we eventually commit.
- Since `buildSchemaCache` fundamentally has to check the integrity of the catalog (since it has to build a `SchemaCache` value), it makes sense as a canonical code path for ensuring invariants. Therefore, if we can successfully build the schema cache, the catalog must be well-formed, right?
There are, however, some problems with this approach, as you’ve probably seen for yourself. For one, it’s slow. For another, it can make error reporting tricky, since the error that gets generated might not have anything to do with the thing we’re changing. The latter issue was what `buildSchemaCacheFor` was supposed to be about: it’s a way of specifying that the schema cache is being rebuilt with the intent of checking the integrity of a particular object.

The former issue is more complicated to deal with (though I don’t think it’s impossible), but it’s not actually a huge issue for this change, since `trackExistingTableOrViewP2` already calls `buildSchemaCacheFor` anyway. Therefore, we can’t be slowing it down any more by taking that code path, right?

Therefore, what if we skipped the call to `validateCustomRootFields` entirely and did the integrity-checking via the canonical `buildSchemaCacheFor` code path? Is that viable, do you think?
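To make that idea concrete, here is a heavily reduced sketch of the pattern, with stand-in types instead of the real graphql-engine internals (every name below is hypothetical):

```haskell
import Data.Text (Text)

-- Stand-ins for the real catalog and schema cache.
data Catalog     = Catalog
data SchemaCache = SchemaCache

-- The canonical integrity check: building the cache either succeeds or
-- fails with an error describing the inconsistency.
buildSchemaCache :: Catalog -> Either Text SchemaCache
buildSchemaCache _ = Right SchemaCache  -- stub

-- Optimistically apply a metadata change, then rebuild the schema cache to
-- check the result; a Left here stands in for rolling back the Postgres
-- transaction that wrote the change.
applyMetadataChange
  :: (Catalog -> Catalog) -> Catalog -> Either Text (Catalog, SchemaCache)
applyMetadataChange change catalog = do
  let catalog' = change catalog
  cache <- buildSchemaCache catalog'
  pure (catalog', cache)
```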
I want to give you a bit of history :). Initially, the GraphQL Engine had only Postgres tables with relationships and permissions. Later, over many iterations, it evolved into what it is now, with remote schemas, event triggers, computed fields, etc. If we look closely, the `buildSchemaCache` function is quite sequential, and the sequence (except for computed fields) mirrors the evolution of the GraphQL Engine (interesting!). The `TableInfo` for all tables is built first, relationships are added next, then computed fields, permissions, etc. The validation checks are inter-dependent: for example, there shouldn't be a computed field with a relationship's name, or vice versa, and permissions are defined only after all "fields" of a table are resolved. Any change in the sequence may result in permanent inconsistency (a potential bug). Now, `buildSchemaCacheFor` is useful in `trackExistingTableOrViewP2`, since the server is defining the metadata of the table for the first time. But the validation of the table's custom root fields configuration depends on the existing remote schemas, and remote schemas are added much later in `buildSchemaCache` than the `TableInfo` is built. Hence, it is hard to perform `validateCustomRootFields` inside `buildSchemaCacheFor`, so I'm calling it well before `buildSchemaCacheFor` to validate against the existing remote schemas, optimistically assuming the remote schemas are not being altered in the current code path.

The same happens with the `set_table_custom_fields` metadata API. We have to validate the custom column names against the existing fields other than columns, which are added to the `TableInfo` later in `buildSchemaCache`. Also, we should definitely run `validateCustomRootFields` before `buildSchemaCacheFor`.

Apart from the two problems you mentioned with `buildSchemaCacheFor`, this third problem (the peculiar sequential dependence of `buildSchemaCache`) described in the two paragraphs above makes `buildSchemaCacheFor` not a one-stop solution for achieving "parse, don't validate". We definitely need `validate :: m ()`-style checks in the metadata APIs that rely on validating against existing metadata objects which are added later in `buildSchemaCache`.
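The sequential shape I'm describing could be sketched with stub actions standing in for the real build steps (all names here are illustrative, not the actual internals):

```haskell
-- Each step can only validate against what earlier steps have already built,
-- which is why root-field validation against remote schemas cannot happen
-- while TableInfo is being constructed.
buildSchemaCacheSketch :: IO ()
buildSchemaCacheSketch = do
  step "build TableInfo for all tables"
  step "add relationships"
  step "add computed fields"
  step "add permissions (all fields must be resolved first)"
  step "add event triggers"
  step "add remote schemas (added much later than tables)"
  where
    step = putStrLn  -- stub for the real cache-building actions
```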
> The validation checks are inter-dependent: for example, there shouldn't be a computed field with a relationship's name, or vice versa, and permissions are defined only after all "fields" of a table are resolved. Any change in the sequence may result in permanent inconsistency (a potential bug).
This makes sense, but surely `buildSchemaCache` must validate everything (including custom column names!), or else `buildSchemaCache` is buggy, right? If the metadata were to end up in a state where the custom column name configuration is invalid, then `buildSchemaCache` must fail. To me, this implies that, regardless of the order in which it does the validation, eventually it will get validated.

Perhaps what you’re telling me, however, is that this isn’t the case? If I were to, with this PR, start up an instance of graphql-engine against a database where a custom column name configuration is somehow invalid, what would happen? Would it report an error on startup about an invalid custom column name configuration, or would it report some arcane internal error message about something else entirely (or worse, not report an error at all until something else went wrong during execution)?

I do understand that some validation (in the `m ()` sense) must happen somewhere, and it’s highly unlikely that we can encode all of our invariants into the type system, but I don’t understand why that validation can’t happen despite the sequential dependencies inside `buildSchemaCache`. In the absolute worst-case scenario, we add some validation checks at the end of `buildSchemaCache`, after the whole schema cache has been built and we have all the information we need.

(As a related aside, I started working on a fix for #3354 today, which involves improving the performance of `buildSchemaCache`, so I’ve been implementing the “automatic incrementalization” idea I’ve had for a little while now. That should (eventually) make `buildSchemaCache` much faster, since it can avoid rebuilding everything every time it is called. I don’t think the performance problem of `buildSchemaCache` is fundamental, and I think there are significant advantages to consolidating the logic around the catalog!)
> In the absolute worst-case scenario, we add some validation checks at the end of `buildSchemaCache`, after the whole schema cache has been built and we have all the information we need.
This seems good to me. For now, I'll move the validation of the tables' custom configuration that depends on objects added later to the end of `buildSchemaCache`.

Please add me as a reviewer on #3354 :). I would love to see it.
Review app for commit 16c6933 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 5e69174 deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 76c00ef deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 8b9556a deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com
lexi-lambda left a comment:
I’m much happier with this change—thank you very much for your patience on this! 🙂
Review app for commit a16fd0b deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app for commit 87784ef deployed to Heroku: https://hge-ci-pull-3154.herokuapp.com

Review app https://hge-ci-pull-3154.herokuapp.com is deleted
allow identical fields in custom column names configuration (fix hasura#3137) & improve root fields validation (hasura#3154)

* allow identical column fields in 'custom_column_names'
* improve validation of custom column names
* improve 'checkForFieldConflicts' & validate custom column names with non column fields
* split `validateTableConfig` into two isolated validation logic
* improve validation of root fields
* improve validating custom root fields and duplicate root fields
* move 'validateCustomRootFields' to 'buildSchemaCache'
…cated data types. PR-URL: hasura/graphql-engine-mono#3264 GitOrigin-RevId: 923ea2794bef9850f1aa5714b610890d02165882
Description
Do not throw an error if a custom column name is identical to the Postgres name of the column.
The `track_table` v2 API throws an error for such a request:

Error: `column/relationship "id" of table "users" already exists`

This PR improves the logic of validating custom column names, and thus fixes the above. It also improves the check for duplicate root fields generated for all tables.
Affected components
Related Issues
Fixes a part of #3137.
Solution and Design
- Improve the validation logic in the `set_table_custom_fields` query type.
- Generate the root fields of `GCtx` as lists instead of hashmaps, and check for duplicate root field names using the lists.

Steps to test and verify

Try defining `custom_column_names` with identical fields. The server shouldn't throw an error.

Limitations, known bugs & workarounds