-
Notifications
You must be signed in to change notification settings - Fork 3
Remove OutputSchemaVisitor, add schema() as a QueryExpr method #74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tmager
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to read through the code change in more detail, but a couple of initial thoughts:
- "add schema as a field of QueryExpr, then add a rewriting pass that fills this value": Could you expand on this a bit, and why you think it would be a better approach? My first thought is that having an additional field that may or may not be set at a given point in time will make things less clear.
- Not something for this MR, but it seems like
_query_expr.pyis getting very long, and we haven't incorporated that much of the compilation into it yet. Should we consider breaking it up into a multi-file submodule?
Yes, I initially proposed to have a separate intermediary object but it was super hard to prototype this and start making progress in that direction. After discussing this with @Maegereg, I changed the proposal to instead use a series of rewrite rules, in a similar way than how KeySets now work. This does lead to a situation where some fields start being not set and then get set in rewrite rules, but I think we will still end up with much cleaner logic — with a clear separation between successive rewrite rules, and each one being simpler and self-contained. In that world, it would sense to me to also have schema determination be one of the rewrite rules (but I'm not 100% sure about this, in particular I'm not super clear on when this would need to happen in the sequence of rewrite rules). I would love your feedback on this!
Yes, I think this PR basically gets this file into "should be broken up if we add more logic" territory; I may do this as a follow-up issue if I need to add more stuff in there. |
|
Ah, I see where you're coming from. For some properties I could see that, but for schemas specifically: are there situations where the rewrite of a particular expression changes the resulting schema? It seems to me that shouldn't happen for the expression end-to-end (though some of the intermediate pieces may have their schemas changed/rearranged in the rewrite). That might be why it's unclear when the rewrite rule that fills in the schema should run -- there's no obvious good time, because it should work equivalently at any point. On a related note, what would a rewrite rule that fills in the schemas look like? We presumably don't want to embed all the schema logic in that rewrite rule (that just gets us back to something like the schema visitor), which means calling some (This is slightly complicated by the fact that rewrite rules need to be able to do things like "pull out this entire expression and turn it into a separate query to get groups from it", but I think the above idea about the rewrites not changing the schema still holds: we're replacing a general expression that does a groupby where the groups have a particular schema with a concrete groupby referencing a new expression with that same output schema, so the overall schema of the original expression doesn't change.) |
|
Hmm, you're right, so maybe the way of computing the schema in this PR is the right approach. |
Maegereg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some suggestions for improvements, but I'm open to the idea that this should be a clean refactor that doesn't change functionality (and thus improvements should be pushed to follow-on work).
|
Also, I think I agree with Tom - doing schemas as a rewrite seems worse than this approach. |
tmager
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a few possible bugs, but I think they're all in the original code -- I would argue that we should try to keep this as a straight refactor, it's already a pretty big diff.
tmager
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM aside from the existing comments.
* Removes OutputSchemaVisitor, add schema() as a QueryExpr method * make linters happy * first batch of comments * review comments, mostly splitting validation * actually perform validation. otherwise validation is not performed. * ah it was just because get_bounds has no transformation! doing it this way makes a lot more sense * check the schema for *both* transformations and measurements --------- Co-authored-by: Damien Desfontaines <TedTed@users.noreply.github.com*>
This is one step towards getting rid of the visitor pattern in our code.
Halfway through this change, I suddenly realized that it would have been better to do this differently, and instead add
schemaas a field ofQueryExpr, then add a rewriting pass that fills this value. But I still think this change is doing more good than harm, so here is a PR.