这是indexloc提供的服务,不要输入任何密码
Skip to content

Rework expression inference spec draft based on review comments. #3850

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 17, 2024

Conversation

stereotype441
Copy link
Member

The following changes (mostly based on suggestions by Leaf and Erik) are included:

  • The text is now under a heading called "expression inference" rather than "body inference" (which was confusing).

  • Instead of trying to describe the output of type inference using a vague notion of "expression artifacts" whose runtime behavior is described in prose, I'm specifying it using a Dart-like syntax called "elaborated expressions". Elaborated expressions look generally like Dart expressions, except they have some limitations (types are fully specified, and complex constructs like null shorting are desugared, and type checks are explicit) and support some extra forms (such as "let" expressions).

  • I've simplified the terminology I use to describe the three things that can happen when an expression is evaluated (it can diverge, throw an exception, or evaluate to a value). My previous terminology was confusing ("fail to complete at all", "complete normally with a value", or "complete with an exception").

  • I've re-worked the treatment coercions so that I'm no longer making subsumption explicit in the language of elaborated expressions. (I never meant to make subsumption explicit; it happened as a mistake due to unclear thinking on my part.)

  • I broke the "numbers" subsection into separate "integer literals" and "double literals" sections. (Previosuly I only specified how integer literals are handled.)

  • The sketches of soundness proofs have been simplified a bit. I'm no longer trying to crisply distinguish "expression soundness", "return value soundness", "tear-off soundness", and "future soundness"; rather, I'm speaking about soundness more informally, as befits an informal proof sketch.

  • On the other hand, I've clarified precisely what I mean by "is an instance of", since the soundness proof sketches don't make any sense if your definition of this concept doesn't agree with what's in my head. I've also made sure this concept correctly accounts for extension types.

  • And I've added text explaining why the interpretation of await expressions is indeed sound. (Previously I had a "TODO" because I thought maybe they weren't sound. But they're sound; it just took a little extra work to prove it to myself.)

  • Thanks for your contribution! Please replace this text with a description of what this PR is changing or adding and why, list any relevant issues, and review the contribution guidelines below.


  • I’ve reviewed the contributor guide and applied the relevant portions to this PR.
Contribution guidelines:

Note that many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.

The following changes (mostly based on suggestions by Leaf and Erik)
are included:

- The text is now under a heading called "expression inference" rather
  than "body inference" (which was confusing).

- Instead of trying to describe the output of type inference using a
  vague notion of "expression artifacts" whose runtime behavior is
  described in prose, I'm specifying it using a Dart-like syntax
  called "elaborated expressions". Elaborated expressions look
  generally like Dart expressions, except they have some limitations
  (types are fully specified, and complex constructs like null
  shorting are desugared, and type checks are explicit) and support
  some extra forms (such as "let" expressions).

- I've simplified the terminology I use to describe the three things
  that can happen when an expression is evaluated (it can diverge,
  throw an exception, or evaluate to a value). My previous terminology
  was confusing ("fail to complete at all", "complete normally with a
  value", or "complete with an exception").

- I've re-worked the treatment coercions so that I'm no longer making
  subsumption explicit in the language of elaborated expressions. (I
  never meant to make subsumption explicit; it happened as a mistake
  due to unclear thinking on my part.)

- I broke the "numbers" subsection into separate "integer literals"
  and "double literals" sections. (Previosuly I only specified how
  integer literals are handled.)

- The sketches of soundness proofs have been simplified a bit. I'm no
  longer trying to crisply distinguish "expression soundness", "return
  value soundness", "tear-off soundness", and "future soundness";
  rather, I'm speaking about soundness more informally, as befits an
  informal proof sketch.

- On the other hand, I've clarified precisely what I mean by "is an
  instance of", since the soundness proof sketches don't make any
  sense if your definition of this concept doesn't agree with what's
  in my head. I've also made sure this concept correctly accounts for
  extension types.

- And I've added text explaining why the interpretation of `await`
  expressions is indeed sound. (Previously I had a "TODO" because I
  thought maybe they weren't sound. But they're sound; it just took a
  little extra work to prove it to myself.)
An invariant of expression inference, known as _soundness_, is that when the
elaborated form of any expression in the program is executed, it is guaranteed
either to diverge, throw an exception, or evaluate to a value that is an
_instance_ of its static type. _Instance_ is defined as follows: a value `v` is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"instance...of ...type" - of usually read that as an object whose runtime type is exactly (the extension type erasure of) that type, not just a subtype of it.
I can see why this definition is practical, could we call it singing slightly less direct? "Instance satisfying type" perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. @leafpetersen had a similar reaction. I pointed out to him that my interpretation was based on wanting to be consistent with the existing spec, which is what led to the non-normative comment below.

I would be totally open to switching to different terminology that is more consistent with common usage, but I would want to make the change both in this doc and in dartLangSpec.tex 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the specification. Most uses of "instance of" are about the types bool, int, double, String and Null. Those technically can be subclasses, but we usually pretend they're not.

Other uses are:

A function object is always an instance of some class $C$ that implements
the class \FUNCTION{} {\ref{functionType}),

If the transitive meaning was used, it could just say "an instance that implements the class \FUNCTION{}".

Or:

If $v$ is an instance of class \code{Error} or a subclass thereof,

which is about whether throwing sets stackTrace on the object.
Here it the meaning is clearly not transitive, because then it would be wrong.

And:

but $e$ evaluates to an instance of a type
which is not a subtype of \code{Iterable<\DYNAMIC>}.%

Again, this could just be "to an object which is not an instance of \code{Iterable<\DYNAMIC>}."

There are definitely also other places where "instance of" is used transitively, but I wouldn't say that the specification is consistently using it that way.

Using it transitively gets confusing, because then I don't have a good way to say that an object is an instance of a precise type. (I guess "the runtime type of the object is TheType" could work.)

I'd rather use "x is an instance of T" to mean that the runtime type of x is the class of T (and keep pretending there are no subtypes of int and String).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the specification. Most uses of "instance of" are about the types bool, int, double, String and Null. Those technically can be subclasses, but we usually pretend they're not.

Ok, I think I've come around to your way of thinking. (FWIW, I was initially skeptical because the SDK files define int, double, and String as abstract classes, but after playing around a bunch this morning I've convinced myself that this is an implementation detail that's invisible to the user if they don't go digging into the SDK source files).

I've re-worked this PR to consistently use "instance satisfying type" instead of "instance of" in the cases where it's important to include subtypes, and I've changed this text to define "instance satisfying type" appropriately.

I might follow up with a PR to change the few places in the spec where I think we're accidentally using the "subtype" meaning of "instance of".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After digging through the dartLangSpec.tex, I discovered that the terminology we use fairly consistently in the spec is "instance implementing type T" rather than "instance satisfying type T", so I've made a PR to switch this file to use the same convention: #3918

_Some literature uses a different definition of "instance", saying that a value
`v` is an instance of a type `T` only if the runtime type of `v` is
__precisely__ `T`. We adopt the "subtype" definition for consistency with other
parts of the language spec._
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Well, shucks. 😭

- `m_2` completes with the value `o_2`. _Expression soundness follows from
the fact that `o_2` is an instance of `U` and `U <: T_2`._
- Otherwise, if `T_1` is an interface type that contains a method called `call`
with type `U`, and `U <: T`, then let `m_2` be `m_1.call`. _Since `m_1.call`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'd introduce a @CALL_TEAROFF(m_1) too, so that any elaborated expression is either an expression of the source program, or it's a new form.
That avoids having to answer whether m_1.call is elaborated or not.

The phrase "Since m_1.call has static type ..." is undefined, since the term has neither been type inferred (out was not in the source program so it was never recursed on, and it's not an unelaborated expression to begin with) nor been assigned a static type explicitly, like we've done for the other @FOO forms.

We'd have to explicitly say that the term has that static type.
Then I'd prefer to introduce a new form for it too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I’m not sure how I feel about this.

On the one hand, using ordinary Dart syntax for elaborated expressions is nice because it allows readers familiar with Dart (who, let’s be honest, constitute my entire target audience) to understand what’s going on more quickly, and allows me to focus my specification efforts on the details of type inference rather than trying to rigorously define the semantics of everything (which I think was one of the drawbacks of my previous approach).

On the other hand, using new syntax for elaborated expressions has the advantage that it’s much more obvious to the reader when we are talking about an elaborated expression vs. an unelaborated expression.

As to your comment that "Since m_1.call has static type ..." is undefined, my mental model in this PR is that every elaborated expression has a well-defined static type; when the elaborated expression uses standard Dart syntax, the static type of the expression is defined using the static typing rules in dartLangSpec.tex; when it uses one of the new @ constructs I’ve introduced, the static type of the expression is defined in the section that defines all of those constructs ("New operations allowed in elaborated expressions"). To put it another way, since elaborated expressions are based on standard Dart, with the addition of some new constructs and the prohibition of some old ones, they inherit the static typing rules defined in dartLangSpec.tex. Does that seem like a reasonable approach to you? If so, is there something I could do that would help clarify that’s the approach I’m taking?

Note that this is a change from my previous mental model (prior to this CL), which was to consider the type inference process for an expression to produce two outputs: (1) an “expression artifact”, which is defined only by a prose description of its runtime semantics and therefore has no static type per se, and (2) a static type. This is actually a far better match for what actually happens in the CFE (expression artifacts are kernel Expressions, and although there is an Expression.getStaticType method in the kernel, Johnni has told me that it’s not always correct, and the CFE mostly disregards it in favor of its own independently-calculated static type). But it made the spec a lot more verbose, and I think it contributed to confusion about whether I was trying to make subsumption explicit in the language of elaborated expressions.

I’m going to leave this as is for now, but I’m working on a follow-up commit in which I add specifications for things like method invocations; I’ll experiment with your proposed invariant (any elaborated expression is either an expression of the source program, or it's a new form) in that PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with keeping m_1.call as the elaborated expression produced by type inference. I'd like to highlight the .call as being synthetic in some way, but if we can do that typographically it's probably fine.

However, "every elaborated expression has a well-defined static type" is the end result of the type inference algorithm, which we are in the middle of.
It's (I think) a recursive/inductive algorithm on expressions, where type is inferred on subexpressions first, resulting in elaborated expressions with static types (for the elaborated expression and every elaborated subexpression), then those elaborated expressions and static types are used to infer the elaborated expression and static type of the parent expression, maybe filling in missing type arguments and wrapping it in coercions. Maybe even some desugaring using @..., although I'd prefer to avoid it.

So "when the elaborated expression uses standard Dart syntax" does not apply to m_1.call, because m_1 is an elaborated expression which may not use standard Dart syntax.
The m_1.call is not the result of a recursive call to type inference, so it hasn't been assigned a static type by that, it's a completely new elaborated expression which we have willed into existence from nothing, and it has no inherent static type.

We definitely don't want to do type inference on m_1.call, doing inference on m_1 recursively (now in receiver position, if it's even a valid unelaborated expression any more).

That is: a recursive invocation of type inference returns an elaborated expression where every subexpression has been assigned a static type. The current invocation needs to do the do the same, so when it returns an alaborated expression, that, and all subexpressions, must also have been assigned static types. And someone needs to do that assigning - a static type is not an inherent property of an expression, it's the explicit result of running inference on the expression, and here we are implementing the inference which assigns the static type.

The right thing to do is to say that:

  • We introduce an new elaborated expression m_1.call. This follows the grammatical rules for elaborated expressions (which is the grammar of expressions where <primary> has been extended with the new syntactic forms).
  • We immediately assign this m_1.call the static type U (the function type of the call method of T_1, the static type of m_1, which inductively has a static type since it's an existing elaborated expression).
  • Then m_2 is m_1.

That is, when we introduce a new elaborated expression, which is not the result of running type inference on an unelaborated expression, then we also have to assign its static type. (Then we can still argue that that static type is sound.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with keeping m_1.call as the elaborated expression produced by type inference. I'd like to highlight the .call as being synthetic in some way, but if we can do that typographically it's probably fine.

I'd like to push back on this. What would we be trying to communicate by highlighting that .call is synthetic? The fact that .call doesn't appear in the user's source code in this case is obvious. Are we trying to highlight that coercion has inserted an operation? Again, I think that's obvious.

However, "every elaborated expression has a well-defined static type" is the end result of the type inference algorithm, which we are in the middle of. It's (I think) a recursive/inductive algorithm on expressions, where type is inferred on subexpressions first, resulting in elaborated expressions with static types (for the elaborated expression and every elaborated subexpression), then those elaborated expressions and static types are used to infer the elaborated expression and static type of the parent expression, maybe filling in missing type arguments and wrapping it in coercions. Maybe even some desugaring using @..., although I'd prefer to avoid it.

Ok, you have a good point. I think I was trying to avoid scope creep in this document by letting the rules for static typing be largely defined elsewhere, but it's really not panning out that way, is it? Not only do I have to define static type for the new constructs I'm introducing, but I'm also breaking my own principle and introducing pointless differences between the spec and the implementations (because the implementations do compute static types as a by-product of the type inference process).

I've re-worked the PR so that every rule for expression inference explicitly specifies the static type of the resulting elaborated expression, and I've clarified that the rules in this document for determining the static type of an expression are meant to supersede the rules currently in dartLangSpec.tex.

is there a coercion available, so it's a type error._
- Otherwise, there is a compile-time error. _We have an expression of type `T_1`
in a situation that requires `T`, which isn't a supertype, nor is there a
coercion available, so it's a type error._
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work for assigning to a promoted variable?
That is: the error being part of the coercion, not something that is sometimes applied after coercion that didn't promote (or didn't promote enough).

Say we promote a variable from Function to int Function(int).
Then we assign to it:

  • an String Function(String) statically typed as dynamic. Here I think we will try to cast to int Function(int), and fail at runtime, and that's OK.
  • A callable object with a call method with type String Function(String). Here I think we currently specify that we do .call insertion for any function context type, whether it's a supertype or not, then we should assign that function to the variable as a demoting assignment.

In the second case, it's not an error for the coercion to fail to reach the target type. The coercion happens anyway, then it may or may not be an error if the resulting static type can be used in the context.

(That's also a way to combine instantiation and call-tearoff, by doing the tearoff first, and then coercing the result of that with instantiation.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should work ok. I haven’t written the spec text for it yet, but what we do in the case of an assignment to a promoted local variable is:

  • Infer the RHS of the assignment using the promoted type of the local variable as the context.
  • Coerce the resulting elaborated expression to the (unpromoted) static type of the local variable.
  • If the static type of the coerced elaborated expression is not a subtype of the variable’s promoted type, the variable is demoted.

So in your second example, where we promote a local variable from Function to int Function(int), and then assign a callable object with a call method with type String Function(String), the target type for the coercion is Function, so the call tearoff happens, and the coerced expression has static type String Function(String), and the variable gets demoted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super. That also means that my idea of just putting a "try to coerce" step in between any two inference steps won't work, because the context type for inference isn't always the same as the context type for coercion.
(But I guess we could pass both to the coercion step, which then recurses with one on the inference, and then coerces the result to the other.)

expression soundness, `o_1` will be an instance of the type `bool`._
- If `e` is of the form `e_1 && e_2`, let `m` be `m_1 && m_2`. _It is valid to
form this elaborated expression because the static type of `m_1` and `m_2` are
guaranteed to be a subtype of `bool`._
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'd say to coerce that to the context type, but we don't allow implicit tearoff of extension call methods, right?)

(Complete side track: should we let the static type of the expression be UP of the static types of the m_1 and m_2 elaborated expressions? If both are the same extension type implementing bool, we could let the result have that type too!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right, we don’t do implicit tearoff of extension call methods, so there’s no coercion that would change bool to any other type.

Also, since my goal here is to be descriptive of what the implementations currently do, I don’t want to specify that we coerce to the context "type" because the implementations don’t try to coerce logical boolean expressions to the context. (Indeed, they never really try to coerce to the context per se; it just looks like they do because coercions always happen at assignability check points, and when there's an assignability check, the context for the subexpression is usually the same as the type used in the assignability check).

As to your side-track, that’s a really interesting thought. I’m torn between a desire to implement an interesting language and a desire to write fewer crazy tests 😃

- Let `T_2` be `flatten(T_1)`.

- Let `m_2` be `@LET(T_1 v = m_1 in v is Future<T_2> ? v :
Future<T_2>.value(@PROMOTED_TYPE<T_2>(v)))`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too I'd introduce a @FUTURE_VALUE<T_2>(v), not just for the usual reasons, but also because I don't want the specification to say which exact constructor must be used, as long as it creates a future of the correct type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s also important that it creates a future with the correct semantics. Specifying this as Future<T_2>.value(@PROMOTED_TYPE<T_2>(v)) captures both the static type we want and the runtime semantics we want, so I think it would be a high bar to come up with something better. As to whether we want to specify the exact constructor that must be used, I think it goes without saying that the implementation is always welcome to replace one elaborated expression with another one that has equivalent semantics (that’s what optimization is all about, after all). I don't think we need to invent a new piece of syntax here to grant extra permission for optimization in this specific case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely remember someone filing an issue about some backend not using Future.value to create the future (maybe they use _Future.immediateValue instead), because there was a timing difference between the two.

I'd be much happier if we specified this abstractly as introducing an object implementing Future<T_2> which completes with the value v at some later point. Saying that it has to call a specific constructor, one which has extra overhead because it can also accept a Future<T2> or null as argument, and has to check for those, even though we know we'll only call with a T2 instance.

Can we at least say that it calls Future<T_2>._value(v) instead, which is a (hypothetical) constructor with signature Future<T2> Function(T2 value) that creates a future completed with v?
(When nobody can find that constructor, we'll just say it's platform specific.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looking at the implementation again, the CFE doesn't actually insert an explicit call to Future.value. It simply sets a flag which is documented as telling the back-end to insert an explicit call to Future.value.

Silly me for trusting the documentation comment, though, because digging further into the implementations, they actually don't all do precisely the same thing:

  • In dart2js, a call is made to a private method called _wrapAwaitedExpression (in sdk/lib/_internal/js_runtime/lib/async_patch.dart), which calls _Future.value.
  • In dev_compiler, a call is made to a private method called _awaitWithTypeCheck (in sdk/lib/_internal/js_dev_runtime/patch/async_patch.dart), which also calls calls _Future.value.
  • In wasm, a call is made to a private method called _awaitHelperWithTypeCheck (in sdk/lib/_internal/wasm/lib/async_patch.dart), which is implemented in terms of lower level primitives like scheduleMicrotask.
  • In the vm, the necessary logic is inlined directly into the method using the lower level primitive AwaitWithTypeCheck.

I've reworked this section. Rather than refer to Future<T_2>.value or create a new primitive @FUTURE_VALUE, I'm introducing a new primitive called @AWAIT_WITH_TYPE_CHECK, which encapsulates the behavior of _wrapAwaitedExpression, _awaitWithTypeCheck, etc.

establish that `v is Future<T_2>` always evaluates to `true`, in which case
`m_2` can be optimized to `@PROMOTED_TYPE<Future<T_2>>(m_1)`._

- _For soundness, we must prove that whenever `@PROMOTED_TYPE<T_2>(v)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should prove, once and for all, that T <: FutureOr<flatten(T)> for all types T.

Then here, on the is! T_2 branch, with T_2 = flatten(T), you'd know that T <: T_2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I understand what you’re hinting at here. Fleshing out some details, you’re saying that if we know that T <: FutureOr<flatten(T)> for all T, we can substitute T_1 for T and get that T_1 <: FutureOr<flatten(T_1)>, or equivalently, T_1 <: FutureOr<T_2>. In both branches, we know that v is an instance of T_1, therefore it must be an instance of FutureOr<T_2>. Clearly, then, there are two cases: v must either be an instance of Future<T_2> or an instance of T_2. The is Future<T_2> handles the first case, so obviously in the second case, v must be an instance of T_2.

I like it. It makes the motivation for this elaboration much clearer to me.

Since this suggestion requires separately proving that T <: FutureOr<flatten(T)> for all T, I’m going to postpone it to a separate PR.

Copy link
Member

@lrhn lrhn Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current definition of flatten is ... not easy to work with (it's defined in terms of "derives a future type" which is a separate function, which means that the goals of flatten aren't obvious in "derives a future type", so it takes more proof to get from A to B. I've done it before, and I can probably do it again.

I've also written more useful, but equivalent, specifications. I prefer my recursive functions to do their own recursing, it's the only way I can convince myself that I'm covering all the cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be addressed by #3920.

@lrhn
Copy link
Member

lrhn commented Jun 11, 2024

I really like that an elaborated expression is a district concept from the unelaborated expression, and that every elaborated expression is either the same structure as an unelaborated source expression, a new fresh variable, or a new syntactic form. (So I want to make that completely true, and not introduce something that looks like source code, not which isn't. Maybe that's too much, but I think it would be nice. We could possibly just allow m_1 .to string(), with a typographical distinction between source code and synthetic code. We still have to assign a static type to that synthetic code explicitly.)

@lrhn
Copy link
Member

lrhn commented Jun 12, 2024

it can diverge, throw an exception, or evaluate to a value

I've not needed to explicitly say that something diverges. It's usually been enough to say what happens if it completes (with a value) or it throws (an error object and a stack trace) - which is the current terminology.
If it does neither, that's indistinguishable from not doing either yet.

@stereotype441
Copy link
Member Author

I really like that an elaborated expression is a district concept from the unelaborated expression, and that every elaborated expression is either the same structure as an unelaborated source expression, a new fresh variable, or a new syntactic form. (So I want to make that completely true, and not introduce something that looks like source code, not which isn't. Maybe that's too much, but I think it would be nice. We could possibly just allow m_1 **.to string()**, with a typographical distinction between source code and synthetic code. We still have to assign a static type to that synthetic code explicitly.)

Yeah, like I said in response to one of your other comments, I’m going to play with this more in a follow-up PR. I’m happy to talk more about this as I continue fleshing out the spec.

@stereotype441
Copy link
Member Author

it can diverge, throw an exception, or evaluate to a value

I've not needed to explicitly say that something diverges. It's usually been enough to say what happens if it completes (with a value) or it throws (an error object and a stack trace) - which is the current terminology. If it does neither, that's indistinguishable from not doing either yet.

Haha, the terminology I'm using in this PR was suggested by @leafpetersen in PR 3803. I don’t have a strong preference; maybe the two of you should bikeshed about it 😃

@stereotype441 stereotype441 requested a review from lrhn June 12, 2024 20:48
@lrhn
Copy link
Member

lrhn commented Jun 13, 2024

Haha, the terminology I'm using in this PR

The currently used terminology for expression evaluation in the spec is:

Evaluation of an expression either produces an object or it
throws an exception object and an associated stack trace.
In the former case, we also say that the expression evaluates to an object.

In practice, I think we always use "evaluates to an object" or "evalautes to a value", never "produces".

The word "complete" is used for execution of statements which can complete in the following ways:

  • Normally (fall through to the next statement)
  • Throw
  • Return (with or without a value)
  • Break (with or without a label)
  • Continue (with or without a label)

Using the same words gives you some benefits. It's written just once that while defining the evaluation
of an expression or the execution of a statement, which depends on evaluating some sub-expression, then
unless otherwise stated if the sub-expression evaluation throws, then so does the evaluation of the
containing expression or statement with the same error and stack trace. Similarly, unless otherwise stated, if execution of a sub-statement does not complete normally, then the containing statement completes in the same way.
That means we only really have to handle a sub-expression or statement throwing in the try/catch semantics, and at the end of functions, everywhere else we can ignore it.
Similarly, breaking or continuing can be ignored unless executing a loop, switch or labeled expression, it just flows upwards in the execution semantics.
(And we have specified yield of a cancelled async* execution to complete by returning without a value, we didn't need any new semantics for it.)

Them's the official words :)

We now consistently use "`v` is an instance of `T`" for the situation
where the runtime type of `v` is _exactly_ `T`, and "`v` is an
instance satisfying `T`" for the situation where the runtime type of
`v` is a subtype of `T`.

See discussion at
#3850 (comment).
Previously I was leaning on the definition of static types in
`dartLangSpec.tex`, but this was (a) leading to confusion, and (b) not
in alignment with the way the implementations actually compute static
types.
We now use a primitive `@AWAIT_WITH_TYPE_CHECK`, which captures the
logic in dart2js's `_wrapAwaitedExpression`, dev_compiler's
`_awaitWithTypeCheck`, wasm's `_awaitHelperWithTypeCheck`, and the
VM's `AwaitWithTypeCheck`.
@stereotype441
Copy link
Member Author

Haha, the terminology I'm using in this PR

The currently used terminology for expression evaluation in the spec is:

Evaluation of an expression either produces an object or it
throws an exception object and an associated stack trace.
In the former case, we also say that the expression evaluates to an object.

In practice, I think we always use "evaluates to an object" or "evalautes to a value", never "produces".

The word "complete" is used for execution of statements which can complete in the following ways:

  • Normally (fall through to the next statement)
  • Throw
  • Return (with or without a value)
  • Break (with or without a label)
  • Continue (with or without a label)

Using the same words gives you some benefits. It's written just once that while defining the evaluation of an expression or the execution of a statement, which depends on evaluating some sub-expression, then unless otherwise stated if the sub-expression evaluation throws, then so does the evaluation of the containing expression or statement with the same error and stack trace. Similarly, unless otherwise stated, if execution of a sub-statement does not complete normally, then the containing statement completes in the same way. That means we only really have to handle a sub-expression or statement throwing in the try/catch semantics, and at the end of functions, everywhere else we can ignore it. Similarly, breaking or continuing can be ignored unless executing a loop, switch or labeled expression, it just flows upwards in the execution semantics. (And we have specified yield of a cancelled async* execution to complete by returning without a value, we didn't need any new semantics for it.)

Them's the official words :)

Ok, cool. Then I think the only place where I diverge (no pun intended) from this terminolgy is in my mentioning of the possibility that evaluation of an expression might diverge. I get what you're saying that it's indistinguishable from an expression that simply hasn't finished evaluating yet, but I would feel weird removing it, because if I remove it, it sounds like I'm saying that every elaborated expression will either evaluate to a value or throw an exception, and that's simply not true; expressions containing infinite loops definitely won't ever do either of those two things.

Copy link
Member

@leafpetersen leafpetersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great! I really like this presentation.


Expression inference is a recursive process of elaborating an expression in Dart
source code, transforming it into a form in which all types and type coercions
are explicit. An expression that has not yet undergone type inference is known
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"known as" -> "referred to as" maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

source code, transforming it into a form in which all types and type coercions
are explicit. An expression that has not yet undergone type inference is known
as an _unelaborated expression_, and an expression that has completed type
inference is known as an _elaborated expression_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ibid

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

_Often, an expression's context can be understood as the static type the
expression must have (or be coercible to) in order to avoid a compile-time
error. For example, in the statement `num n = f();`, the result of type
inferring `f()` needs to be either a subtype of `num` (or a type that's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"either X (or Y)" reads oddly to me, perhaps choose "either X or Y" or else "X (or Y)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks. I went with "a subtype of num (or a type that's coercible to num)" to be parallel with "a static type the expression must have (or be coercible to)".

types in this document should be understood to supersede the specification of
static types in the existing Dart language specification._

_Informally, we may sometimes speak of the static type of an unelaborated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph left me a little uneasy/confused, since it's saying "sometimes we informally talk about the static type of an unelaborated expressione, and when we do, we mean the static type of the elaborated...." but also saying "but actually the static type of an expression e can't be talked about independently of context".

I think maybe what you are saying here is something like the following: "For some expressions e, elaboration does not depend on the context K: that is, for any K, elaborating e with respect to K will produce the same m with the same static type T. For such expressions, we sometimes informally refer to the static type of e as a shorthand for the T that results from elaborating e to some m in some context K".

Is that right? Or do you mean to have a K in the first part as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I get why this is confusing. Let me try to explain what I'm trying to achieve here, and then maybe you can help me decide how to achieve it (or whether it's even worth trying).

What's happening in my mind is that I've recently introduced a new concept to the reader: the idea of an elaborated expression. To distinguish this new concept from the old thing we used to mean by "expression" in the spec, I'm going to refer to the old thing as an "unelaborated" expression. So everywhere the existing Dart language spec talks about expressions, that should be understood to mean "unelaborated expression". Elaborated expressions are the new thing.

And now I've just told the reader that static types (which the existing Dart language spec talks about a lot) are a property of elaborated expressions. They're not a property of unelaborated expressions. I think that's a really useful thing to do, in this PR, because it keeps me honest: I never want to try to make a claim about the static type of an expression before the type inference process has finished with it, and since elaborated expressions are the outputs of the type inference process, restricting myself to talking about the static types of elaborated expressions achieves that.

So what should the reader make of all the other spec documents that pre-dated this PR? Those documents just talk about the static types of "expressions". I'm implicitly telling the reader that when those older documents say "expression" they mean "unelaborated expression". But I'm also explicitly telling them that "static type of an unelaborated expression" isn't a meaningful thing.

So, when I said:

Informally, we may sometimes speak of the static type of an unelaborated expression e; when we do, what is meant by this is the static type of the elaborated expression m that results from performing expression inference on e.

What I was trying to say was: hey reader, chill out. If you see an old spec document that says "static type of expression e", just pretend it says "static type of whatever e elaborates to", and things will work out.

But I don't really want to just leave it at that, because I also want to encourage us to try to start being more careful about our language with this sort of thing in the future.

So when I followed that with:

Note in particular that a given expression might have a different static type at different points in the code, since the behavior of expression inference depends on the context K, as well as the flow analysis state.

What I was trying to say was: but in the future let's try to be careful not to get too sloppy with this. Let's try not to say "the static type of the expression 5 is int", because sometimes the context applied to 5 causes it to elaborate to @DOUBLE(5), which of course has a static type of double. Similarly, if we say "the static type of v is Object", that can be misleading, because maybe the declared type of v is indeed Object, but a particular expression v might get elaborated to @PROMOTED_TYPE<int>(v) (because in the flow analysis state at the point where v appears, it's promoted).

Does that make any more sense? Do you think any of this is worth mentioning? Do you have any suggestions about how to condense this to something spec-worthy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, this makes sense. I think I might just tack on something like the following "...that results from performing expression inference on e in some context K, where K depends on where e appears in the broader program.". And then continuing with your "Note in particular.." which expands on the bit added to the previous sentence. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, thanks! So updated!

type at different points in the code, since the behavior of expression inference
depends on the context `K`, as well as the flow analysis state._

An invariant of expression inference, known as _soundness_, is that when an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invariant -> property maybe?

Peeking ahead, I don't think the cases you've covered so far are broad enough for me to be sure which direction you're taking this, so let me just throw out a couple of thoughts for consideration as you build this out.

In general, you have some choices to make about how to structure this (I realize you're being informal, so in some sense it doesn't really matter, but at least I'll throw this out as food for thought).

  • You could choose to define elaboration such that if e -> m : T, then m is guaranteed to be a thing which can be run and will produce a value which is a subtype of T.
    • If you make this choice, then elaboration must be a partial function or else extend it to explicitly return an "error case" (e.g. you have to say what happens when you elaborateString x = 3).
    • You also will in principle have to do a lot of error checking that isn't really a part of inference (e.g. "the subclass graph is not cyclical")
  • You could choose to define elaboration such that if e -> m : T, and errorFree(e, T), then m is etc
    • Given this choice, you may or may not be able to make elaboration a total function (maybe String x = 3 elaborates fine, it just gets rejected later)
    • If you're taking this approach, then you probably want to say something fuzzy here about "and some other conditions hold"

To be completely formal, there's a lot of detail about what order various invariants need to be established. For example, inference relies on subtype matching, but it's not clear that subtype matching is well-defined if the types in question are not known to be well-formed (i.e. don't have undefined type variables, or cyclic class hierarchies, etc). Obviously, we're not trying to be completely formal here, but worth maybe nodding in this direction?

Given all this, I might be inclined to broaden what you're saying slightly to note that 1) in addition to filling in missing types, elaboration is intended to reject programs which cannot be guaranteed to be sound, and 2) there may be static checks required to ensure soundness that are not specified here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invariant -> property maybe?

Good point. Done.

Peeking ahead, I don't think the cases you've covered so far are broad enough for me to be sure which direction you're taking this, so let me just throw out a couple of thoughts for consideration as you build this out.

In general, you have some choices to make about how to structure this (I realize you're being informal, so in some sense it doesn't really matter, but at least I'll throw this out as food for thought).

  • You could choose to define elaboration such that if e -> m : T, then m is guaranteed to be a thing which can be run and will produce a value which is a subtype of T.

    • If you make this choice, then elaboration must be a partial function or else extend it to explicitly return an "error case" (e.g. you have to say what happens when you elaborateString x = 3).
    • You also will in principle have to do a lot of error checking that isn't really a part of inference (e.g. "the subclass graph is not cyclical")
  • You could choose to define elaboration such that if e -> m : T, and errorFree(e, T), then m is etc

    • Given this choice, you may or may not be able to make elaboration a total function (maybe String x = 3 elaborates fine, it just gets rejected later)
    • If you're taking this approach, then you probably want to say something fuzzy here about "and some other conditions hold"

To be completely formal, there's a lot of detail about what order various invariants need to be established. For example, inference relies on subtype matching, but it's not clear that subtype matching is well-defined if the types in question are not known to be well-formed (i.e. don't have undefined type variables, or cyclic class hierarchies, etc). Obviously, we're not trying to be completely formal here, but worth maybe nodding in this direction?

Given all this, I might be inclined to broaden what you're saying slightly to note that 1) in addition to filling in missing types, elaboration is intended to reject programs which cannot be guaranteed to be sound, and 2) there may be static checks required to ensure soundness that are not specified here.

I like this idea. I need to spend some time thinking about where I want to add it, so I'm going to postpone it to a follow-up PR.


- Let `m_1` be the result of performing expression type inference on `e_1`, in
- Let `m_1` be the result of performing expression inference on `e_1`, in
context `_`, and then coercing the result to type `Object`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether it was something that we discussed, but it is definitely from my perspective the correct decision. In general the context is intended to capture "what this thing should be", and what this thing should be is a subtype of Object, not a subtype of Object?.

`Future<T_2>`, in which case `m_2` can be optimized to
`@PROMOTED_TYPE<Future<T_2>>(m_1)`._

- _For soundness, we must prove that whenever
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for clarity, I would be slightly more explicit here. Something like:

For soundness, we must prove that @AWAIT_WITH_TYPE_CHECK<T_2>(m_1) always evaluates to a value which is an instance satisfying Future<T_2. By the definition of @AWAIT_WITH_TYPE_CHECK, there are two cases to consider.

First, if m_1 evaluates to v_1 and v_1 is an instance of Future<T_2> then we are done.

Second, if ....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. As I mentioned earlier, this is going to get easier once we have proven that T <: FutureOr<flatten(T)> for all types T, which I intend to do in a follow-up PR. I'll address this comment in that follow-up PR.

For now I’ve added a short TODO as a reminder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3920 deletes this text.

soundness by assuming that `v` is an instance satisfying type `T_1` and not
an instance satisfying type `Future<T_2>`, and then considering two cases:_

- _If the runtime value of `v` is `null`, then by soundness, `T_1` must be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically no - it could be an extension type which erase to one of those things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I’ll address this in the follow-up PR as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3920 deletes this text.

type `T_1`, but not an instance satisfying type `Future<T_2>`, then `v` is
an instance satisfying type `T_2`._

- _Substituting in the definition of `T_2`, we need to show that if `v` is a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fishythefish this sort of thing is the kind of coherence property that I can imagine pulling out as a lemma to be validated. "Lemma: If v has static type T and runtime type S where S is not Null then either S <: flatten(T) or S <: Future<flatten<T>>.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "not Null" restriction important? If S is Null (i.e. v is null), then T must be nullable. Therefore, flatten(T) is nullable (this lemma can be proven separately if needed), so S <: flatten(T).

- _If `T_1` is `S?`, then `flatten(T_1)` is `flatten(S)?`. We need to show
that if `v` is a non-null instance satisfying type `S?`, but not an
instance satisfying type `Future<flatten(S)?>`, then `v` is an instance
satisfying type `flatten(S)?`. Assuming `v` is a non-null instance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fishythefish similarly here: if v has runtime type T and T <: S? then either T is Null or T <: S.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The union type rules are that T <: S? implies T <: S and/or T <: Null.
It happens that today T <: Null implies T = Null or T <: Never, and T <: Never is subsumed by T <: S.
Or actuall, that T = Null up to Norm. T can also be Never? or Null?. The flatten function is defined on non-normalized type terms. (We haven't said otherwise, although I'm not sure it behaves differently for any different norm-equivalent types. I'm also not completely sure it doesn't.)

It's better to just not make the assumption unless it's needed. If T <: Null is enough, then just use that.

@stereotype441 stereotype441 merged commit 69ecf0f into main Jun 17, 2024
3 checks passed
@stereotype441 stereotype441 deleted the inference_spec_7 branch June 17, 2024 18:53
stereotype441 added a commit to stereotype441/language that referenced this pull request Jun 17, 2024
In the code review discussion
dart-lang#3850 (comment),
we agreed to use the terminology "instance satisfying type T" to mean
"instance whose runtime type is T or a subtype thereof", and I made
the necessary changes to
`resources/type-system/inference.md`. However, after digging through
`dartLangSpec.tex`, it appears that the terminology we were already
using in the spec for this concept is "instance implementing type
T". This commit modifies `resources/type-system/inference.md` to
match the spec terminology.

It also fixes two places in the spec where we were incorrectly saying
"instance of type T", but we meant "instance implementing type T".

Furthermore, the spec consistently uses the term "dynamic type of `v`"
for the concept that we informally refer to as the "runtime type of
`v`", so this commit updates `resources/type-system/inference.md` to
follow the spec convention.
lrhn pushed a commit that referenced this pull request Jun 28, 2024
The following changes (mostly based on suggestions by Leaf and Erik)
are included:

- The text is now under a heading called "expression inference" rather
  than "body inference" (which was confusing).

- Instead of trying to describe the output of type inference using a
  vague notion of "expression artifacts" whose runtime behavior is
  described in prose, I'm specifying it using a Dart-like syntax
  called "elaborated expressions". Elaborated expressions look
  generally like Dart expressions, except they have some limitations
  (types are fully specified, and complex constructs like null
  shorting are desugared, and type checks are explicit) and support
  some extra forms (such as "let" expressions).

- I've simplified the terminology I use to describe the three things
  that can happen when an expression is evaluated (it can diverge,
  throw an exception, or evaluate to a value). My previous terminology
  was confusing ("fail to complete at all", "complete normally with a
  value", or "complete with an exception").

- I've re-worked the treatment coercions so that I'm no longer making
  subsumption explicit in the language of elaborated expressions. (I
  never meant to make subsumption explicit; it happened as a mistake
  due to unclear thinking on my part.)

- I broke the "numbers" subsection into separate "integer literals"
  and "double literals" sections. (Previosuly I only specified how
  integer literals are handled.)

- The sketches of soundness proofs have been simplified a bit. I'm no
  longer trying to crisply distinguish "expression soundness", "return
  value soundness", "tear-off soundness", and "future soundness";
  rather, I'm speaking about soundness more informally, as befits an
  informal proof sketch.

- On the other hand, I've clarified precisely what I mean by "is an
  instance of", since the soundness proof sketches don't make any
  sense if your definition of this concept doesn't agree with what's
  in my head. I've also made sure this concept correctly accounts for
  extension types.

- And I've added text explaining why the interpretation of `await`
  expressions is indeed sound. (Previously I had a "TODO" because I
  thought maybe they weren't sound. But they're sound; it just took a
  little extra work to prove it to myself.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants