-
Notifications
You must be signed in to change notification settings - Fork 1
RFC 001 - Assistant Prefix #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
```json | ||
"messages": [ | ||
{ "role": "user", "content": "Tell me a joke." }, | ||
{ "role": "assistant", "content": "Why did the chicken", "prefix": true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other examples of per-turn extra attributes exist in the ecosystem already? While this is a clear solution (as prefix
is properly an attribute of the turn it is describing), is this a pattern that can be followed throughout the standard?
Tool calls might actually be another example, but maybe that is another point for another RFC, since it does seem a bit inconsistent and poorly defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a decent pattern for clarifying the intent of a message, especially given Deepseek already uses it, though I definitely want to avoid a "cat came back from Berkeley waving flags" situation :-) We definitely don't want 5 mandatory flags on every message.
The server MAY use non-standard flags (such as `continue_final_message` on the request | ||
body) to decide how to handle a trailing assistant message with an unset prefix flag. | ||
|
||
*Note: This is left implementation-defined for backwards compatibility.* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open Question: How much do we want to consider backwards compatibility for current vendors? If we start with cut outs for back compat then we are starting with a shaky foundation. If an app is built with this standard in mind, but then a standard-compliant vendor (because of this backwards compatibility) can return undefined behavior, then I don't think the standard is doing its job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good question. My current feeling is that we should enable backwards compatibility when we can / if it doesn't adversely impact the standard too much, but also be willing to break things when needed. Basically I'm hoping we can do a Python 3.x -> 3.x+1 upgrade here, and not a Python 2 -> Python 3 :-)
That said, I do anticipate we'll need to break some things, like token strings in top_logprobs. I just want to be thoughtful about it. My thought on trailing assistant messages here is that specifically OpenRouter is huge, and they currently support prefixes via an unmarked trailing assistant message. It would be a ton of work on their end to migrate every client over to explicitly marking the prefix, and it seems like a minor cut-out to allow this to be implementation-defined (which is the status quo right now anyways), especially because we can potentially add type checking / warnings in the standard client as a client-level check in the future, or even default to prefix: false instead of unset in the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that leaving this ambiguous is pretty unfortunate. This is basically stating that clients which want interoperable outputs will need to always include a trailing prefix: false
or prefix: true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@domenic I agree that is unfortunate, though if we were to standardize the behavior of a trailing assistant message without a flag, what would the behavior be? I'm actually not sure which would be the better behavior to default to, which may be a point in favor of requiring it to be explicit. (Which we should be able to do with type hints in the client even if we allow it to be ambiguous in the API.) But I'm not sure about this / am open to being convinced otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just require it in the API as well, if we were willing to break backwards compatibility. Maybe another middle ground would be upgrading the MAY to SHOULD NOT allow a trailing message without a flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that generally for booleans, you want omitting the boolean to mean the same as thatBoolean: false
. So starting a new turn would be best.
But yeah, I see there's tricky constraints here about whether you can afford to be strict when creating a new spec that's hoping to get adoption.
One possible solution (at the cost of complexity) is conformance levels. You can have "strict standard completions compliant" which might be hard for existing providers to meet the standards of, and "loose standard completions compliant" which has a lot more wiggle room and undefined/underdefined behavior to allow existing providers to match it. So e.g. OpenRouter or Anthropic could be loose-compliant until/unless they're willing to take a breaking change and switch the prefix
-omitted behavior to match prefix: false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of conformance levels!
As you point out, this interacts with the feature to make reasoning blocks a separate message. I think with DeepSeek R1 it was common practise to force a reasoning block by prefilling a tag. I think we may be about to encounter a whole load of issues with the semantics of prefill if it contains special tokens. |
Presumably, if reasoning blocks ate separate messages, you can prefill a partial reasoning block, or even an empty reasoning block. (The semantics of prefilling an empty reasoning block is that the tag gets pre-filled, and nothing else.) |
Oh no! You probably ought to be able to prefill a tool call, too. |
Are alternative such as `fill_prefix as a top level arg considered? For example,
If there's a guiding principle around the design being a superset of OAI etc, then it makes sense to not consider this suggestion. If not, I'm wondering about this With respect to backward compat, this new setup should work well because given the standard design, the library provider, deals with the translation to the eventual request that will hit the provider API. Or, this project could be setup to provide a shim layer of sorts. |
@teltam It's discussed a bit in the alternatives section at the end, but should be expanded. One benefit of making it a message flag is if in the future we introduce |
### Errors | ||
|
||
If assistant prefixes are not supported for the requested model, the server MUST respond | ||
with `invalid_request_error`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally a base layer of this spec would define an error categorization, including a differentiation between invalid requests and unsupported requests. Right now you're just using the undefined term invalid_request_error
, and I guess that's fine to start out, but it relates to some discussions in the group chat about starting with cutting-edge features vs. starting with the foundations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, one of the next RFCs should be laying out at least the basic definitions for these sorts of API primitives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue for errors: #10
The server MAY use non-standard flags (such as `continue_final_message` on the request | ||
body) to decide how to handle a trailing assistant message with an unset prefix flag. | ||
|
||
*Note: This is left implementation-defined for backwards compatibility.* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that leaving this ambiguous is pretty unfortunate. This is basically stating that clients which want interoperable outputs will need to always include a trailing prefix: false
or prefix: true
.
What is the drawback of being more opinionated and not including support for anything but our preferred, standard implementation? I suppose it would mean that any existing provider might need to migrate their existing API to be standard-compliant, potentially causing breaking changes, which is undesirable and might discourage adoption of the standard. Should we specify more clearly that all methods but our preferred method are deprecated in addition to being optional? Is there any other approach we should consider? |
@xlr8harder mostly backwards compatibility, yeah. My feeling is that even if we mandate a certain behavior, providers will be unwilling to break existing clients. (And I don't blame them for that!) My view of this RFC is that it provides an unambiguous way for clients aware of the standard to always get the behavior they want by specifying the value of prefix, which even with the ambiguity of an unset field, is a massive upgrade over the status quo where you just don't know what will happen with a trailing assistant message. If we mandate that leaving the prefix field off has a certain behavior (like always being a new message), and then some providers ignore that for backwards compatibility, I think that's a much worse situation than allowing some wiggle room and then linting on the client side for trailing messages without explicit prefix values. That said, I like @domenic 's idea for specifying conformance levels a lot. Loose conformance would allow wiggle room for unset prefix values, and strict conformance would require an unset prefix to behave like prefix: false, or something like that. Then in the feature flags endpoint, providers would say what conformance level they are. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice! Left a couple of nits that could improve clarity
Co-authored-by: Domenic Denicola <d@domenic.me>
## "prefix" on non-trailing assistant messages | ||
|
||
A "prefix" field on a non-trailing assistant message MUST be ignored. The server MAY | ||
emit a warning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing this, I'm a bit unsure why we would have this be ignored, instead of generating invalid_request_error. It seems like a pretty clear developer coding error, and catching those earlier is probably good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, good question. My thought was that people may have code like this:
messages = [
{ "role": "user", "content": "blah blah blah" },
{ "role": "assistant", "content": "Certainly:", "prefix": True },
]
response_str = call_llm(messages)
messages[-1]["content"] = response
messages.append({ "role": "user", "content": "blah blah blah" })
response2_str = call_llm(messages)
And it might be nice to not require them to also clear the prefix flag. But I'm open to changing it if people think it'd be better to make this an error, it's not too onerous I don't think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a pretty good reason to keep it as-is, thanks for explaining.
Although, did you check what DeepSeek does?
View Proposal
Proposing a standard for assistant prefixes (prefills). I think this is a good place for Standard Completions to start—it'll let us get practice standardizing behavior across providers, while being relatively simple to implement once accepted.
Please leave feedback comments on the draft!