RFC 001 - Assistant Prefix #8

vgel · 2025-06-02T01:52:07Z

Proposing a standard for assistant prefixes (prefills). I think this is a good place for Standard Completions to start—it'll let us get practice standardizing behavior across providers, while being relatively simple to implement once accepted.

Please leave feedback comments on the draft!

eric-tramel · 2025-06-02T02:54:47Z

rfcs/001-assistant-prefix.md

+```json
+"messages": [
+  { "role": "user", "content": "Tell me a joke." },
+  { "role": "assistant", "content": "Why did the chicken", "prefix": true }


What other examples of per-turn extra attributes exist in the ecosystem already? While this is a clear solution (as prefix is properly an attribute of the turn it is describing), is this a pattern that can be followed throughout the standard?

Tool calls might actually be another example, but maybe that is another point for another RFC, since it does seem a bit inconsistent and poorly defined.

I think it's a decent pattern for clarifying the intent of a message, especially given Deepseek already uses it, though I definitely want to avoid a "cat came back from Berkeley waving flags" situation :-) We definitely don't want 5 mandatory flags on every message.

eric-tramel · 2025-06-02T02:57:15Z

rfcs/001-assistant-prefix.md

+The server MAY use non-standard flags (such as `continue_final_message` on the request
+body) to decide how to handle a trailing assistant message with an unset prefix flag.
+
+*Note: This is left implementation-defined for backwards compatibility.*


Open Question: How much do we want to consider backwards compatibility for current vendors? If we start with cut outs for back compat then we are starting with a shaky foundation. If an app is built with this standard in mind, but then a standard-compliant vendor (because of this backwards compatibility) can return undefined behavior, then I don't think the standard is doing its job.

Yeah, that's a good question. My current feeling is that we should enable backwards compatibility when we can / if it doesn't adversely impact the standard too much, but also be willing to break things when needed. Basically I'm hoping we can do a Python 3.x -> 3.x+1 upgrade here, and not a Python 2 -> Python 3 :-)

That said, I do anticipate we'll need to break some things, like token strings in top_logprobs. I just want to be thoughtful about it. My thought on trailing assistant messages here is that specifically OpenRouter is huge, and they currently support prefixes via an unmarked trailing assistant message. It would be a ton of work on their end to migrate every client over to explicitly marking the prefix, and it seems like a minor cut-out to allow this to be implementation-defined (which is the status quo right now anyways), especially because we can potentially add type checking / warnings in the standard client as a client-level check in the future, or even default to prefix: false instead of unset in the client.

I agree that leaving this ambiguous is pretty unfortunate. This is basically stating that clients which want interoperable outputs will need to always include a trailing prefix: false or prefix: true.

@domenic I agree that is unfortunate, though if we were to standardize the behavior of a trailing assistant message without a flag, what would the behavior be? I'm actually not sure which would be the better behavior to default to, which may be a point in favor of requiring it to be explicit. (Which we should be able to do with type hints in the client even if we allow it to be ambiguous in the API.) But I'm not sure about this / am open to being convinced otherwise.

We could also just require it in the API as well, if we were willing to break backwards compatibility. Maybe another middle ground would be upgrading the MAY to SHOULD NOT allow a trailing message without a flag?

I think that generally for booleans, you want omitting the boolean to mean the same as thatBoolean: false. So starting a new turn would be best.

But yeah, I see there's tricky constraints here about whether you can afford to be strict when creating a new spec that's hoping to get adoption.

One possible solution (at the cost of complexity) is conformance levels. You can have "strict standard completions compliant" which might be hard for existing providers to meet the standards of, and "loose standard completions compliant" which has a lot more wiggle room and undefined/underdefined behavior to allow existing providers to match it. So e.g. OpenRouter or Anthropic could be loose-compliant until/unless they're willing to take a breaking change and switch the prefix-omitted behavior to match prefix: false.

I like the idea of conformance levels!

michael-roe · 2025-06-02T09:59:25Z

As you point out, this interacts with the feature to make reasoning blocks a separate message.

I think with DeepSeek R1 it was common practise to force a reasoning block by prefilling a tag.

I think we may be about to encounter a whole load of issues with the semantics of prefill if it contains special tokens.

michael-roe · 2025-06-02T10:01:59Z

Presumably, if reasoning blocks ate separate messages, you can prefill a partial reasoning block, or even an empty reasoning block. (The semantics of prefilling an empty reasoning block is that the tag gets pre-filled, and nothing else.)

michael-roe · 2025-06-02T10:04:21Z

Oh no! You probably ought to be able to prefill a tool call, too.

teltam · 2025-06-02T21:17:46Z

Are alternative such as `fill_prefix as a top level arg considered? For example,

"fill_prefix": true  // or "prefix": true.
"messages": [
    { "role": "user", "content": "Tell me a joke." },
    { "role": "assistant", "content": "Why did the chicken", "prefix": true }
]

If there's a guiding principle around the design being a superset of OAI etc, then it makes sense to not consider this suggestion.

If not, I'm wondering about this fill_prefix example because it's more intuitive. The prefix is tied to the response from the model, and not specifically anything about the messages array, or the shape it takes (ie, using role and content).

With respect to backward compat, this new setup should work well because given the standard design, the library provider, deals with the translation to the eventual request that will hit the provider API. Or, this project could be setup to provide a shim layer of sorts.

vgel · 2025-06-03T00:27:56Z

@teltam It's discussed a bit in the alternatives section at the end, but should be expanded. One benefit of making it a message flag is if in the future we introduce "prefix": "reasoning", it'll be possible to combine that with a regular prefix to also prefix the response.

rfcs/001-assistant-prefix.md

domenic · 2025-06-03T06:44:51Z

rfcs/001-assistant-prefix.md

+### Errors
+
+If assistant prefixes are not supported for the requested model, the server MUST respond
+with `invalid_request_error`.


Ideally a base layer of this spec would define an error categorization, including a differentiation between invalid requests and unsupported requests. Right now you're just using the undefined term invalid_request_error, and I guess that's fine to start out, but it relates to some discussions in the group chat about starting with cutting-edge features vs. starting with the foundations.

Agreed, one of the next RFCs should be laying out at least the basic definitions for these sorts of API primitives.

Issue for errors: #10

domenic · 2025-06-03T06:48:11Z

rfcs/001-assistant-prefix.md

+The server MAY use non-standard flags (such as `continue_final_message` on the request
+body) to decide how to handle a trailing assistant message with an unset prefix flag.
+
+*Note: This is left implementation-defined for backwards compatibility.*


I agree that leaving this ambiguous is pretty unfortunate. This is basically stating that clients which want interoperable outputs will need to always include a trailing prefix: false or prefix: true.

xlr8harder · 2025-06-04T04:31:49Z

What is the drawback of being more opinionated and not including support for anything but our preferred, standard implementation?

I suppose it would mean that any existing provider might need to migrate their existing API to be standard-compliant, potentially causing breaking changes, which is undesirable and might discourage adoption of the standard.

Should we specify more clearly that all methods but our preferred method are deprecated in addition to being optional? Is there any other approach we should consider?

vgel · 2025-06-10T20:46:46Z

@xlr8harder mostly backwards compatibility, yeah. My feeling is that even if we mandate a certain behavior, providers will be unwilling to break existing clients. (And I don't blame them for that!) My view of this RFC is that it provides an unambiguous way for clients aware of the standard to always get the behavior they want by specifying the value of prefix, which even with the ambiguity of an unset field, is a massive upgrade over the status quo where you just don't know what will happen with a trailing assistant message. If we mandate that leaving the prefix field off has a certain behavior (like always being a new message), and then some providers ignore that for backwards compatibility, I think that's a much worse situation than allowing some wiggle room and then linting on the client side for trailing messages without explicit prefix values.

That said, I like @domenic 's idea for specifying conformance levels a lot. Loose conformance would allow wiggle room for unset prefix values, and strict conformance would require an unset prefix to behave like prefix: false, or something like that. Then in the feature flags endpoint, providers would say what conformance level they are.

domenic

Looks nice! Left a couple of nits that could improve clarity

rfcs/001-assistant-prefix.md

Co-authored-by: Domenic Denicola <d@domenic.me>

domenic · 2025-06-13T00:30:06Z

rfcs/001-assistant-prefix.md

+## "prefix" on non-trailing assistant messages
+
+A "prefix" field on a non-trailing assistant message MUST be ignored. The server MAY
+emit a warning.


Reviewing this, I'm a bit unsure why we would have this be ignored, instead of generating invalid_request_error. It seems like a pretty clear developer coding error, and catching those earlier is probably good?

Hm, good question. My thought was that people may have code like this:

messages = [ { "role": "user", "content": "blah blah blah" }, { "role": "assistant", "content": "Certainly:", "prefix": True }, ] response_str = call_llm(messages) messages[-1]["content"] = response messages.append({ "role": "user", "content": "blah blah blah" }) response2_str = call_llm(messages)

And it might be nice to not require them to also clear the prefix flag. But I'm open to changing it if people think it'd be better to make this an error, it's not too onerous I don't think.

I think that's a pretty good reason to keep it as-is, thanks for explaining.

Although, did you check what DeepSeek does?

vgel added 2 commits June 1, 2025 18:51

Draft RFC

f75d362

Fix example whitespace

2c65364

eric-tramel reviewed Jun 2, 2025

View reviewed changes

domenic reviewed Jun 3, 2025

View reviewed changes

Remove client language, specify behavior for non-boolean values

2d1f0aa

vgel requested a review from domenic June 10, 2025 20:40

domenic approved these changes Jun 11, 2025

View reviewed changes

rfcs/001-assistant-prefix.md Outdated Show resolved Hide resolved

rfcs/001-assistant-prefix.md Outdated Show resolved Hide resolved

Clarify language

31b8f91

Co-authored-by: Domenic Denicola <d@domenic.me>

domenic reviewed Jun 13, 2025

View reviewed changes

RFC 001 - Assistant Prefix #8

Are you sure you want to change the base?

RFC 001 - Assistant Prefix #8

Uh oh!

Conversation

vgel commented Jun 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vgel Jun 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michael-roe commented Jun 2, 2025

Uh oh!

michael-roe commented Jun 2, 2025

Uh oh!

michael-roe commented Jun 2, 2025

Uh oh!

teltam commented Jun 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vgel commented Jun 3, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xlr8harder commented Jun 4, 2025

Uh oh!

vgel commented Jun 10, 2025

Uh oh!

domenic left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vgel commented Jun 2, 2025 •

edited

Loading

vgel Jun 2, 2025 •

edited

Loading

teltam commented Jun 2, 2025 •

edited

Loading