What would you like to see?
Right now, when you send a prompt that is going to overflow a model's context window:

```js
if (tokenManager.statsFrom(messages) + tokenBuffer < llm.promptWindowLimit())
```
or are pinning documents that will overflow the system prompt budget (anything-llm/server/utils/chats/stream.js, line 110 at 42e1d8e):

```js
maxTokens: LLMConnector.limits.system,
```
We then begin to truncate the messages. This becomes an issue when the user wishes to pin many documents and is willing to accept a more constrained history and user prompt in exchange (a worked example of the window check follows below).
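For concreteness, here is a minimal sketch of how that first window check triggers truncation. All numbers, and the stand-in for `tokenManager.statsFrom(messages)`, are purely illustrative, not AnythingLLM's actual values:

```js
// Illustrative numbers only. With an 8K-token window, a 500-token
// safety buffer, and ~7.8K tokens of messages, the fit check fails
// and truncation kicks in.
const promptWindowLimit = 8192;
const tokenBuffer = 500;
const messageTokens = 7800; // stand-in for tokenManager.statsFrom(messages)

const fits = messageTokens + tokenBuffer < promptWindowLimit;
console.log(fits); // false -> messages begin getting truncated
```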
Ideally, the user should not have the system prompt constrained to a fixed 15% of the overall window. This substantially limits high-context models such as Gemini 1.5, which can only devote 150K tokens of their 1M-token context to the system prompt and pinned documents.
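One way to address this would be to make the split configurable rather than hard-coded. Below is a minimal sketch of the idea; the `TokenBudget` helper, the `systemShare` setting, and the 50/50 history/user split are all assumptions for illustration, not AnythingLLM's actual API:

```js
// Hypothetical sketch: derive per-section token limits from a
// user-configurable system-prompt share instead of a fixed 15%.
class TokenBudget {
  constructor(promptWindowLimit, systemShare = 0.15) {
    this.window = promptWindowLimit;
    // Clamp so the history and user prompt always keep some room.
    this.systemShare = Math.min(Math.max(systemShare, 0.05), 0.95);
  }

  get limits() {
    const system = Math.floor(this.window * this.systemShare);
    const rest = this.window - system;
    const history = Math.floor(rest / 2);
    return {
      system,               // budget for system prompt + pinned documents
      history,              // budget for prior chat history
      user: rest - history, // budget for the current user prompt
    };
  }
}

// With Gemini 1.5's ~1M-token window, raising the share to 0.8 frees
// ~800K tokens for pinned documents instead of the fixed 150K.
console.log(new TokenBudget(1_000_000, 0.8).limits);
// -> { system: 800000, history: 100000, user: 100000 }
```

How the remaining budget is divided between history and the user prompt is a separate question; the point here is only that the system-prompt share should not be hard-coded.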