Description
How are you running AnythingLLM?
Docker (local)
What happened?
When using some Gemini models (for example gemini-exp-1206, gemini-2.0-flash-thinking-exp, and learnlm-1.5-pro-experimental) with a high "Max Context Snippets" setting, the model appears to truncate or ignore recent chat history. The conversation continues as if new messages are being attached to an older part of the conversation, rather than maintaining the full, recent context.
This issue persists even when the "Document similarity threshold" is set to High, and occurs despite no citations being shown in the UI (no "Show Citations" button visible). The only workaround is reducing the Max Context Snippets value.
gemini-2.0-flash-thinking-exp has a 33k-token limit, so that may be where the problem starts to appear. However, gemini-exp-1206 is supposed to have a 2-million-token limit and works correctly in Google AI Studio, so the truncation there is strange.
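A rough back-of-the-envelope calculation (a sketch only, assuming an average snippet size; this is not AnythingLLM's actual code) shows how a high Max Context Snippets setting could crowd chat history out of a 33k-token window:

```python
# Hypothetical illustration of context-budget overflow. The per-snippet token
# count is an assumption; actual snippet sizes depend on chunking settings.
SNIPPET_TOKENS = 300       # assumed average tokens per context snippet
MAX_SNIPPETS = 200         # the "Max Context Snippets" setting from the repro
CONTEXT_WINDOW = 33_000    # reported limit for gemini-2.0-flash-thinking-exp

snippet_budget = SNIPPET_TOKENS * MAX_SNIPPETS      # tokens consumed by snippets
history_budget = CONTEXT_WINDOW - snippet_budget    # tokens left for chat history

print(snippet_budget)  # 60000 — snippets alone exceed the window
print(history_budget)  # -27000 — no room left for recent messages
```

Under these assumptions the snippets alone overflow the window, which would force recent chat history to be dropped; it does not explain the same symptom on the 2-million-token gemini-exp-1206, though.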
Are there known steps to reproduce?
- Choose gemini-exp-1206, learnlm-1.5-pro-experimental, or gemini-2.0-flash-thinking-exp as the model
- Use a high number for Max Context Snippets in the settings (like 200)
- Start a conversation with a lot of back-and-forth messages and plenty of documents
- See that the model's responses begin to ignore recent chat history, instead continuing as if responding to older messages
- You can verify this by:
  - Reducing Max Context Snippets and observing that the problem resolves
  - Setting Document similarity threshold to High so that no citations are shown in the UI