
Gemini context window lookup #2966

@DangerousBerries

Description

How are you running AnythingLLM?

Docker (local)

What happened?

When using some Gemini models (for example gemini-exp-1206, gemini-2.0-flash-thinking-exp, and learnlm-1.5-pro-experimental) with a high "Max Context Snippets" setting, the model appears to truncate or ignore recent chat history. The conversation continues as if new messages were attached to an older point in the conversation, rather than building on the full, recent context.

This issue persists even when the "Document similarity threshold" is set to High, and it occurs despite no citations being shown in the UI (no "Show Citations" button is visible). The only workaround is reducing the Max Context Snippets value.

gemini-2.0-flash-thinking-exp has a 33k token limit, so that may be where the problem starts to appear. However, gemini-exp-1206 is supposed to have a 2 million token limit and works correctly in Google AI Studio, so its failure here is strange.
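As a rough illustration of the arithmetic behind the suspicion above (this is not AnythingLLM's actual implementation; the window size, per-snippet token count, and output head-room are all assumptions), a high snippet count can consume the entire context window before any chat history is included:

```python
# Hypothetical sketch: how a large "Max Context Snippets" value can crowd
# chat history out of a fixed context window. All numbers are assumptions.

CONTEXT_WINDOW = 32_768      # approx. limit reported for gemini-2.0-flash-thinking-exp
AVG_SNIPPET_TOKENS = 300     # assumed average tokens per retrieved snippet
RESERVED_FOR_OUTPUT = 1_024  # assumed head-room reserved for the model's reply

def history_budget(max_snippets: int) -> int:
    """Tokens left for chat history after snippets and output head-room."""
    used = max_snippets * AVG_SNIPPET_TOKENS + RESERVED_FOR_OUTPUT
    return max(CONTEXT_WINDOW - used, 0)

print(history_budget(4))    # small snippet count: ample room for recent history
print(history_budget(200))  # 200 snippets: 0 -- snippets alone overflow the window
```

If the prompt builder then truncates history to fit the remaining budget, the most recent turns can be the ones dropped, which would match the observed behavior of the model replying to older messages.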

Are there known steps to reproduce?

  1. Choose gemini-exp-1206, learnlm-1.5-pro-experimental, or gemini-2.0-flash-thinking-exp as the model
  2. Use a high number for Max Context Snippets in the settings (like 200)
  3. Start a conversation with a lot of back-and-forth messages and plenty of documents
  4. See that the model's responses begin to ignore recent chat history, instead continuing as if responding to older messages
  5. You can verify this by:
    • Reducing Max Context Snippets and confirming that the problem resolves
    • Setting the Document similarity threshold to High so that no citations are shown in the UI

Metadata

Labels

enhancement (New feature or request)
