Description
How are you running AnythingLLM?
All versions
What happened?
Most providers nowadays DO provide token usage in the final chunk when streaming.
So token metrics should be taken from the API response rather than estimated.
The estimation code is not accurate; use it only as a fallback when the provider returns no usage data.
Full list of providers that return usage in streaming mode, tested in the AnythingLLM project with real API keys and `console.log(chunk)`:
- OpenAI
- Azure OpenAI
- Gemini
- HuggingFace
- Ollama
- NovitaAI
- Together AI
- Fireworks AI
- Mistral
- OpenRouter
- Groq
- Cohere
- DeepSeek
- ApiPie
- Bedrock
- Anthropic
- xAI
- Perplexity
Usage metrics are available in:
- `usage` field: OpenAI-like providers
- `x_groq.usage` field: Groq
- `usage_metadata`: AWS Bedrock
- `usageMetadata`: Gemini
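For illustration, a minimal helper (hypothetical, not existing AnythingLLM code) that normalizes these chunk shapes into one usage object:

```js
// Hypothetical helper, not part of AnythingLLM: pick the usage object
// out of a streamed chunk, covering the field shapes listed above.
function extractUsage(chunk) {
  return (
    chunk?.usage ??          // OpenAI-like providers
    chunk?.x_groq?.usage ??  // Groq
    chunk?.usage_metadata ?? // AWS Bedrock
    chunk?.usageMetadata ??  // Gemini
    null
  );
}
```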
Some providers require `stream_options: { include_usage: true }` to return usage.
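A minimal sketch using the official `openai` Node SDK, assuming an API key in the environment (the model name is only illustrative): with `include_usage` set, the final chunk carries a populated `usage` object and an empty `choices` array.

```js
const OpenAI = require("openai");
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model name
    messages: [{ role: "user", content: "Hello" }],
    stream: true,
    stream_options: { include_usage: true },
  });

  let usage = null;
  for await (const chunk of stream) {
    const delta = chunk.choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
    if (chunk.usage) usage = chunk.usage; // only set on the final chunk
  }
  console.log(usage); // { prompt_tokens, completion_tokens, total_tokens }
}

main();
```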
Are there known steps to reproduce?
- `handleDefaultStreamResponseV2` breaks the `for await (const chunk of stream)` loop before the final chunk, which contains `usage`.
- `measureStream` is called with `runPromptTokenCalculation = true` for a lot of providers that do return `usage` in their streaming response.
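A hedged sketch of the suggested behavior (names like `extractUsage` and `estimateUsage` are assumptions for illustration, not AnythingLLM's actual API): consume the stream to the end so the usage chunk is observed, and estimate only when nothing was reported.

```js
// Sketch only: consume the whole stream so the final usage chunk is
// seen, and fall back to estimation only when nothing was reported.
let usage = null;
let fullText = "";
for await (const chunk of stream) {
  const reported = extractUsage(chunk); // helper sketched earlier
  if (reported) usage = reported;
  fullText += chunk.choices?.[0]?.delta?.content ?? "";
  // ...existing per-chunk client write logic...
}
if (!usage) {
  usage = estimateUsage(fullText); // hypothetical fallback estimator
}
```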