Description
How are you running AnythingLLM?
All versions
What happened?
Most providers nowadays DO provide token usage in the final chunk when streaming.
So token metrics should be taken from the API response rather than estimated.
The estimation code is not accurate; use it only as a fallback when the provider returns no usage data.
Full list of providers that return usage in streaming mode, tested in the AnythingLLM project with real API keys and `console.log(chunk)`:
- OpenAI
- Azure OpenAI
- Gemini
- HuggingFace
- Ollama
- NovitaAI
- Together AI
- Fireworks AI
- Mistral
- OpenRouter
- Groq
- Cohere
- DeepSeek
- ApiPie
- Bedrock
- Anthropic
- xAI
- Perplexity
Usage metrics are available in:
- `usage` field: OpenAI-like providers
- `x_groq.usage` field: Groq
- `usage_metadata`: AWS Bedrock
- `usageMetadata`: Gemini
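For illustration, a minimal helper (hypothetical, not existing AnythingLLM code) that normalizes these chunk shapes into one usage object:

```js
// Hypothetical helper, not part of AnythingLLM: pick the usage object
// out of a streamed chunk, covering the field shapes listed above.
function extractUsage(chunk) {
  return (
    chunk?.usage ??          // OpenAI-like providers
    chunk?.x_groq?.usage ??  // Groq
    chunk?.usage_metadata ?? // AWS Bedrock
    chunk?.usageMetadata ??  // Gemini
    null
  );
}
```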
Some providers require `stream_options: { include_usage: true }` to return usage.
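A minimal sketch using the official `openai` Node SDK, assuming an API key in the environment (the model name is only illustrative): with `include_usage` set, the final chunk carries a populated `usage` object and an empty `choices` array.

```js
const OpenAI = require("openai");
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model name
    messages: [{ role: "user", content: "Hello" }],
    stream: true,
    stream_options: { include_usage: true },
  });

  let usage = null;
  for await (const chunk of stream) {
    const delta = chunk.choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
    if (chunk.usage) usage = chunk.usage; // only set on the final chunk
  }
  console.log(usage); // { prompt_tokens, completion_tokens, total_tokens }
}

main();
```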
Are there known steps to reproduce?
- `handleDefaultStreamResponseV2` breaks the `for await (const chunk of stream)` loop before the final chunk, which contains `usage`.
- `measureStream` is called with `runPromptTokenCalculation = true` for a lot of providers that do return `usage` in their streaming response.
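A hedged sketch of the suggested behavior (names like `extractUsage` and `estimateUsage` are assumptions for illustration, not AnythingLLM's actual API): consume the stream to the end so the usage chunk is observed, and estimate only when nothing was reported.

```js
// Sketch only: consume the whole stream so the final usage chunk is
// seen, and fall back to estimation only when nothing was reported.
let usage = null;
let fullText = "";
for await (const chunk of stream) {
  const reported = extractUsage(chunk); // helper sketched earlier
  if (reported) usage = reported;
  fullText += chunk.choices?.[0]?.delta?.content ?? "";
  // ...existing per-chunk client write logic...
}
if (!usage) {
  usage = estimateUsage(fullText); // hypothetical fallback estimator
}
```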