More complete DMR support (docker#103)
The `dmr` provider now supports:
- Proper context length setup via `max_tokens`
- `temperature`, `top_p`, `frequency_penalty`, and `presence_penalty`, which are
  mapped to the appropriate runtime flags for the engine in use (only
  `llama.cpp` mappings for now; see the mapping sketch after the example below)
- Raw runtime flags passed to the inference engine via
  `provider_opts:runtime_flags`
Example configuration enabled by these changes:
```yaml
models:
  root:
    provider: dmr
    model: ai/qwen3:14B-Q6_K
    max_tokens: 32768
    temperature: 0.7
    top_p: 0.95
    frequency_penalty: 0.2
    presence_penalty: 0.1
    provider_opts:
      runtime_flags: |
        --batch-size 1024
        --ubatch-size 512
```
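For illustration only, the config above might translate into roughly the following `llama.cpp` invocation; the exact flag names and the `llama-server` command are assumptions for this sketch, not taken verbatim from the change:

```sh
# Hypothetical flag mapping (assumed, for illustration):
#   max_tokens        -> --ctx-size 32768
#   temperature       -> --temp 0.7
#   top_p             -> --top-p 0.95
#   frequency_penalty -> --frequency-penalty 0.2
#   presence_penalty  -> --presence-penalty 0.1
#   runtime_flags     -> appended as-is
llama-server --ctx-size 32768 --temp 0.7 --top-p 0.95 \
  --frequency-penalty 0.2 --presence-penalty 0.1 \
  --batch-size 1024 --ubatch-size 512
```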
closes docker#71
Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>
fix bug in cagent new
We were telling the LLM to use the v0 models schema instead of the v1 schema (`type` -> `provider`).
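A minimal sketch of the difference, using the same model entry as the example above (the v0 form is shown only for contrast):

```yaml
# v0 schema (what `cagent new` was incorrectly emitting):
models:
  root:
    type: dmr
    model: ai/qwen3:14B-Q6_K

# v1 schema (`type` renamed to `provider`):
models:
  root:
    provider: dmr
    model: ai/qwen3:14B-Q6_K
```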
closes docker#86
Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>