Description
How are you running AnythingLLM?
Docker (remote machine)
What happened?
I installed Ollama using the provided script (Linux version, Ubuntu 22.04) and AnythingLLM using the provided easy script and Docker. Everything runs great, however I noticed that out of the 8 GB of VRAM on my 5700 XT, only 74% is reserved no matter what I set in AnythingLLM.
Before you shout at me: I'm a retired plumber. It took me two days to check this out, so give me a break if I made a mistake in the config :)
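For anyone who wants to reproduce the VRAM observation, `ollama ps` reports how the loaded model is split between GPU and CPU memory (a quick check, assuming a reasonably recent Ollama build):

```shell
# While the model is loaded, show where its weights live.
# The PROCESSOR column reads e.g. "100% GPU" or "26%/74% CPU/GPU".
ollama ps
```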
Are there known steps to reproduce?
In ollama serve, using /set parameter num_ctx 128000, Ollama takes all my VRAM and close to 22 GB of RAM.
In ollama serve, using /set parameter num_ctx 11200, Ollama takes 99% of VRAM and the responses are much, much better.
In ollama serve, using the default settings (for newbies, like me :)), only 74% of VRAM is reserved, and the responses are worse than the above (see the API cross-check after this list).
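As a cross-check that Ollama itself honors a per-request context size, its REST API accepts `num_ctx` in the `options` object. A minimal sketch, assuming Ollama on the default localhost:11434 and the same llama3.1 model; the prompt is just an example:

```shell
# If Ollama respects num_ctx here, VRAM usage should jump
# exactly as it does with /set parameter num_ctx in the CLI.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:latest",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": { "num_ctx": 11200 }
}'
```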
It looks like AnythingLLM is not forwarding context-window changes to Ollama. Whatever you set, the default llama3.1:latest stays at 1024.
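As a possible workaround until AnythingLLM forwards the setting, the context size can be baked into a model variant with a Modelfile and AnythingLLM pointed at that variant. A sketch using Ollama's standard Modelfile mechanism; the name "llama3.1-11k" is just an example:

```shell
# Create a variant of llama3.1 with a fixed context window.
cat > Modelfile <<'EOF'
FROM llama3.1:latest
PARAMETER num_ctx 11200
EOF
ollama create llama3.1-11k -f Modelfile
```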