Description
How are you running AnythingLLM?
Docker (remote machine)
What happened?
I installed Ollama using the provided script (Linux version, Ubuntu 22.04) and AnythingLLM using the provided easy script and Docker. Everything runs great, however I noticed that out of the 8 GB of VRAM on my 5700 XT, only 74% is reserved no matter what I set in AnythingLLM.
Before you shout at me: I'm a retired plumber. It took me two days to check this out, so give me a break if I made a mistake in the config :)
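For anyone who wants to reproduce the VRAM observation, `ollama ps` reports how the loaded model is split between GPU and CPU memory (a quick check, assuming a reasonably recent Ollama build):

```shell
# While the model is loaded, show where its weights live.
# The PROCESSOR column reads e.g. "100% GPU" or "26%/74% CPU/GPU".
ollama ps
```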
Are there known steps to reproduce?
In ollama serve, using /set parameter num_ctx 128000, Ollama takes all my VRAM and close to 22 GB of RAM.
In ollama serve, using /set parameter num_ctx 11200, Ollama takes 99% of VRAM and the responses are much, much better.
In ollama serve, using the default settings (for newbies, like me :)), only 74% of VRAM is reserved, and the responses are worse than the above (see the API cross-check after this list).
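As a cross-check that Ollama itself honors a per-request context size, its REST API accepts `num_ctx` in the `options` object. A minimal sketch, assuming Ollama on the default localhost:11434 and the same llama3.1 model; the prompt is just an example:

```shell
# If Ollama respects num_ctx here, VRAM usage should jump
# exactly as it does with /set parameter num_ctx in the CLI.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:latest",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": { "num_ctx": 11200 }
}'
```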
It looks like AnythingLLM is not forwarding context-window changes to Ollama. Whatever you set, the default llama3.1:latest stays at 1024.
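As a possible workaround until AnythingLLM forwards the setting, the context size can be baked into a model variant with a Modelfile and AnythingLLM pointed at that variant. A sketch using Ollama's standard Modelfile mechanism; the name "llama3.1-11k" is just an example:

```shell
# Create a variant of llama3.1 with a fixed context window.
cat > Modelfile <<'EOF'
FROM llama3.1:latest
PARAMETER num_ctx 11200
EOF
ollama create llama3.1-11k -f Modelfile
```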