How are you running AnythingLLM?
Docker (local)
What happened?
As I've discovered while using Ollama, the default context window size of 2048 tokens can be too small for modern use cases. While researching possible solutions, I found that the num_ctx parameter must be set in API calls to Ollama in order to request a different context window size.
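For reference, this is roughly how num_ctx is passed when calling Ollama's REST API directly; the model name, URL, and the 8192 value below are just placeholders:

```ts
// Minimal sketch: calling Ollama's REST API with a larger context window.
// "options.num_ctx" is Ollama's documented way to override the default
// context size for a single request.
const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3",                 // any locally pulled model
    prompt: "Summarize this long document...",
    stream: false,
    options: {
      num_ctx: 8192,                 // context window in tokens (default is 2048)
    },
  }),
});
const data = await response.json();
console.log(data.response);
```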
I've looked into the "max tokens" feature in AnythingLLM, and it appears that we do have this option available. However, upon further inspection, I realized that this value never reaches Ollama: the max_tokens parameter is present in the AnythingLLM codebase, but it is never sent to Ollama or used in the API call.
I've also looked at LangChain, which exposes the needed num_ctx parameter through its Ollama integration. This further highlights the need to implement this feature in AnythingLLM.
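For example, the LangChain JS integration accepts something along these lines (the exact import path may differ by version, and the values here are illustrative):

```ts
// Sketch of LangChain's JS Ollama integration, which already exposes the
// context window as "numCtx" (mapped to Ollama's num_ctx option).
import { ChatOllama } from "@langchain/community/chat_models/ollama";

const chat = new ChatOllama({
  baseUrl: "http://localhost:11434",
  model: "llama3",
  numCtx: 8192, // forwarded to Ollama as options.num_ctx
});

const result = await chat.invoke("How large is your context window?");
console.log(result.content);
```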
I propose adding an option to AnythingLLM that allows advanced users to select from a drop-down list of common context window sizes (e.g., 2K, 4K, 8K, 16K, 32K, 64K, or 128K) for Ollama models, as sketched below. This feature would let users adjust the context window easily, without requiring custom Modelfiles.
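Something like the following mapping could back the drop-down; the labels and values are only a sketch of the idea, not a proposed final API:

```ts
// Hypothetical mapping from drop-down labels to token counts.
const CONTEXT_WINDOW_OPTIONS: Record<string, number> = {
  "2K": 2048,
  "4K": 4096,
  "8K": 8192,
  "16K": 16384,
  "32K": 32768,
  "64K": 65536,
  "128K": 131072,
};
```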
Example Use Case:
Users who frequently interact with large datasets or need more accurate results from their AI models can benefit from this feature. By offering a range of context window sizes, we can cater to different use cases and improve overall user experience.
Implementation:
The implementation would involve modifying AnythingLLM's call into LangChain to include a parameter for the selected context window size, which would be bound to LangChain's num_ctx parameter. LangChain would then pass the value through to Ollama's API, ensuring that Ollama actually allocates the larger context window.
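A rough sketch of the idea; "OllamaProviderConfig", "contextWindow", and "buildOllamaClient" are invented names for illustration, not the actual AnythingLLM code:

```ts
// Hypothetical sketch of threading a user-selected context window size
// through to LangChain's Ollama integration.
import { ChatOllama } from "@langchain/community/chat_models/ollama";

interface OllamaProviderConfig {
  baseUrl: string;
  model: string;
  contextWindow?: number; // value chosen from the proposed drop-down
}

function buildOllamaClient(config: OllamaProviderConfig): ChatOllama {
  return new ChatOllama({
    baseUrl: config.baseUrl,
    model: config.model,
    // Only set numCtx when the user picked a size, so existing setups keep
    // Ollama's default behavior.
    ...(config.contextWindow ? { numCtx: config.contextWindow } : {}),
  });
}
```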
Are there known steps to reproduce?
The steps are easy to reproduce: connect any Ollama model with any max-token setting, then stream the Ollama server logs.
If a line like the following appears:
Jul 24 11:10:51 PCHOSTNAME ollama[774]: llama_new_context_with_model: n_ctx = 2048
then the context window is incorrect and Ollama is falling back to its default 2048-token (2K) size. The final number in that log line should reflect the selected context window size once this feature is implemented correctly.