How are you running AnythingLLM?
Docker (local)
What happened?
As I've discovered while using Ollama, the default context window size of 2048 tokens can be too small for modern use cases. While researching possible solutions, I found that the num_ctx parameter must be set in API calls to Ollama in order to request a different context window size.
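For reference, this is roughly how num_ctx is passed when calling Ollama's REST API directly; the model name, URL, and the 8192 value below are just placeholders:

```ts
// Minimal sketch: calling Ollama's REST API with a larger context window.
// "options.num_ctx" is Ollama's documented way to override the default
// context size for a single request.
const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3",                 // any locally pulled model
    prompt: "Summarize this long document...",
    stream: false,
    options: {
      num_ctx: 8192,                 // context window in tokens (default is 2048)
    },
  }),
});
const data = await response.json();
console.log(data.response);
```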
I've looked into the "max tokens" feature in AnythingLLM, and it appears that we do have this option available. However, upon further inspection, I realized that this value never reaches Ollama: the max_tokens parameter is present in the AnythingLLM codebase, but it is never sent to Ollama or used in the API call.
I've also looked at LangChain, which exposes the needed num_ctx parameter through its Ollama integration. This further highlights the need to implement this feature in AnythingLLM.
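For example, the LangChain JS integration accepts something along these lines (the exact import path may differ by version, and the values here are illustrative):

```ts
// Sketch of LangChain's JS Ollama integration, which already exposes the
// context window as "numCtx" (mapped to Ollama's num_ctx option).
import { ChatOllama } from "@langchain/community/chat_models/ollama";

const chat = new ChatOllama({
  baseUrl: "http://localhost:11434",
  model: "llama3",
  numCtx: 8192, // forwarded to Ollama as options.num_ctx
});

const result = await chat.invoke("How large is your context window?");
console.log(result.content);
```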
I propose adding an option to AnythingLLM that allows advanced users to select from a drop-down list of common context window sizes (e.g., 2K, 4K, 8K, 16K, 32K, 64K, or 128K) for Ollama models, as sketched below. This feature would let users adjust the context window easily, without requiring custom Modelfiles.
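Something like the following mapping could back the drop-down; the labels and values are only a sketch of the idea, not a proposed final API:

```ts
// Hypothetical mapping from drop-down labels to token counts.
const CONTEXT_WINDOW_OPTIONS: Record<string, number> = {
  "2K": 2048,
  "4K": 4096,
  "8K": 8192,
  "16K": 16384,
  "32K": 32768,
  "64K": 65536,
  "128K": 131072,
};
```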
Example Use Case:
Users who frequently interact with large datasets or need more accurate results from their AI models can benefit from this feature. By offering a range of context window sizes, we can cater to different use cases and improve overall user experience.
Implementation:
The implementation would involve modifying AnythingLLM's call into LangChain to include a parameter for the selected context window size, which would be bound to LangChain's num_ctx parameter. LangChain would then pass the value through to Ollama's API, ensuring that Ollama actually allocates the larger context window.
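A rough sketch of the idea; "OllamaProviderConfig", "contextWindow", and "buildOllamaClient" are invented names for illustration, not the actual AnythingLLM code:

```ts
// Hypothetical sketch of threading a user-selected context window size
// through to LangChain's Ollama integration.
import { ChatOllama } from "@langchain/community/chat_models/ollama";

interface OllamaProviderConfig {
  baseUrl: string;
  model: string;
  contextWindow?: number; // value chosen from the proposed drop-down
}

function buildOllamaClient(config: OllamaProviderConfig): ChatOllama {
  return new ChatOllama({
    baseUrl: config.baseUrl,
    model: config.model,
    // Only set numCtx when the user picked a size, so existing setups keep
    // Ollama's default behavior.
    ...(config.contextWindow ? { numCtx: config.contextWindow } : {}),
  });
}
```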
Are there known steps to reproduce?
The steps are easy to reproduce: connect any Ollama model with any max-token setting, then stream the Ollama server logs.
If a line like the following appears:
Jul 24 11:10:51 PCHOSTNAME ollama[774]: llama_new_context_with_model: n_ctx = 2048
then the context window is incorrect and Ollama is falling back to its default 2048-token (2K) size. The final number in that log line should reflect the selected context window size once this feature is implemented correctly.