Description
What would you like to see?
Dear Maintainers,
I hope this message finds you well. First and foremost, I would like to express my gratitude for your continuous efforts in maintaining and improving Anything LLM. Your work is highly appreciated by the community.
I am writing to request a feature that would greatly enhance the usability and flexibility of the application, specifically when using Ollama models. Ollama models currently support a "keep_alive" parameter, as documented in the Ollama API docs and in the langchain docs. (I've seen in the source code that you use langchain as a wrapper around the call to Ollama, which is why I'm including the langchain reference.)
The "keep_alive" parameter is useful for keeping ollama models in memory longer - and as of now, there is no option to configure this parameter directly from the UI.
Problem:
The model is always loaded with the default 5-minute keep-alive. This can be observed with the `ollama ps` command after starting a chat in Anything LLM: the model unloads from memory after 5 minutes, which is not ideal for scenarios that need the model to stay loaded. (Additionally, I've verified that the OLLAMA_KEEP_ALIVE environment variable implemented by Ollama is not respected when the request comes through Anything LLM's chat endpoint.) I've also tried `curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'` (from the Ollama docs), but a new call from AnythingLLM seems to override this setting and fall back to the 5-minute default.
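For completeness, this is roughly how I reproduced the override; again only a sketch with example values, mirroring the curl command above:

```ts
// 1) Pin the model in memory with keep_alive: -1 (same as the curl above).
await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "llama3", keep_alive: -1 }),
});

// 2) Any later request that omits keep_alive (which is what happens when
//    chatting through Anything LLM today) appears to reset the unload timer,
//    and `ollama ps` shows the model expiring again after ~5 minutes.
await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "llama3", prompt: "hello", stream: false }),
});
```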
Feature Request:
- Add "keep_alive" Configuration in UI:
- Introduce an input field in the Ollama-specific UI that allows users to set a value for the "keep_alive" parameter when configuring Ollama models (see the sketch after this list).
- Optionally, provide a brief description or tooltip explaining the purpose and impact of the "keep_alive" parameter.
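As a rough illustration of what the wiring could look like, assuming the Ollama provider keeps going through langchain's ChatOllama wrapper (which, per the langchain docs mentioned above, exposes this setting; I'm assuming the JS option name is keepAlive). The settings-key name below is hypothetical and only stands in for wherever the new UI field would be stored:

```ts
import { ChatOllama } from "@langchain/community/chat_models/ollama";

// "keepAliveTimeout" is a made-up settings name for illustration only.
// The option is forwarded only when the user has set it, so the current
// default behaviour (5 minutes) stays unchanged for everyone else.
function buildOllamaChatModel(settings: {
  basePath: string;          // e.g. "http://localhost:11434"
  model: string;             // e.g. "llama3"
  keepAliveTimeout?: string; // e.g. "30m" or "24h"
}) {
  return new ChatOllama({
    baseUrl: settings.basePath,
    model: settings.model,
    ...(settings.keepAliveTimeout ? { keepAlive: settings.keepAliveTimeout } : {}),
  });
}
```

A free-text field that accepts the same values Ollama accepts (for example "10m", "1h", or "24h") would probably be the simplest UI, with the tooltip noting that longer values keep the model loaded and therefore hold on to memory.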
Benefits:
- Improved Performance: Allowing users to configure the "keep_alive" parameter keeps the model loaded between requests, reducing latency and improving overall performance when requests arrive with gaps of more than 5 minutes between them.
- Enhanced Flexibility: Users can tailor the behavior of the Ollama models to better suit their specific use cases.
- Resolve Current Limitation: Address the current limitation where the model is always unloaded after the default 5 minutes.
I believe this feature will be a valuable addition to Anything LLM and will enhance the user experience for many. Thank you for considering this request. If you need any further information or assistance, please do not hesitate to contact me.
Best regards,
Simon