Description
How are you running AnythingLLM?
Docker (remote machine)
What happened?
I started a new document embedding run and wanted to check, on the machine running Ollama, how it uses the cores and which parameters were used to start the Ollama process.
Then I saw this:
```
/usr/lib/ollama/runners/cpu_avx2/ollama_llama_server runner --model /root/.ollama/models/blobs/sha256-daec91ffb5dd0c27411bd71f29932917c49cf529a641d0168496c3a501e3062c --ctx-size 2048 --batch-size 512 --threads 64 --no-mmap --parallel 1
```
And I was surprised to see `--ctx-size 2048`!
The embedding model I use is bge-m3, which for one thing supports larger context windows, AND I also explicitly set "Max Embedding Chunk Length=4096" in the configuration.
I also let my documents be split into chunks of up to 4096 tokens each.
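For reference, Ollama's embeddings API does accept a per-request context override via `options.num_ctx`. A minimal sketch of the request body I would expect AnythingLLM to send (the model name and chunk text are just placeholders from my setup):

```python
import json

def build_embedding_payload(model: str, chunk: str, num_ctx: int) -> str:
    """Build a JSON body for Ollama's /api/embeddings endpoint.

    The "options" object can carry num_ctx, which should make the
    runner load the model with that context size instead of the
    default 2048.
    """
    payload = {
        "model": model,
        "prompt": chunk,
        "options": {"num_ctx": num_ctx},
    }
    return json.dumps(payload)

body = build_embedding_payload("bge-m3", "some 4096-token chunk ...", 4096)
```

This is only a sketch of the payload shape, not AnythingLLM's actual request code.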
But with Ollama running with `--ctx-size 2048`, it will, to my understanding, not "see" anything in my chunks beyond the first 2048 tokens.
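Schematically, what I mean is this (a toy illustration of my understanding, not actual Ollama internals, which may truncate differently or error):

```python
def visible_tokens(chunk_tokens: list, ctx_size: int) -> list:
    """Toy model: the runner only ever sees tokens that fit in its context window."""
    return chunk_tokens[:ctx_size]

chunk = list(range(4096))           # a chunk split to 4096 tokens, as configured
seen = visible_tokens(chunk, 2048)  # but the server was started with --ctx-size 2048
# The second half of the chunk would never reach the embedding model.
```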
QUESTION: So, is this a bug, i.e. that AnythingLLM does not run the embedding model with the context size configured in "Max Embedding Chunk Length=xxx" (in my case 4096)?
Are there known steps to reproduce?
No response