Description
How are you running AnythingLLM?
Docker (local)
What happened?
Hello there,
I started using AnythingLLM a few weeks ago, and I must say it's very easy to set up and use.
I'm working with different document types (PPTX, DOCX, and PDF). To make sure I upload only clean, structured text, I convert all of these files to Markdown with Docling, then upload the Markdown files into my AnythingLLM workspace.
However, when I asked questions about the uploaded content, the answers fell short of what I expected, especially compared with other tools like Google's NotebookLM.
To improve the output, I started tweaking the default configuration: I increased the chunk size to 10,000 and the overlap to 400, lowered the temperature, and increased the context window. None of these changes helped.
While reviewing the logs, I noticed something odd: the number of chunks created from a document was significantly higher than the number of embeddings stored. I'm not certain, but this mismatch might be one reason the output lacks context.
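To make the two settings above concrete, here is a minimal sketch of fixed-size chunking with overlap (this is not AnythingLLM's actual splitter, just an illustration of how chunk size and overlap determine the chunk count):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 300) -> list[str]:
    """Split text into fixed-size chunks, each sharing `overlap` characters
    with the previous chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 5000  # stand-in for a 5,000-character Markdown file
chunks = chunk_text(doc)
print(len(chunks))  # → 8 chunks at size 1000 / overlap 300
```

If everything is working, each chunk should produce exactly one embedding vector, so the chunk count and the embedding count in the logs should match; the discrepancy I observed suggests some chunks are being dropped between splitting and embedding.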
Would it be possible to look into this and see if it can be fixed? I'd really like to keep using AnythingLLM if I can get the desired results.
Are there known steps to reproduce?
To reproduce the issue:
1. Upload a few files here and also to NotebookLM for comparison.
2. Use the default configuration:
   - Vector DB: LanceDB
   - Embedder: AnythingLLM Embedder
   - Text chunk size: 1000
   - Overlap size: 300–400
3. LLM: GPT-4
4. Try both query and chat mode with the temperature set to 0.5–0.6.
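For anyone reproducing this, a quick way to confirm the mismatch is to compare the chunk count and embedding count reported at ingestion time. The log format and the `parse_counts` helper below are hypothetical, just to show the check:

```python
import re

def parse_counts(log: str) -> tuple[int, int]:
    """Pull the chunk and embedding counts out of a (hypothetical) ingestion
    log line so they can be compared."""
    chunks = int(re.search(r"(\d+) chunks", log).group(1))
    vectors = int(re.search(r"(\d+) embeddings", log).group(1))
    return chunks, vectors

# Example log line shaped like the mismatch I saw (numbers are illustrative):
log_line = "Split document into 120 chunks, inserted 85 embeddings"
chunks, vectors = parse_counts(log_line)
print(chunks == vectors)  # → False: chunks lost between splitting and embedding
```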