[BUG]: Embedded chunk is lesser than the chunk created from documents #4116

@vikashrajgupta

Description

How are you running AnythingLLM?

Docker (local)

What happened?

Hello there,

I started using AnythingLLM a few weeks ago, and I must say it’s very easy to use and set up.

I’m working with different types of documents such as PPTs, DOCX, and PDFs. To make sure I upload only clean, structured text, I’ve been converting all of these files to Markdown using Docling. Then I upload the Markdown files into the AnythingLLM workspace.

However, when I tested it by asking questions about the uploaded content, I didn’t get the expected results, especially compared to other tools such as Google’s NotebookLM.

To improve the output, I started tweaking the default configuration settings. I increased the chunk size to 10,000 and the overlap size to 400. I also lowered the temperature and increased the context size, but none of these changes seemed to help.
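For reference, the way chunk size and overlap drive the number of chunks can be sketched as below. This is a minimal character-based sliding-window splitter written for illustration only; it is not AnythingLLM's actual splitter, which may split on separators or tokens and produce different counts. The names `chunk_text` and `expected_chunk_count` are hypothetical, and the 50,000-character document length is an assumed figure.

```python
import math


def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative only: a plain sliding window, not a separator-aware splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def expected_chunk_count(text_length: int, chunk_size: int, overlap: int) -> int:
    """Closed-form chunk count for the sliding window above."""
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - overlap
    return 1 + math.ceil((text_length - chunk_size) / step)


if __name__ == "__main__":
    document = "x" * 50_000  # assumed document length, for illustration
    for size, overlap in [(1000, 300), (10_000, 400)]:
        count = len(chunk_text(document, size, overlap))
        print(f"chunk_size={size}, overlap={overlap}: {count} chunks")
```

Under these assumptions, a 50,000-character document yields 71 chunks at a chunk size of 1000 with overlap 300, and 6 chunks at a chunk size of 10,000 with overlap 400, so a large drop in chunk count after raising the chunk size is expected on its own.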

While reviewing the logs, I noticed something strange: the number of chunks created from the document was significantly higher than the number of embeddings. I’m not completely sure, but this might be one of the reasons why the output lacks context.
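A check along these lines could help pin down which chunks never received an embedding. It is only a sketch: it does not read AnythingLLM logs or query LanceDB, and `compare_chunks_to_embeddings`, `EmbeddingReport`, and the index-to-vector mapping are hypothetical structures that would have to be filled in from whatever data is actually available.

```python
from dataclasses import dataclass, field


@dataclass
class EmbeddingReport:
    total_chunks: int
    embedded: int
    missing_indices: list[int] = field(default_factory=list)


def compare_chunks_to_embeddings(
    chunks: list[str],
    vectors: dict[int, list[float]],
) -> EmbeddingReport:
    """Report which chunk indices have no (or an empty) embedding vector.

    `vectors` maps a chunk's position in `chunks` to its embedding.
    This mapping is an assumption made for the example.
    """
    missing = [i for i in range(len(chunks)) if not vectors.get(i)]
    return EmbeddingReport(
        total_chunks=len(chunks),
        embedded=len(chunks) - len(missing),
        missing_indices=missing,
    )


if __name__ == "__main__":
    chunks = ["first chunk", "second chunk", "third chunk"]
    vectors = {0: [0.12, 0.98], 2: [0.44, 0.31]}  # chunk 1 has no vector
    report = compare_chunks_to_embeddings(chunks, vectors)
    print(report)
```

If the missing indices cluster around very long or empty chunks, that would point toward the embedder's input limits or empty splits rather than a storage problem, though that is a hypothesis to test rather than a confirmed cause.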

Would it be possible to look into this and see if it can be fixed? I’d really like to continue using AnythingLLM if I can get the desired results.

Are there known steps to reproduce?

To reproduce this issue:

Try uploading a few files to AnythingLLM and also to NotebookLM for comparison.

Use the default config:

    Vector DB: LanceDB

    Embedder: AnythingLLM Embedder

    Text chunk size: 1000

    Overlap size: 300–400

I’m using GPT-4 as the LLM

I tried both query and chat mode with the temperature set to 0.5–0.6.

Labels

investigating: Core team or maintainer will or is currently looking into this issue
possible bug: Bug was reported but is not confirmed or is unable to be replicated.
