[BUG]: Fewer chunks are embedded than are created from documents #4116

### How are you running AnythingLLM?

Docker (local)

### What happened?

Hello there,

I started using AnythingLLM a few weeks ago, and I must say it’s very easy to use and set up.

I’m working with different types of documents, such as PPT, DOCX, and PDF files. To ensure I upload only clean, structured text, I’ve been converting all of these files to Markdown using Docling. Then I upload the Markdown files into the AnythingLLM workspace.

However, when I tested it by asking questions related to the uploaded content, I didn’t get the expected results, especially compared to other tools like Google’s NotebookLM.

To improve the output, I started tweaking the default configuration settings. I increased the chunk size to 10,000 and the overlap size to 400. I also lowered the temperature and increased the context size, but none of these changes seemed to help.
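For reference, here is a minimal sketch of how chunk size and overlap drive the number of chunks. It assumes a plain fixed-window character splitter, which is an assumption for illustration only; AnythingLLM's actual splitter may cut on separators and produce different counts. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative only: not AnythingLLM's splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    sample = "x" * 25_000  # stand-in for a 25,000-character Markdown file
    print(len(split_with_overlap(sample, 10_000, 400)))
    print(len(split_with_overlap(sample, 1_000, 400)))
```

Under this simplified model, a 25,000-character file gives 3 chunks at a chunk size of 10,000 and 41 chunks at a chunk size of 1,000 (both with an overlap of 400), so a larger chunk size means far fewer, much larger chunks to retrieve from.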

While reviewing the logs, I noticed something strange: the number of chunks created from the document was significantly higher than the number of embeddings. I’m not completely sure, but this might be one of the reasons why the output lacks context.
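One way to make this comparison concrete is to diff the chunk identifiers against the identifiers of the stored vectors. The sketch below assumes both lists can be obtained somehow (for example, copied out of the logs or the vector table); the function name and the ID format are hypothetical and do not correspond to any AnythingLLM or LanceDB API.

```python
def compare_chunks_to_vectors(chunk_ids: list[str], vector_ids: list[str]) -> dict:
    """Report which chunks have no stored vector, and vice versa.

    Hypothetical helper: the IDs are whatever identifiers you can collect
    for the chunks created and the vectors actually stored.
    """
    chunk_set = set(chunk_ids)
    vector_set = set(vector_ids)
    return {
        "chunk_count": len(chunk_set),
        "vector_count": len(vector_set),
        # chunks that were created but never embedded or stored
        "missing": sorted(chunk_set - vector_set),
        # vectors that do not correspond to any known chunk
        "unexpected": sorted(vector_set - chunk_set),
    }


if __name__ == "__main__":
    report = compare_chunks_to_vectors(
        ["c0", "c1", "c2", "c3"],  # made-up chunk IDs
        ["c0", "c2"],              # made-up stored-vector IDs
    )
    print(report)
```

If `missing` is empty even though the log numbers differ, the two log lines may simply be counting different things (for example, batches rather than individual vectors), which would need confirming against the actual log output.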

Would it be possible to look into this and see if it can be fixed? I’d really like to continue using AnythingLLM if I can get the desired results.

### Are there known steps to reproduce?

To reproduce this issue:

1. Upload a few files to an AnythingLLM workspace, and upload the same files to NotebookLM for comparison.
2. Use the default config:
   - Vector DB: LanceDB
   - Embedder: AnythingLLM Embedder
   - Text chunk size: 1000
   - Overlap size: 300–400
3. Use GPT-4 as the LLM.
4. Ask questions in both query and chat mode, with the temperature set to 0.5–0.6.
