[BUG]: Weaviate embeddings persist in vector DB after removal #3958

@Sarmingsteiner

How are you running AnythingLLM?

Docker (local)

What happened?

I’ve noticed two related issues when using AnythingLLM with a Weaviate vector store backend:

Deleted documents remain in the vector store

After removing a document from a workspace (and even deleting it entirely via the UI), its embeddings are still active and continue to surface in retrieval/citations.

Expected behavior: Deleting a document from a workspace (or permanently deleting it) should also hide/remove its embeddings from the vector store so it can no longer appear in retrieval results or citation annotations.
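To make the expected deletion behavior concrete, here is a minimal Python sketch. It is not AnythingLLM's or Weaviate's code: `ToyVectorStore`, `add_document`, `delete_document`, and `cited_documents` are hypothetical names for an in-memory stand-in, and the only assumption is that every stored vector records the ID of the document it came from.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ToyVectorStore:
    """Hypothetical in-memory stand-in for one vector store namespace."""

    # Maps vector id -> (owning doc id, text chunk). A real store would also
    # hold the embedding itself; it is omitted here to keep the sketch short.
    vectors: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def add_document(self, doc_id: str, chunks: List[str]) -> None:
        # One vector entry per chunk, each tagged with its source document.
        for i, chunk in enumerate(chunks):
            self.vectors[f"{doc_id}:{i}"] = (doc_id, chunk)

    def delete_document(self, doc_id: str) -> int:
        """Remove every vector owned by doc_id and return how many were removed."""
        stale = [vid for vid, (owner, _) in self.vectors.items() if owner == doc_id]
        for vid in stale:
            del self.vectors[vid]
        return len(stale)

    def cited_documents(self) -> Set[str]:
        """Documents that could still show up in retrieval results or citations."""
        return {owner for owner, _ in self.vectors.values()}


store = ToyVectorStore()
store.add_document("doc-x", ["chunk one", "chunk two"])
store.add_document("doc-y", ["other chunk"])
removed = store.delete_document("doc-x")
```

After `delete_document("doc-x")`, nothing owned by Document X is left to retrieve, which is the behavior I expected from the UI delete.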

Cached embeddings are re-embedded in another workspace

When I embed a document once (and it is tagged as “cached”), then use that same file in a different workspace, AnythingLLM re-embeds it from scratch instead of reusing the existing cached embedding.

Expected behavior: If a document’s embedding already exists in the cache (regardless of workspace), AnythingLLM should detect and reuse it rather than re-embedding, saving compute and avoiding duplicates in the vector store.
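The reuse I expected can be sketched as a cache keyed by a hash of the document content, so the lookup does not depend on the workspace. This is only an illustration under that assumption, not how AnythingLLM's cache is actually implemented; `EmbeddingCache`, `get_or_embed`, and `fake_embed` are hypothetical names, and `fake_embed` stands in for a real embedding model call.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """Hypothetical workspace-independent cache keyed by content hash."""

    def __init__(self, embed_fn: Callable[[str], Vector]) -> None:
        self._embed_fn = embed_fn
        self._by_hash: Dict[str, Vector] = {}
        self.embed_calls = 0  # counts how often the (expensive) embedder runs

    def get_or_embed(self, content: str) -> Vector:
        # Identical content always maps to the same key, whichever workspace asks.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key not in self._by_hash:
            self.embed_calls += 1
            self._by_hash[key] = self._embed_fn(content)
        return self._by_hash[key]


def fake_embed(text: str) -> Vector:
    # Placeholder for a real embedding request; returns a tiny deterministic vector.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


cache = EmbeddingCache(fake_embed)
workspace_a = {"doc-x": cache.get_or_embed("contents of Document X")}
workspace_b = {"doc-x": cache.get_or_embed("contents of Document X")}
```

Here the second workspace gets the stored vector back and the embedder runs only once, instead of issuing a fresh embed request.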

Both issues lead to stale data being returned and unnecessary re-processing, especially noticeable in multi-workspace setups.

Are there known steps to reproduce?

Configure AnythingLLM to use Weaviate as the vector store (e.g., via VECTOR_DB=weaviate in .env).

a. Upload or embed Document X.
b. Confirm that Document X appears in the citation annotation results.
c. Delete Document X from Workspace A via the UI (Remove from workspace → Delete permanently).
d. Perform a retrieval query (e.g., use a prompt that would retrieve Document X).
→ Observe that Document X’s embedding is still returned in the citations.

a. Add the already-embedded Document X to a new Workspace B.
b. Check the logs or network calls to see that AnythingLLM performs a fresh embed request, even though the document was previously “cached” in Workspace A.
c. Run a retrieval query and observe that there are now duplicate embeddings for Document X.
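For anyone trying to confirm the two observations above, a small helper like the following could flag stale and duplicate citations in a list of retrieval results. The citation shape (dictionaries with `doc_id` and `chunk` keys) and the `audit_citations` function are assumptions made for this sketch; the actual citation payload would need to be mapped into this form first.

```python
from collections import Counter
from typing import Dict, List, Set


def audit_citations(
    citations: List[Dict[str, str]], deleted_doc_ids: Set[str]
) -> Dict[str, list]:
    """Report citations from deleted documents and repeated (doc, chunk) pairs."""
    # Stale: any cited document that has already been deleted.
    stale = sorted({c["doc_id"] for c in citations if c["doc_id"] in deleted_doc_ids})
    # Duplicate: the same chunk of the same document returned more than once.
    counts = Counter((c["doc_id"], c["chunk"]) for c in citations)
    duplicates = sorted(pair for pair, n in counts.items() if n > 1)
    return {"stale": stale, "duplicates": duplicates}


# Hypothetical sample data shaped like the results described in the steps above.
sample = [
    {"doc_id": "doc-x", "chunk": "intro"},
    {"doc_id": "doc-x", "chunk": "intro"},
    {"doc_id": "doc-y", "chunk": "summary"},
]
report = audit_citations(sample, deleted_doc_ids={"doc-x"})
```

With this sample, `report["stale"]` lists `doc-x` (deleted but still cited) and `report["duplicates"]` lists the repeated `("doc-x", "intro")` chunk.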

Metadata

Labels

investigating: Core team or maintainer will or is currently looking into this issue

possible bug: Bug was reported but is not confirmed or is unable to be replicated.
