Description
How are you running AnythingLLM?
Docker (local)
What happened?
I’ve noticed two related issues when using AnythingLLM with a Weaviate vector store backend:
Deleted documents remain in the vector store
After a document is removed from a workspace (and even after it is deleted entirely via the UI), its embeddings remain in the vector store and continue to surface in retrieval results and citations.
Expected behavior: Deleting a document from a workspace (or permanently deleting it) should also hide/remove its embeddings from the vector store so it can no longer appear in retrieval results or citation annotations.
Cached embeddings are re-embedded in another workspace
When I embed a document once (so that it is tagged as “cached”) and then add the same file to a different workspace, AnythingLLM re-embeds it from scratch instead of reusing the existing cached embedding.
Expected behavior: If a document’s embedding already exists in the cache (regardless of workspace), AnythingLLM should detect and reuse it rather than re-embedding, saving compute and avoiding duplicates in the vector store.
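The expected caching behavior can be sketched as a cache keyed by a hash of the document's content rather than by workspace. This is a minimal illustration under that assumption, not AnythingLLM's actual cache mechanism. `SharedEmbeddingCache` and `fake_embed` are hypothetical, and `fake_embed` only counts calls in place of a real embedding model.

```python
import hashlib
from typing import Callable

Vector = list[float]


class SharedEmbeddingCache:
    """Hypothetical cache keyed by a SHA-256 of document content, shared by all workspaces."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed
        self._cache: dict[str, Vector] = {}

    @staticmethod
    def key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_embed(self, text: str) -> Vector:
        k = self.key(text)
        if k not in self._cache:       # cache miss: pay for one embedding call
            self._cache[k] = self._embed(text)
        return self._cache[k]          # cache hit: reuse, regardless of workspace


calls: list[str] = []


def fake_embed(text: str) -> Vector:
    calls.append(text)                 # record every call to the "model"
    return [float(len(text)), 0.0]


cache = SharedEmbeddingCache(fake_embed)
workspaces: dict[str, list[Vector]] = {"A": [], "B": []}
for ws in ("A", "B"):
    workspaces[ws].append(cache.get_or_embed("contents of Document X"))
print(len(calls))  # 1
```

Keying on content rather than on the workspace is what lets Workspace B hit the entry created for Workspace A. Each workspace still receives its own copy of the vectors, but the embedding model is called only once.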
Together, these issues cause stale data to be returned and documents to be needlessly re-processed, which is especially noticeable in multi-workspace setups.
Are there known steps to reproduce?
1. Configure AnythingLLM to use Weaviate as the vector store (e.g., set VECTOR_DB=weaviate in .env).
2. Deleted documents remain in the vector store:
   a. Upload Document X and embed it in Workspace A.
   b. Confirm that Document X appears in the citation annotations.
   c. Delete Document X from Workspace A via the UI (Remove from workspace → Delete permanently).
   d. Run a retrieval query (e.g., a prompt that would retrieve Document X).
   → Observe that Document X’s embeddings are still returned in the citations.
3. Cached embeddings are re-embedded in another workspace:
   a. Add an already-embedded document (Document X, previously embedded in Workspace A) to a new Workspace B.
   b. Check the logs or network calls: AnythingLLM sends a fresh embedding request, even though the document was previously “cached” in Workspace A.
   c. Run a retrieval query and observe that there are now duplicate embeddings for Document X.
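To tally the two symptoms from the deletion and re-embedding scenarios above, a small Python sketch may help. It assumes you have already collected the retrieved chunks (source document ID plus chunk text) from the citations or from an export of the vector store. `Hit`, `stale_hits` and `duplicate_chunks` are hypothetical helpers, not part of AnythingLLM.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One retrieved chunk, transcribed from citations or a vector-store dump."""
    doc_id: str
    chunk_text: str


def stale_hits(hits: list[Hit], live_doc_ids: set[str]) -> list[Hit]:
    """Hits whose source document no longer exists (deleted-document symptom)."""
    return [h for h in hits if h.doc_id not in live_doc_ids]


def duplicate_chunks(hits: list[Hit]) -> dict[tuple[str, str], int]:
    """(doc_id, chunk_text) pairs seen more than once (re-embedding symptom)."""
    counts = Counter((h.doc_id, h.chunk_text) for h in hits)
    return {key: n for key, n in counts.items() if n > 1}


hits = [Hit("doc-x", "intro"), Hit("doc-x", "intro"), Hit("doc-y", "summary")]
print(stale_hits(hits, live_doc_ids={"doc-y"}))
print(duplicate_chunks(hits))  # {('doc-x', 'intro'): 2}
```

Matching on exact chunk text is a simplification. If chunking differs between the two embedding runs, duplicates would need to be detected by document ID or vector similarity instead.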