
Conversation

@timothycarambat
Member

Pull Request Type

  • ✨ feat
  • πŸ› fix
  • ♻️ refactor
  • πŸ’„ style
  • πŸ”¨ chore
  • πŸ“ docs

Relevant Issues

resolves #594

What is in this change?

  • documentVectors self-sanitize on delete of parent document
  • patch lanceDB not deleting vectors from workspace
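The first bullet can be pictured with a minimal in-memory sketch. The store shape, the `docId` and `vectorId` field names, and the `deleteDocument` helper are all assumptions made for illustration; this is not the code from this PR.

```javascript
// In-memory stand-ins for the documents table and the document-vectors table.
// Field names (docId, vectorId) are assumptions for illustration only.
const db = {
  documents: [
    { docId: "doc-1", filename: "a.txt" },
    { docId: "doc-2", filename: "b.txt" },
  ],
  documentVectors: [
    { docId: "doc-1", vectorId: "v-1" },
    { docId: "doc-1", vectorId: "v-2" },
    { docId: "doc-2", vectorId: "v-3" },
  ],
};

// Hypothetical helper: deleting a parent document also drops the vector
// rows that point at it, so no tracking rows are left behind.
function deleteDocument(store, docId) {
  store.documents = store.documents.filter((doc) => doc.docId !== docId);
  store.documentVectors = store.documentVectors.filter(
    (row) => row.docId !== docId
  );
  return store;
}

deleteDocument(db, "doc-1");
console.log(db.documentVectors);
```

After the call, only the rows belonging to `doc-2` remain in both collections.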

RCA:
When a new document was added to a lanceDB workspace and that same operation also created the table, each submissions item carried a different id assigned by the collector. That id overwrote the known document id stored in the database, which made deleting the vectors impossible.

This only affected vectors that were entirely new to the system (uncached); cached documents were tracked properly.

If users hit this issue, the easiest complete fix is to delete the workspace, which fully removes the vectors. They will persist in the DB but be orphaned, which has negligible impact.

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

documentVectors self-sanitize on delete of parent document
@timothycarambat timothycarambat changed the title patch lanceDB not deleting vectors from workspace Patch lanceDB not deleting vectors from workspace Jan 29, 2024
@review-agent-prime

server/utils/vectorDbProviders/lance/index.js

It's a good practice to provide more context in error messages. This will help in debugging and understanding the error better. Also, consider using a logging library instead of console.error for better logging management.

    console.error(`Error in addDocumentToNamespace: ${e.message}`);
git fetch origin && git checkout -b ReviewBot/Impro-erssiuh origin/ReviewBot/Impro-erssiuh

server/models/documents.js

Instead of deleting document vectors one by one, consider deleting them in bulk. This will reduce the number of database calls and improve performance.

    await prisma.document_vectors.deleteMany({
      where: { docId: { in: removals.map(removal => removal.docId) } },
    });
git fetch origin && git checkout -b ReviewBot/Impro-tg8kuzl origin/ReviewBot/Impro-tg8kuzl


vectors.push(vectorRecord);
submissions.push({
...vectorRecord.metadata,
Member Author


Spread first so that our values overwrite whatever might collide in metadata, namely the vector and id keys. This way, what we define as a property carries through the creation and upsert of the vectors.
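A small sketch of why the spread order matters. In a JavaScript object literal, later keys overwrite earlier ones. The record shape and id values below are made up, and the "before" ordering is an assumption about what the pre-fix code looked like rather than a copy of it.

```javascript
// Hypothetical record: the metadata carries its own `id` (from the collector),
// which collides with the id tracked in the database.
const knownId = "db-tracked-id";
const vectorRecord = {
  id: knownId,
  values: [0.1, 0.2, 0.3],
  metadata: { id: "collector-id", title: "example.txt", text: "hello" },
};

// Explicit keys first, spread last: the metadata `id` silently wins.
const overwritten = {
  id: vectorRecord.id,
  vector: vectorRecord.values,
  ...vectorRecord.metadata,
};

// Spread first, explicit keys last: our `id` and `vector` win.
const preserved = {
  ...vectorRecord.metadata,
  id: vectorRecord.id,
  vector: vectorRecord.values,
};

// Deleting by the tracked id only works if the stored row kept that id.
function deleteById(rows, id) {
  return rows.filter((row) => row.id !== id);
}

const afterBuggyDelete = deleteById([overwritten], knownId);
const afterFixedDelete = deleteById([preserved], knownId);
console.log(afterBuggyDelete.length, afterFixedDelete.length);
```

With the spread placed last, the row stays behind because its id no longer matches the tracked one. With the spread placed first, the same delete removes it.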

@timothycarambat timothycarambat merged commit dfab14a into master Jan 29, 2024
@timothycarambat timothycarambat deleted the 594-lancedb-not-removing-embeddings branch January 29, 2024 17:49
cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025
patch lanceDB not deleting vectors from workspace
documentVectors self-sanitize on delete of parent document
Development

Successfully merging this pull request may close these issues.

[BUG]: Embeddings not removed.