refactor: convert chunk embedding to one API call #24
I was seeing very long embed times for a couple of large .docx documents ingested through the /hotdir. One had 256 chunks and took many minutes to fully embed, sometimes never finishing at all. I converted the `embedChunk()` method to accept a list of chunks instead of a single chunk and make one call to OpenAI's embedding API endpoint, which sped up the embed process dramatically. The refactored `embedChunks()` also required some changes to its usage in `addDocumentToNamespace()`; I included those changes for the three vector DB options at `/server/utils/vectorDbProviders/[vectorDb]/index.js`. Let me know what you think and whether this is helpful. Apologies again that my editor's LSP reformatted some method signatures and line lengths; I hope that's OK.
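For reference, here is a minimal sketch of the batching idea, not the PR's actual diff: it assumes the openai v4 Node SDK (`openai.embeddings.create`) and an `OPEN_AI_KEY` env var; the repo's real client wrapper, model choice, and method signature may differ.

```js
// Sketch only: embed all chunk texts in one embeddings request
// instead of one request per chunk.
const { OpenAI } = require("openai");

const openai = new OpenAI({ apiKey: process.env.OPEN_AI_KEY });

// Accepts an array of chunk strings and returns their embedding
// vectors in the same order as the input.
async function embedChunks(textChunks = []) {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: textChunks, // single request for the whole document's chunks
  });
  // Each result carries an `index` field matching its input position,
  // so sort before mapping back to plain vectors.
  return data
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```

With something like this, a caller such as `addDocumentToNamespace()` can zip the returned vectors back onto their source chunks in a single pass rather than awaiting an API round trip per chunk.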