
@timothycarambat
Member

Original Author: jwaltz

I was seeing very long embed times for a couple of large .docx documents ingested through /hotdir. One had 256 chunks and took many minutes to fully embed, and sometimes never finished at all.

I converted the embedChunk() method into embedChunks(), which accepts a list of chunks instead of a single chunk and makes a single call to OpenAI's embedding API endpoint. This seems to have sped up the embed process dramatically. The refactored embedChunks() required some changes to its usage in addDocumentToNamespace() as well. I included these changes for the three vector DB options at /server/utils/vectorDbProviders/[vectorDb]/index.js.
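For readers skimming the thread, the batching idea can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `requestEmbeddings` is a hypothetical injected function standing in for the real API client, and the `{ index, embedding }` response shape is an assumption modeled on what embedding endpoints commonly return.

```javascript
// Sketch only: embed every chunk with ONE request instead of one request per chunk.
// `requestEmbeddings` is a hypothetical function (an assumption, not part of the PR)
// that takes an array of strings and resolves to [{ index, embedding }, ...].
async function embedChunks(requestEmbeddings, textChunks = []) {
  // Nothing to embed, so skip the network round trip entirely.
  if (textChunks.length === 0) return [];

  // A single call carries all chunks.
  const data = await requestEmbeddings(textChunks);

  if (!Array.isArray(data) || data.length !== textChunks.length) {
    throw new Error("Embedding response did not match the number of chunks sent.");
  }

  // Sort by index so that vectors[i] corresponds to textChunks[i],
  // even if the response arrives in a different order.
  return [...data]
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```

With a 256-chunk document, a loop over a single-chunk method would issue 256 requests, while this shape issues one. Very large documents may still need to be split into several batches, depending on the provider's request limits.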

Let me know what you think and whether this is helpful. Again, apologies for my editor's LSP changing some of the method signatures and line-length formatting; I hope that is OK.

I had to modify this code because it would otherwise break the application. We do both multiple embeds and singular text embeds, so each vector database provider needs to expose and support both: during chat mode we manually embed the query.
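One way to keep both entry points on a provider is to route the single-text path through the batch path. The sketch below is an assumption about the general shape, not the merged code: `createEmbedder`, `embedTextInput`, and `requestEmbeddings` are hypothetical names, and `requestEmbeddings` is assumed to resolve to one vector per input string, in input order.

```javascript
// Sketch only: a provider-side embedder exposing both a batch and a single-text method.
// All names here are hypothetical. `requestEmbeddings` is an assumed injected function:
// (string[]) => Promise<number[][]>, one vector per input, in input order.
function createEmbedder(requestEmbeddings) {
  // Batch path, used when a document's chunks are added to a namespace.
  async function embedChunks(textChunks = []) {
    if (textChunks.length === 0) return [];
    const vectors = await requestEmbeddings(textChunks);
    if (!Array.isArray(vectors) || vectors.length !== textChunks.length) {
      throw new Error("Expected one vector per chunk.");
    }
    return vectors;
  }

  // Single path, used in chat mode to embed the user's query.
  // It reuses the batch path with a one-element list.
  async function embedTextInput(textInput) {
    const [vector] = await embedChunks([textInput]);
    return vector ?? [];
  }

  return { embedChunks, embedTextInput };
}
```

Because the single path delegates to the batch path, each vector database integration can call either method without duplicating the request logic.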

@timothycarambat
Member Author

@jwaltz I don't know what your Discord handle is, but I'm happy to add you as a contributor! Apologies for the delay in merging this, as it's a massive quality improvement.

@AntonioCiolino
Contributor

I'm excited to try this out!

@jwaltz
Contributor

jwaltz commented Jul 20, 2023

@timothycarambat I don't use Discord much, but I'm happy to have contributed nonetheless. Keep up the good work!

cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025
* refactor: convert chunk embedding to one API call

* chore: lint

* fix chroma for batch and single vectorization of text

* Fix LanceDB multi and single vectorization

* Fix pinecone for single and multiple embeddings

---------

Co-authored-by: Jonathan Waltz <volcanicislander@gmail.com>

4 participants