
Conversation

@jwaltz
Contributor

@jwaltz jwaltz commented Jun 10, 2023

I was seeing very long embed times for a couple of large .docx documents ingested through /hotdir. One had 256 chunks and took many minutes to embed fully, and sometimes never finished at all.

I converted the embedChunk() method to accept a list of chunks instead of a single chunk and to make a single call to OpenAI's embedding API endpoint, which seems to have sped up the embed process dramatically. The refactored embedChunks() also required some changes to how it is used in addDocumentToNamespace(). I included these changes for the three vector DB options at /server/utils/vectorDbProviders/[vectorDb]/index.js.

Let me know what you think and whether this is helpful. Again, apologies for my editor's LSP changing some of the method signatures and line-length formatting; I hope that is OK.
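
For readers skimming the thread, here is a minimal sketch of the batching idea described above. It is not the code from this PR. The `embedBatch` parameter, the `fakeEmbedBatch` stand-in, and the `{ text, values }` record shape are all hypothetical, chosen only to show one request covering many chunks instead of one request per chunk.

```javascript
// Hypothetical sketch of batched embedding; names and shapes are assumptions,
// not the PR's actual implementation.
//
// `embedBatch` stands in for ONE request to an embeddings endpoint that
// accepts an array of inputs and resolves to one vector per input, in order.
async function embedChunks(textChunks, embedBatch) {
  if (!Array.isArray(textChunks) || textChunks.length === 0) return [];

  // Single round trip for all chunks, rather than one call per chunk.
  const vectors = await embedBatch(textChunks);

  // Guard against a partial or malformed response before pairing results.
  if (!Array.isArray(vectors) || vectors.length !== textChunks.length) {
    const got = Array.isArray(vectors) ? vectors.length : 0;
    throw new Error(`Expected ${textChunks.length} embeddings, got ${got}`);
  }

  // Pair each vector with the chunk it was computed from.
  return textChunks.map((text, i) => ({ text, values: vectors[i] }));
}

// Stand-in embedder so the sketch runs without network access.
// It counts how many "requests" were made and returns toy 2-d vectors.
let requestCount = 0;
async function fakeEmbedBatch(inputs) {
  requestCount += 1;
  return inputs.map((s) => [s.length, 0]);
}
```

In real use, `embedBatch` would wrap whatever embeddings client the server already has, and a caller such as addDocumentToNamespace() would await `embedChunks(chunks, embedBatch)` once per document. Very large documents may still need to be split into several batches to stay within the provider's request limits.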

@timothycarambat timothycarambat self-assigned this Jun 11, 2023
@timothycarambat
Member

LFG - will review. Thank you!

@timothycarambat
Member

@jwaltz can you run yarn lint in the repo root to format everything so I can get a clear picture of the changed LOC? 🙏

@jwaltz
Contributor Author

jwaltz commented Jun 12, 2023

@timothycarambat no problem, linted and committed.

@AntonioCiolino
Contributor

Is there a reason this isn't merged (other than merge conflicts)? Anything that could improve speed would be helpful.

@timothycarambat
Member

Moved to #153


3 participants