refactor: convert chunk embedding to one API call #24
I was seeing very long embed times for a couple of large .docx documents ingested through the /hotdir. One had 256 chunks and took many minutes to fully embed, sometimes never finishing at all. I converted the `embedChunk()` method to accept a list of chunks instead of a single chunk and make one call to OpenAI's embedding API endpoint, which sped up the embed process dramatically. The refactored `embedChunks()` also required some changes to its usage in `addDocumentToNamespace()`; I included those changes for the three vector DB options at `/server/utils/vectorDbProviders/[vectorDb]/index.js`. Let me know what you think and whether this is helpful. Apologies again that my editor's LSP reformatted some method signatures and line lengths; I hope that's OK.
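For reference, here is a minimal sketch of the batching idea, not the PR's actual diff: it assumes the openai v4 Node SDK (`openai.embeddings.create`) and an `OPEN_AI_KEY` env var; the repo's real client wrapper, model choice, and method signature may differ.

```js
// Sketch only: embed all chunk texts in one embeddings request
// instead of one request per chunk.
const { OpenAI } = require("openai");

const openai = new OpenAI({ apiKey: process.env.OPEN_AI_KEY });

// Accepts an array of chunk strings and returns their embedding
// vectors in the same order as the input.
async function embedChunks(textChunks = []) {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: textChunks, // single request for the whole document's chunks
  });
  // Each result carries an `index` field matching its input position,
  // so sort before mapping back to plain vectors.
  return data
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```

With something like this, a caller such as `addDocumentToNamespace()` can zip the returned vectors back onto their source chunks in a single pass rather than awaiting an API round trip per chunk.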