
Conversation

@timothycarambat (Member)

Pull Request Type

  • ✨ feat
  • πŸ› fix
  • ♻️ refactor
  • πŸ’„ style
  • πŸ”¨ chore
  • πŸ“ docs

Relevant Issues

resolves #1230

What is in this change?

Fixes an issue where the prompt would be erroneously split by the embedder during vector search, resulting in worse semantic similarity.
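
For illustration, here is a minimal TypeScript sketch of the failure mode and the fix. The `Embedder` interface and `embedChunks` signature are hypothetical stand-ins, not the actual AnythingLLM API; the sketch only assumes the reported behavior, where a query string ended up being treated as a list of characters instead of a single text.

```typescript
// Hypothetical embedder interface, for illustration only.
interface Embedder {
  embedChunks(texts: string[]): Promise<number[][]>;
}

// Buggy shape: spreading a string yields one element per character,
// so "hello" becomes ["h", "e", "l", "l", "o"] and each character
// is embedded on its own, destroying semantic similarity.
async function embedQueryBuggy(embedder: Embedder, query: string) {
  return embedder.embedChunks([...query]);
}

// Fixed shape: pass the whole prompt as a single chunk so the
// embedder sees the full text at once.
async function embedQueryFixed(embedder: Embedder, query: string) {
  return embedder.embedChunks([query]);
}
```

With the fixed shape, the nearest-neighbor search compares against one vector representing the whole query rather than many near-meaningless single-character vectors.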

Additional Information

Important

We also need to ensure that the given prompt (or its chunks) does not exceed the embedder model's maximum input length, or the prompt search will crash.
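
A hedged sketch of the kind of guard this note describes. It assumes a character-based limit for simplicity; real embedder limits are usually measured in tokens, so a production guard would use the model's tokenizer. `chunkToMaxLength` is an illustrative name, not a function from this codebase.

```typescript
// Split an over-long prompt into pieces no longer than the embedder's
// maximum input length before requesting embeddings. Character counts
// are used for simplicity; a token-aware splitter would be more accurate.
function chunkToMaxLength(prompt: string, maxLength: number): string[] {
  if (maxLength <= 0) throw new Error("maxLength must be positive");
  if (prompt.length <= maxLength) return [prompt];
  const chunks: string[] = [];
  for (let i = 0; i < prompt.length; i += maxLength) {
    chunks.push(prompt.slice(i, i + maxLength));
  }
  return chunks;
}

// Example: chunkToMaxLength("a".repeat(10), 4) => ["aaaa", "aaaa", "aa"]
```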

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

@timothycarambat timothycarambat merged commit bf435b2 into master Apr 30, 2024
@timothycarambat timothycarambat deleted the 1230-text-input-embed-chunking-bug branch April 30, 2024 17:11


Successfully merging this pull request may close these issues:

[BUG]: User Query embeddings are being chunked per character when using LM Studio embedding models.
