LMStudio 0 value embeddings patch #4084

shatfield4
Collaborator

Pull Request Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 🔨 chore
  • 📝 docs

Relevant Issues

resolves #4080

What is in this change?

  • Fixes a bug where calls to the LMStudio embedding model returned every embedding value as 0, which caused all chunks to show up as context when chatting
  • Fixed by adding "encoding_format": "base64" to the request that the LMStudio backend sends through the OpenAI package
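
As a rough illustration of what a base64-encoded embedding looks like once it arrives, the sketch below decodes such a payload back into numbers and checks for the all-zero symptom described above. It assumes the string packs little-endian float32 values; whether LMStudio actually returns a string or a plain array is not stated in this PR, so the helper accepts both. normalizeEmbedding and isAllZero are hypothetical names, not functions from AnythingLLM or the OpenAI package.

```javascript
// Hypothetical helpers for illustration only (Node.js, uses the global Buffer).

// Accepts either a plain number array or a base64 string that is ASSUMED to
// hold little-endian float32 values, and returns a plain number array.
function normalizeEmbedding(embedding) {
  if (Array.isArray(embedding)) return embedding;
  if (typeof embedding !== "string") {
    throw new TypeError("embedding must be an array or a base64 string");
  }
  const bytes = Buffer.from(embedding, "base64");
  if (bytes.length % 4 !== 0) {
    throw new Error("decoded byte length is not a multiple of 4");
  }
  const values = new Array(bytes.length / 4);
  for (let i = 0; i < values.length; i++) {
    values[i] = bytes.readFloatLE(i * 4);
  }
  return values;
}

// True when a non-empty vector contains only zeros, which is the symptom
// this PR describes.
function isAllZero(vector) {
  return vector.length > 0 && vector.every((v) => v === 0);
}
```

A check like isAllZero(normalizeEmbedding(item.embedding)) could be logged while debugging to confirm that the embedder is returning usable vectors.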

Additional Information

  • NOTE: Discovered that in this expression
    item?.rerank_score || this.distanceToSimilarity(item._distance);
    the rerank score, when reranking is in use, tends to be very small and appears as 0% similarity in the UI
  • We may want to scale this number up so that it is displayed in a way that makes more sense to the user
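
One possible way to do the scaling suggested above is min-max normalization across the scores of a single result set. This is only a sketch of one option, not something the project implements; scaleScoresForDisplay and toPercent are hypothetical helpers. The trade-off is that scores become relative to the current result set, so the lowest-ranked item always shows 0%.

```javascript
// Hypothetical helpers for illustration only.

// Rescales raw scores to the 0..1 range relative to the other scores in the
// same result set. If every score is identical, all items are treated as
// equally relevant and mapped to 1.
function scaleScoresForDisplay(scores) {
  if (scores.length === 0) return [];
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 1);
  return scores.map((s) => (s - min) / (max - min));
}

// Formats a 0..1 score as a whole-number percentage string.
function toPercent(score) {
  return `${Math.round(score * 100)}%`;
}
```

Separately, because the quoted expression uses ||, a rerank_score of exactly 0 is falsy and falls back to the distance-based similarity.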

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

Copilot AI left a comment

Pull Request Overview

This PR fixes an issue where LMStudio embedding calls returned all-zero vectors by explicitly requesting a base64 encoding format.

  • Adds encoding_format: "base64" to the LMStudio embedder .create() call.
  • No other functional changes introduced.

Comments suppressed due to low confidence (2)

server/utils/EmbeddingEngines/lmstudio/index.js:66

  • Add a comment above this line explaining why the encoding_format parameter is necessary to address the zero-value embeddings issue.
            encoding_format: "base64",

server/utils/EmbeddingEngines/lmstudio/index.js:66

  • Add unit tests to verify that embeddings are correctly base64-encoded when this parameter is included, ensuring the fix works as intended.
            encoding_format: "base64",

@@ -63,6 +63,7 @@ class LMStudioEmbedder {
           .create({
             model: this.model,
             input: chunk,
+            encoding_format: "base64",
           })
           .then((result) => {

Copilot AI Jul 1, 2025

Consider chaining a .catch() handler to this promise to handle potential API errors and avoid unhandled promise rejections.
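
A minimal sketch of what that suggestion could look like is shown below. It is not the code in server/utils/EmbeddingEngines/lmstudio/index.js; embedChunk is a hypothetical wrapper, and the client argument is assumed to expose embeddings.create() the way the OpenAI package does. The request body mirrors the one in the diff above.

```javascript
// Hypothetical wrapper for illustration only. `client` is assumed to be an
// OpenAI-package-style client exposing `embeddings.create()`.
function embedChunk(client, model, chunk) {
  return client.embeddings
    .create({
      model,
      input: chunk,
      encoding_format: "base64",
    })
    .then((result) => ({ data: result?.data ?? [], error: null }))
    // Convert a rejected request into a value so that callers never see an
    // unhandled promise rejection.
    .catch((e) => ({ data: [], error: e?.message ?? String(e) }));
}
```

Returning an object with data and error fields lets the caller decide whether to retry, skip the chunk, or abort the whole embedding run.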

shatfield4 added the "PR:needs review" (Needs review by core team) label Jul 1, 2025
timothycarambat merged commit fc55baf into master Jul 2, 2025
timothycarambat deleted the 4080-bug-workspace-similarity-score-threshold-is-not-being-respected branch July 2, 2025 01:06
cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025
    patch lmstudio encoding_format to fix all embeddings as 0 value
Successfully merging this pull request may close these issues.

[BUG]: Workspace Similarity Score threshold is not being respected
2 participants