
normalize embeddings #5821

@bgeneto

Description

Is your feature request related to a problem?

Currently, the embeddings generated by the llama.cpp backend in LocalAI are not normalized. For many applications, especially semantic search and vector similarity calculations based on cosine similarity, embeddings must be L2 normalized. This forces developers to perform a normalization step on the client side after receiving the embedding vector from the API.

Describe the solution you'd like

I propose adding a new boolean option, embd_normalize (equivalent to the llama.cpp argument --embd-normalize), to the embeddings model YAML config file to trigger normalization. I also think this option should be enabled by default for requests to the OpenAI-compatible /v1/embeddings endpoint, but disabled by default for /embeddings. This matches the behavior of OpenAI's models and endpoint, which return L2-normalized embedding vectors.

When this option is set to true, the llama.cpp server performs L2 normalization on the final embedding vector before it is returned in the API response (this is already implemented in recent llama.cpp versions). When the option is false or absent, the server should return the raw, non-normalized embedding on /embeddings, while /v1/embeddings should keep returning normalized vectors.

This would allow users to receive ready-to-use, normalized embeddings directly from the API, simplifying client-side logic and improving overall efficiency.

Example model config file:

name: qwen3-embedding-4b
embeddings: true
backend: llama-cpp
context_size: 32768
f16: true
mmap: true
parameters:
  model: Qwen3-Embedding-4B-Q8_0.gguf
  embd_normalize: true

Describe alternatives you've considered

The only alternative at present is to normalize the embedding vectors manually on the client side: receive the raw vector from llama.cpp, compute its L2 norm, and divide each component of the vector by it. While functional, this approach is less efficient and requires every client application developer to reimplement the same logic.
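For reference, the client-side workaround described above amounts to a few lines; this is a minimal sketch using only the Python standard library (the sample vector is hypothetical):

```python
import math

def l2_normalize(vec):
    """Divide each component by the vector's L2 norm (Euclidean length)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # zero vector: nothing to normalize
    return [x / norm for x in vec]

raw = [3.0, 4.0]          # raw embedding as returned by /embeddings today
unit = l2_normalize(raw)  # unit-length vector, L2 norm == 1.0
```

Trivial as it is, every client currently has to carry this step (or pull in numpy for it), which is exactly what the proposed server-side option would eliminate.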

Additional context

L2 normalization is a standard procedure for preparing embeddings for many machine learning tasks.


Labels: enhancement (New feature or request)
