Documentation ¶
Overview ¶
Package embeddings contains helpers for creating vector embeddings from text using different providers.
The main components of this package are:
- Embedder interface: a common interface for creating vector embeddings from texts, with optional batching.
- NewEmbedder creates implementations of Embedder from provider LLM (or Chat) clients.
See the package example below.
Example ¶
package main

import (
	"context"
	"log"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/openai"
)

func main() { //nolint:testableexamples
	llm, err := openai.New()
	if err != nil {
		log.Fatal(err)
	}

	// Create a new Embedder from the given LLM.
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		log.Fatal(err)
	}

	docs := []string{"doc 1", "another doc"}
	embs, err := embedder.EmbedDocuments(context.Background(), docs)
	if err != nil {
		log.Fatal(err)
	}

	// Consume embs
	_ = embs
}
Index ¶
- Variables
- func BatchTexts(texts []string, batchSize int) [][]string
- func BatchedEmbed(ctx context.Context, embedder EmbedderClient, texts []string, batchSize int) ([][]float32, error)
- func CombineVectors(vectors [][]float32, weights []int) ([]float32, error)
- func MaybeRemoveNewLines(texts []string, removeNewLines bool) []string
- type Embedder
- type EmbedderClient
- type EmbedderClientFunc
- type EmbedderImpl
- type Option
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrVectorsNotSameSize is returned if the vectors returned from the
	// embeddings api have different sizes.
	ErrVectorsNotSameSize = errors.New("vectors gotten not the same size")
	// ErrAllTextsLenZero is returned if all texts to be embedded has the combined
	// length of zero.
	ErrAllTextsLenZero = errors.New("all texts have length 0")
)
Functions ¶
func BatchTexts ¶
func BatchTexts(texts []string, batchSize int) [][]string
BatchTexts splits texts into consecutive batches of at most batchSize strings each.
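To make the batching behavior concrete, here is a minimal self-contained sketch. batchTexts is a local stand-in written for illustration, not the package's implementation, and its handling of a non-positive batchSize is an assumption.

```go
package main

import "fmt"

// batchTexts is a hypothetical stand-in for embeddings.BatchTexts: it splits
// texts into consecutive batches holding at most batchSize strings each.
func batchTexts(texts []string, batchSize int) [][]string {
	if batchSize <= 0 {
		// Assumption for this sketch: a non-positive size means "one batch".
		return [][]string{texts}
	}
	batches := make([][]string, 0, (len(texts)+batchSize-1)/batchSize)
	for start := 0; start < len(texts); start += batchSize {
		end := start + batchSize
		if end > len(texts) {
			end = len(texts) // the last batch may be shorter
		}
		batches = append(batches, texts[start:end])
	}
	return batches
}

func main() {
	batches := batchTexts([]string{"a", "b", "c", "d", "e"}, 2)
	fmt.Println(len(batches), batches) // prints: 3 [[a b] [c d] [e]]
}
```

Note that the batches in this sketch share the backing array of the input slice, so no strings are copied.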
func BatchedEmbed ¶
func BatchedEmbed(ctx context.Context, embedder EmbedderClient, texts []string, batchSize int) ([][]float32, error)
BatchedEmbed creates embeddings for the given input texts, batching them into batches of batchSize if needed.
func MaybeRemoveNewLines ¶
func MaybeRemoveNewLines(texts []string, removeNewLines bool) []string
Types ¶
type Embedder ¶
type Embedder interface {
	// EmbedDocuments returns a vector for each text.
	EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
	// EmbedQuery embeds a single text.
	EmbedQuery(ctx context.Context, text string) ([]float32, error)
}
Embedder is the interface for creating vector embeddings from texts.
type EmbedderClient ¶
type EmbedderClient interface {
	CreateEmbedding(ctx context.Context, texts []string) ([][]float32, error)
}
EmbedderClient is the interface LLM clients implement for embeddings.
type EmbedderClientFunc ¶
EmbedderClientFunc is an adapter that allows ordinary functions to be used as EmbedderClients. If `f` is a function with the appropriate signature, `EmbedderClientFunc(f)` is an `EmbedderClient` that calls `f`.
func (EmbedderClientFunc) CreateEmbedding ¶
type EmbedderImpl ¶
type EmbedderImpl struct {
	StripNewLines bool
	BatchSize     int
	// contains filtered or unexported fields
}
func NewEmbedder ¶
func NewEmbedder(client EmbedderClient, opts ...Option) (*EmbedderImpl, error)
NewEmbedder creates a new Embedder from the given EmbedderClient. Options such as WithBatchSize and WithStripNewLines control how embedding is performed.
func (*EmbedderImpl) EmbedDocuments ¶
EmbedDocuments creates one vector embedding for each of the texts.
func (*EmbedderImpl) EmbedQuery ¶
EmbedQuery embeds a single text.
type Option ¶
type Option func(p *EmbedderImpl)
func WithBatchSize ¶
WithBatchSize is an option for specifying the batch size.
func WithStripNewLines ¶
WithStripNewLines is an option for specifying whether new lines should be stripped.
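Option follows the functional options pattern: each option is a function that mutates the EmbedderImpl under construction. The self-contained sketch below uses local stand-ins (embedderImpl, option, withBatchSize, withStripNewLines, newEmbedderImpl). The parameter types of the two option constructors and the default values are assumptions made for illustration, not taken from the package.

```go
package main

import "fmt"

// embedderImpl mirrors the exported fields of EmbedderImpl shown on this page.
type embedderImpl struct {
	StripNewLines bool
	BatchSize     int
}

// option mirrors the documented Option type: a function that mutates the
// embedder being built.
type option func(p *embedderImpl)

// withBatchSize is a stand-in for WithBatchSize (int parameter assumed).
func withBatchSize(batchSize int) option {
	return func(p *embedderImpl) { p.BatchSize = batchSize }
}

// withStripNewLines is a stand-in for WithStripNewLines (bool parameter assumed).
func withStripNewLines(strip bool) option {
	return func(p *embedderImpl) { p.StripNewLines = strip }
}

// newEmbedderImpl starts from placeholder defaults (chosen for this sketch
// only) and applies each option in the order given, so later options win.
func newEmbedderImpl(opts ...option) *embedderImpl {
	e := &embedderImpl{StripNewLines: true, BatchSize: 100}
	for _, opt := range opts {
		opt(e)
	}
	return e
}

func main() {
	e := newEmbedderImpl(withBatchSize(32), withStripNewLines(false))
	fmt.Printf("%+v\n", *e) // prints: {StripNewLines:false BatchSize:32}
}
```

Under the same assumptions, the equivalent call against the real package would look like embeddings.NewEmbedder(llm, embeddings.WithBatchSize(32)).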