
[BUG]: performance - tokenizeString runs unnecessarily when EMBEDDING_ENGINE is not 'openai' #3069

@louishalbritter

Description

How are you running AnythingLLM?

All versions

What happened?

The function tokenizeString is very CPU-intensive. The only use I found for it is this check, which estimates embedding costs for OpenAI:

// Do not do cost estimation unless the embedding engine is OpenAi.
if (systemSettings?.EmbeddingEngine === "openai") {

When running against a local embedding provider, this estimation isn't needed at all, so skipping it would save significant time and energy.
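A minimal sketch of what I mean, as JavaScript pseudocode: only tokenizeString and systemSettings?.EmbeddingEngine are taken from the actual code; the helper name and the require path are made up for illustration.

    // Hypothetical helper; only tokenizeString and EmbeddingEngine are real names.
    const { tokenizeString } = require("../utils/tokenizer");

    function estimateOpenAiEmbeddingTokens(systemSettings, pageContent) {
      // Check the engine first so non-OpenAI setups never pay the tokenization cost.
      if (systemSettings?.EmbeddingEngine !== "openai") return null;

      // Only reached for OpenAI; assuming tokenizeString returns the token array.
      return tokenizeString(pageContent).length;
    }

The point is just the ordering: check the engine before calling tokenizeString, instead of tokenizing every document and only using the result when the engine is OpenAI.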

Are there known steps to reproduce?

I'm using this configuration:

      - EMBEDDING_ENGINE=ollama
      - EMBEDDING_BASE_PATH=http://ollama:11434
      - EMBEDDING_MODEL_PREF=nomic-embed-text:latest

When I upload an 80KiB .xlsx file, the process takes too long and results in a timeout.
Without the token estimation, it's embedded within 1.1s.
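For what it's worth, a rough standalone timing check along these lines shows how long tokenizeString alone takes on the extracted text (the require path and sample file are placeholders, not taken from the repo):

    // Rough timing harness; the path and sample file are assumptions.
    const fs = require("fs");
    const { tokenizeString } = require("./utils/tokenizer");

    const text = fs.readFileSync("./sample-extracted-text.txt", "utf8");

    console.time("tokenizeString");
    tokenizeString(text); // the CPU-heavy call this issue is about
    console.timeEnd("tokenizeString");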

Metadata

Labels

    possible bug (Bug was reported but is not confirmed or is unable to be replicated.)
