
[BUG]: performance - tokenizeString runs unnecessarily when EMBEDDING_ENGINE is not 'openai' #3069

@louishalbritter

Description

How are you running AnythingLLM?

All versions

What happened?

The function tokenizeString is very CPU-intensive. The only use I found for it is this check, which estimates embedding costs for OpenAI:

// Do not do cost estimation unless the embedding engine is OpenAi.
if (systemSettings?.EmbeddingEngine === "openai") {

When running against a local embedding provider, this estimation isn't needed at all, so skipping it would save significant time and energy.
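A minimal sketch of what I mean, as JavaScript pseudocode: only tokenizeString and systemSettings?.EmbeddingEngine are taken from the actual code; the helper name and the require path are made up for illustration.

    // Hypothetical helper; only tokenizeString and EmbeddingEngine are real names.
    const { tokenizeString } = require("../utils/tokenizer");

    function estimateOpenAiEmbeddingTokens(systemSettings, pageContent) {
      // Check the engine first so non-OpenAI setups never pay the tokenization cost.
      if (systemSettings?.EmbeddingEngine !== "openai") return null;

      // Only reached for OpenAI; assuming tokenizeString returns the token array.
      return tokenizeString(pageContent).length;
    }

The point is just the ordering: check the engine before calling tokenizeString, instead of tokenizing every document and only using the result when the engine is OpenAI.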

Are there known steps to reproduce?

I'm using this configuration:

      - EMBEDDING_ENGINE=ollama
      - EMBEDDING_BASE_PATH=http://ollama:11434
      - EMBEDDING_MODEL_PREF=nomic-embed-text:latest

When I upload an 80KiB .xlsx file, the process takes too long and results in a timeout.
Without the token estimation, it's embedded within 1.1s.
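For what it's worth, a rough standalone timing check along these lines shows how long tokenizeString alone takes on the extracted text (the require path and sample file are placeholders, not taken from the repo):

    // Rough timing harness; the path and sample file are assumptions.
    const fs = require("fs");
    const { tokenizeString } = require("./utils/tokenizer");

    const text = fs.readFileSync("./sample-extracted-text.txt", "utf8");

    console.time("tokenizeString");
    tokenizeString(text); // the CPU-heavy call this issue is about
    console.timeEnd("tokenizeString");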

Metadata

Labels

    possible bug (Bug was reported but is not confirmed or is unable to be replicated.)
