Problem description
ollama versions from 0.11.5 onwards suffer a severe performance penalty when running local LLM models. This does not happen with ollama version 0.11.4 or lower, and it does not happen under llama-cli itself (provided by the llama-cpp package).
I have confirmed this issue on two different devices, a Redmi Note 8 and a Samsung Galaxy A04s. However, the issue seems to be Termux-specific: it does not occur in an Alpine proot-distro instance, nor on my Linux system (specifically Arch) with GPU acceleration disabled for ollama (as it's not supported on my hardware).
A similar issue was reported on r/termux by a different user: https://www.reddit.com/r/termux/comments/1o02nkf/some_kind_of_update_may_have_broken_termux_ollama/
Also, instances of ollama serve running inside an Alpine proot-distro guest (started with proot-distro login alpine --isolated --bind "$HOME/.ollama/models:/root/.ollama/models") are not affected by this issue. The ollama package used inside the Alpine proot-distro is the one from Alpine's edge repository on aarch64, installed after following the "Upgrading to Edge" guide on the Alpine Linux Wiki.
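For reference, a minimal sketch of that control setup, assuming proot-distro from the Termux repos and the stock Alpine rootfs (the sed line follows the Alpine Wiki's edge-upgrade approach and may need adjusting for your mirror):

```sh
# In Termux: install proot-distro, set up Alpine, and enter it with the
# model store bind-mounted from Termux's $HOME (same flags as above).
pkg install proot-distro
proot-distro install alpine
proot-distro login alpine --isolated --bind "$HOME/.ollama/models:/root/.ollama/models"

# The following lines are typed inside the Alpine guest shell:
# switch the repositories to edge, upgrade, then install and start ollama.
sed -i 's|v3\.[0-9]*|edge|g' /etc/apk/repositories
apk update && apk upgrade --available
apk add ollama
ollama serve
```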
Logs
Tip
Search for total duration: to jump between test results quickly (or use the grep one-liner after the list below)
- ollama-repro-termux.note8.txt
- ollama-repro-termux.a04s.txt
- ollama-repro-linux.arch.txt: Same test, running on Arch Linux (x86_64). You can see that the generation speed is still consistent across different versions.
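To skim the attached logs without opening them one by one, a grep along these lines pulls out the summary lines; eval rate is the tokens/second figure printed by ollama run --verbose:

```sh
# Show the timing summaries from one of the attached logs.
grep -n -e "total duration:" -e "eval rate:" ollama-repro-termux.note8.txt
```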
Speculation
My initial suspicion is that this is a compiler optimization issue, since ollama builds from other sources (e.g. from Arch Linux Packages, and from Alpine Linux edge aarch64 packages, as demonstrated above) are not affected by this slowdown. The slowdown may be happening because some of the compiler flags used to build ollama's bundled llama.cpp are being overridden by the package's own flags. However, I'm not sure if this is really the case, as I haven't investigated the build process yet due to resource limitations on my end.
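For anyone with the resources to bisect, a cheap starting point might be to diff the termux-packages recipe between the last-good and first-bad versions (packages/ollama/build.sh is the standard repository layout; the commit hashes below are placeholders to fill in):

```sh
git clone --filter=blob:none https://github.com/termux/termux-packages
cd termux-packages
# Find the commits that bumped ollama from 0.11.4 to 0.11.5...
git log --oneline -- packages/ollama/
# ...then check whether any compiler/CMake flags for the bundled
# llama.cpp changed between them.
git diff <commit-for-0.11.4> <commit-for-0.11.5> -- packages/ollama/
```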
What steps will reproduce the bug?
Note
This test requires at least 1-2 GB of free storage space and at least 4 GB of memory (or about 1-2 GB of available memory)
Automated reproduction steps
1. Download the following zip archive and extract it to Android's /storage/emulated/0/Download directory. Do not extract it into a new directory.
ollama-old-builds-termux-aarch64.zip
The deb files were originally obtained from the following GitHub Actions runs from this repo. Please note that these binary builds might have already expired:
- 0.11.4: https://github.com/termux/termux-packages/actions/runs/16823401919
- 0.11.5: https://github.com/termux/termux-packages/actions/runs/17085251284
Note
Alternatively, you might want to use your own older cached version of ollama if you have never run pkg clean, apt clean, or apt autoclean. You can find the cached .deb files under the /data/data/com.termux/cache/apt/archives/ directory; run termux-setup-storage if you haven't done so already, and then copy the file into ~/storage/downloads/.
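A sketch of that recovery path, assuming a cached ollama_0.11.4_aarch64.deb is still present:

```sh
# Check apt's package cache for old ollama builds.
ls /data/data/com.termux/cache/apt/archives/ | grep '^ollama_'
termux-setup-storage    # grant shared-storage access (one-time)
cp /data/data/com.termux/cache/apt/archives/ollama_0.11.4_aarch64.deb ~/storage/downloads/
```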
2. Open Termux and run termux-setup-storage (if not done already). Make sure that both ollama_0.11.4_aarch64.deb and ollama_0.11.5_aarch64.deb are directly under the ~/storage/downloads directory by running ls ~/storage/downloads and checking that both files appear in the output.
3. Download the following bash script to automate the issue reproduction process:
4. Put the script into Termux's home directory with mv ~/storage/downloads/ollama-repro.sh ~
5. Inspect the script with a text editor to make sure it is not partially downloaded or corrupted. Then make it executable by running chmod +x ollama-repro.sh
6. Run the script with ./ollama-repro.sh. The script should install all of the necessary dependencies, pull the models from ollama, and test the issue automatically. Make sure to enable the "Keep screen on" option by long-pressing anywhere on the terminal text, tapping "More...", then enabling "Keep screen on". (Steps 4-6 are consolidated in the sketch below.)
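Put together (assuming the script landed in ~/storage/downloads as ollama-repro.sh):

```sh
mv ~/storage/downloads/ollama-repro.sh ~
cd ~
less ollama-repro.sh      # eyeball the script before running it
chmod +x ollama-repro.sh
./ollama-repro.sh
```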
Manual reproduction steps
For ollama
1. Install ollama version 0.11.4 with apt install --allow-downgrades ~/storage/downloads/ollama_0.11.4_aarch64.deb. See the "Automated reproduction steps" section above for more information about obtaining older versions of ollama.
2. Run ollama serve in a terminal session
3. Launch a new session
4. Run ollama run qwen2.5-coder:0.5b --verbose. This will pull the model if it isn't downloaded already
5. Send any message to the LLM. You'll see that the token generation speed is reasonably fast (around 8-12 tokens/second in my testing)
6. Exit the chat session with Ctrl-D
7. Stop the ollama serve instance launched in step 2 with Ctrl-C
8. Upgrade to any ollama version beyond 0.11.4 (e.g. 0.11.5, or 0.12.11)
9. Repeat steps 2 through 6. You'll see that the generation speed is about 5-10 times slower than on 0.11.4
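The per-version measurement can also be done non-interactively; a rough sketch, relying on ollama run accepting a prompt argument and printing its --verbose stats to stderr:

```sh
# Benchmark one version with a single one-shot prompt.
apt install --allow-downgrades ~/storage/downloads/ollama_0.11.4_aarch64.deb
ollama serve > serve.log 2>&1 &
SERVE_PID=$!
sleep 5        # give the server a moment to start
ollama run qwen2.5-coder:0.5b --verbose "Write a haiku about Termux." 2>&1 \
  | grep -e "total duration" -e "eval rate"
kill "$SERVE_PID"   # stop ollama serve
```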
For llama-cli (as a control)
1. Install llama-cpp for testing with llama-cli
2. Open the ~/.ollama/models/manifests/registry.ollama.ai/library/<model>/<variant> JSON file in a text editor. Replace <model>/<variant> with the model you want to test (in this case, qwen2.5-coder/0.5b)
3. Copy the digest value of the array entry whose mediaType is set to application/vnd.ollama.image.model. In this case, it's sha256:20693aeb02c63304e263a72453b6ab89e1c700a87c6948cac523ac1e6f7cade0. Paste it somewhere (e.g. Notes), then replace the colon (:) with a hyphen (-)
4. Prepend ~/.ollama/models/blobs/ to the string crafted in the previous step. In this case, the result is ~/.ollama/models/blobs/sha256-20693aeb02c63304e263a72453b6ab89e1c700a87c6948cac523ac1e6f7cade0
5. Run llama-cli -m <path>, where <path> is the model file path obtained in steps 2 through 4
6. Send any message to the LLM. You'll see that the token generation speed is the same as with ollama version 0.11.4
7. Exit the chat session with Ctrl-D
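Steps 2 through 4 can also be collapsed into a single command; a sketch assuming jq is installed (pkg install jq) and that the manifest stores the model entry under a layers array, as in the example digest above:

```sh
# Resolve the GGUF blob path for a model from its ollama manifest.
manifest=~/.ollama/models/manifests/registry.ollama.ai/library/qwen2.5-coder/0.5b
digest=$(jq -r '.layers[]
    | select(.mediaType == "application/vnd.ollama.image.model")
    | .digest' "$manifest")
llama-cli -m ~/.ollama/models/blobs/"${digest/:/-}"
```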
What is the expected behavior?
Ollama should still run as fast as 0.11.4 does.
System information
Termux Variables:
TERMUX_API_VERSION=0.53.0
TERMUX_APK_RELEASE=F_DROID
TERMUX_APP_PACKAGE_MANAGER=apt
TERMUX_APP_PID=27141
TERMUX_APP__DATA_DIR=/data/user/0/com.termux
TERMUX_APP__LEGACY_DATA_DIR=/data/data/com.termux
TERMUX_APP__SE_FILE_CONTEXT=u:object_r:app_data_file:s0:c30,c257,c512,c768
TERMUX_APP__SE_INFO=default:targetSdkVersion=28:complete
TERMUX_IS_DEBUGGABLE_BUILD=0
TERMUX_MAIN_PACKAGE_FORMAT=debian
TERMUX_VERSION=0.118.3
TERMUX__HOME=/data/data/com.termux/files/home
TERMUX__PREFIX=/data/data/com.termux/files/usr
TERMUX__ROOTFS_DIR=/data/data/com.termux/files
TERMUX__SE_PROCESS_CONTEXT=u:r:untrusted_app_27:s0:c30,c257,c512,c768
TERMUX__USER_ID=0
Packages CPU architecture:
aarch64
Subscribed repositories:
# sources.list
deb https://packages-cf.termux.dev/apt/termux-main stable main
# x11-repo (sources.list.d/x11.list)
deb https://packages-cf.termux.dev/apt/termux-x11 x11 main
# tur-repo (sources.list.d/tur.list)
deb https://tur.kcubeterm.com tur-packages tur tur-on-device tur-continuous
Updatable packages:
All packages up to date
termux-tools version:
1.45.0
Android version:
11
Kernel build information:
Linux localhost 4.14.190-perf-gceaf914a4ae1-dirty #2 SMP PREEMPT Wed Feb 16 22:34:35 WIB 2022 aarch64 Android
Device manufacturer:
Xiaomi
Device model:
Redmi Note 8
Supported ABIs:
SUPPORTED_ABIS: arm64-v8a,armeabi-v7a,armeabi
SUPPORTED_32_BIT_ABIS: armeabi-v7a,armeabi
SUPPORTED_64_BIT_ABIS: arm64-v8a
LD Variables:
LD_LIBRARY_PATH=
LD_PRELOAD=/data/data/com.termux/files/usr/lib/libtermux-exec-ld-preload.so
Installed termux plugins:
com.termux.api versionCode:1002
com.termux.x11 versionCode:15