
[Bug]: Severe performance regression for Ollama in Termux since ollama version 0.11.5 and onwards #27290

@explor4268

Description


Problem description

ollama versions from 0.11.5 onwards have a severe performance penalty when running local LLM models. This does not happen with ollama version 0.11.4 or lower, and it does not happen with llama-cli itself (provided by the llama-cpp package).

I have confirmed this issue on two different devices, a Redmi Note 8 and a Samsung Galaxy A04s. However, the issue seems to be Termux-specific: it does not occur in an alpine proot-distro instance, nor on my Linux system (specifically Arch) with GPU acceleration disabled for ollama (as it's not supported on my hardware).

A similar issue was reported on r/termux by a different user: https://www.reddit.com/r/termux/comments/1o02nkf/some_kind_of_update_may_have_broken_termux_ollama/

Also, instances of ollama serve running inside proot-distro (using alpine as the distro, launched with proot-distro login alpine --isolated --bind "$HOME/.ollama/models:/root/.ollama/models") are not affected by this issue. The ollama package used inside the alpine proot-distro is the one in Alpine's edge repo on aarch64, installed after following the "Upgrading to Edge" guide on the Alpine Linux Wiki.
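For reference, the unaffected control environment can be recreated roughly like this (a sketch; the bind path is taken from the command above, and installing ollama inside alpine follows the Alpine wiki guide mentioned there):

```sh
# Set up the alpine proot-distro control environment and share the
# existing ollama model store with it.
pkg install proot-distro
proot-distro install alpine
proot-distro login alpine --isolated --bind "$HOME/.ollama/models:/root/.ollama/models"
```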

Logs

Tip

Search for total duration: to jump between test results quickly
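If the log is saved to a file, something like this jumps straight to the timing summaries (the filename here is hypothetical):

```sh
# Print every timing summary line with its line number.
grep -n 'total duration:' ollama-repro.log
```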

Speculation

My initial suspicion is a compiler optimization issue, since ollama builds from sources other than termux-packages (e.g. from Arch Linux Packages, and from Alpine Linux edge aarch64 packages, as demonstrated above) are not affected by this slowdown. The compiler flags (or some of them) that ollama's bundled llama.cpp normally sets may be getting overridden by the packaging's own flags during the build. However, I'm not sure this is really the case, as I haven't investigated the build process yet due to resource limitations on my end.
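One low-effort way to check this hypothesis would be to grep the Termux build recipe for optimization-related flags (a sketch; the packages/ollama path is assumed from how termux-packages lays out its recipes):

```sh
# Fetch the build recipe and look for optimization/CMake flags that could
# override llama.cpp's defaults. (Assumes git and grep are installed.)
git clone --depth=1 https://github.com/termux/termux-packages
grep -rn -E -- '-O[0-3s]|CMAKE_BUILD_TYPE|GGML' termux-packages/packages/ollama/
```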

What steps will reproduce the bug?

Note

This test requires at least 1-2 GB of free storage space and about 1-2 GB of available memory (a device with at least 4 GB of RAM is recommended)

Automated reproduction steps

  1. Download the following zip archive and extract it into Android's /storage/emulated/0/Download directory. Do not extract it into a new subdirectory.

ollama-old-builds-termux-aarch64.zip

The deb files were originally obtained from the following GitHub Actions run from this repo. Please note that these binary builds might have already expired:

Note

Alternatively, you can use your own older cached versions of ollama if you have never run pkg clean, apt clean, or apt autoclean. You can find them under the /data/data/com.termux/cache/apt/archives/ directory; run termux-setup-storage if you haven't already, and then copy them into ~/storage/downloads/ (see the sketch below).
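Concretely, that amounts to something like the following (paths from above; the glob assumes the usual Termux .deb naming):

```sh
# Grant Termux access to shared storage (no-op if already granted).
termux-setup-storage
# List and copy any cached ollama builds into the shared Downloads directory.
ls /data/data/com.termux/cache/apt/archives/ollama_*_aarch64.deb
cp /data/data/com.termux/cache/apt/archives/ollama_*_aarch64.deb ~/storage/downloads/
```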

  2. Open Termux and run termux-setup-storage (if not done already). Make sure that both ollama_0.11.4_aarch64.deb and ollama_0.11.5_aarch64.deb are directly under the ~/storage/downloads directory by running ls ~/storage/downloads and checking that both files appear in the output.
  3. Download the following bash script to automate the issue reproduction process:

ollama-repro.sh

  4. Put the script into Termux's home directory with mv ~/storage/downloads/ollama-repro.sh ~
  5. Inspect the script in a text editor to make sure it is not partially downloaded or corrupted. Then make it executable by running chmod +x ollama-repro.sh
  6. Run the script with ./ollama-repro.sh. The script should install all of the necessary dependencies, pull the models from ollama, and test the issue automatically. Make sure to enable the "Keep screen on" option by long-pressing anywhere on the terminal text, tapping "More", and then enabling "Keep screen on". The core of what the script measures is sketched after this list.
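For readers who would rather not run the full script, this is roughly the shape of the benchmark it performs (a minimal sketch, not the actual ollama-repro.sh; the prompt and the sleep duration are arbitrary):

```sh
#!/data/data/com.termux/files/usr/bin/bash
set -e
for deb in ollama_0.11.4_aarch64.deb ollama_0.11.5_aarch64.deb; do
  # Install the version under test (downgrades allowed for 0.11.4).
  apt install -y --allow-downgrades "$HOME/storage/downloads/$deb"
  ollama serve &>/dev/null &
  server_pid=$!
  sleep 5                       # crude wait for the server to come up
  echo "== $deb =="
  # --verbose prints the timing summary, including "total duration:".
  echo "Write a haiku about the sea." | ollama run qwen2.5-coder:0.5b --verbose
  kill "$server_pid"; wait "$server_pid" 2>/dev/null || true
done
```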

Manual reproduction steps

For ollama

  1. Install ollama version 0.11.4 with apt install --allow-downgrades ~/storage/downloads/ollama_0.11.4_aarch64.deb. See the "Automated reproduction steps" section above for more information about obtaining older versions of ollama.
  2. Run ollama serve in a terminal session
  3. Launch a new session
  4. Run ollama run qwen2.5-coder:0.5b --verbose. This will pull the model if it's not downloaded already
  5. Send any message to the LLM. You'll see that the token generation speed is reasonably fast (around 8-12 tokens/second in my testing)
  6. Exit out of the chat session by using Ctrl-D
  7. Stop the ollama serve instance launched in step 2 with Ctrl-C
  8. Upgrade to any ollama version from 0.11.5 onwards (e.g. 0.11.5, or 0.12.11); see the example after this list
  9. Repeat steps 2 through 6. You'll see that the generation speed is about 5-10 times slower than with 0.11.4
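Step 8 can be done either with one of the extracted debs or with the current repo version (both are standard apt/pkg usage):

```sh
# Install a specific affected build from the extracted archive...
apt install ~/storage/downloads/ollama_0.11.5_aarch64.deb
# ...or simply install the latest ollama from the Termux repos.
pkg install ollama
```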

For llama-cli (as a control)

  1. Install the llama-cpp package to get llama-cli for testing
  2. Open the ~/.ollama/models/manifests/registry.ollama.ai/library/<model>/<variant> JSON file in a text editor. Replace <model>/<variant> with the model you want to test (in this case, qwen2.5-coder/0.5b)
  3. Copy the digest value of the layer entry whose mediaType is application/vnd.ollama.image.model. In this case, it's sha256:20693aeb02c63304e263a72453b6ab89e1c700a87c6948cac523ac1e6f7cade0. Paste it somewhere (e.g. a notes app), then replace the colon (:) with a hyphen (-)
  4. Prepend ~/.ollama/models/blobs/ to the resulting string. In this case, it becomes ~/.ollama/models/blobs/sha256-20693aeb02c63304e263a72453b6ab89e1c700a87c6948cac523ac1e6f7cade0 (steps 2-4 can be automated; see the sketch after this list)
  5. Run llama-cli -m <path>, where <path> is the model file path you built in steps 2-4
  6. Send any message to the LLM. You'll see that the token generation speed is about the same as with ollama version 0.11.4
  7. Exit out of the chat session by using Ctrl-D
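Assuming the manifest layout described in steps 2-3 (a layers array of entries with mediaType and digest fields), steps 2-4 reduce to a jq one-liner (jq itself is an extra install):

```sh
pkg install jq llama-cpp
manifest=~/.ollama/models/manifests/registry.ollama.ai/library/qwen2.5-coder/0.5b
# Pick the model layer's digest and turn "sha256:..." into "sha256-...".
blob=$(jq -r '.layers[] | select(.mediaType == "application/vnd.ollama.image.model") | .digest' "$manifest" | tr ':' '-')
llama-cli -m ~/.ollama/models/blobs/"$blob"
```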

What is the expected behavior?

Ollama should still run as fast as 0.11.4 does.

System information

Termux Variables:
TERMUX_API_VERSION=0.53.0
TERMUX_APK_RELEASE=F_DROID
TERMUX_APP_PACKAGE_MANAGER=apt
TERMUX_APP_PID=27141
TERMUX_APP__DATA_DIR=/data/user/0/com.termux
TERMUX_APP__LEGACY_DATA_DIR=/data/data/com.termux
TERMUX_APP__SE_FILE_CONTEXT=u:object_r:app_data_file:s0:c30,c257,c512,c768
TERMUX_APP__SE_INFO=default:targetSdkVersion=28:complete
TERMUX_IS_DEBUGGABLE_BUILD=0
TERMUX_MAIN_PACKAGE_FORMAT=debian
TERMUX_VERSION=0.118.3
TERMUX__HOME=/data/data/com.termux/files/home
TERMUX__PREFIX=/data/data/com.termux/files/usr
TERMUX__ROOTFS_DIR=/data/data/com.termux/files
TERMUX__SE_PROCESS_CONTEXT=u:r:untrusted_app_27:s0:c30,c257,c512,c768
TERMUX__USER_ID=0
Packages CPU architecture:
aarch64
Subscribed repositories:
# sources.list
deb https://packages-cf.termux.dev/apt/termux-main stable main
# x11-repo (sources.list.d/x11.list)
deb https://packages-cf.termux.dev/apt/termux-x11 x11 main
# tur-repo (sources.list.d/tur.list)
deb https://tur.kcubeterm.com tur-packages tur tur-on-device tur-continuous
Updatable packages:
All packages up to date
termux-tools version:
1.45.0
Android version:
11
Kernel build information:
Linux localhost 4.14.190-perf-gceaf914a4ae1-dirty #2 SMP PREEMPT Wed Feb 16 22:34:35 WIB 2022 aarch64 Android
Device manufacturer:
Xiaomi
Device model:
Redmi Note 8
Supported ABIs:
SUPPORTED_ABIS: arm64-v8a,armeabi-v7a,armeabi
SUPPORTED_32_BIT_ABIS: armeabi-v7a,armeabi
SUPPORTED_64_BIT_ABIS: arm64-v8a
LD Variables:
LD_LIBRARY_PATH=
LD_PRELOAD=/data/data/com.termux/files/usr/lib/libtermux-exec-ld-preload.so
Installed termux plugins:
com.termux.api versionCode:1002
com.termux.x11 versionCode:15
