Problem description
ollama versions from 0.11.5 onwards suffer a severe performance penalty when running local LLM models. This does not happen with ollama version 0.11.4 or lower, and it does not happen under llama-cli itself (provided by the llama-cpp package).
I have confirmed this issue on two different devices, a Redmi Note 8 and a Samsung Galaxy A04s. However, the issue seems to be Termux-specific: it does not occur in an Alpine proot-distro instance, nor on my Linux system (specifically Arch) with GPU acceleration disabled for ollama (as it's not supported on my hardware).
A similar issue was reported on r/termux by a different user: https://www.reddit.com/r/termux/comments/1o02nkf/some_kind_of_update_may_have_broken_termux_ollama/
Also, instances of ollama serve running inside an Alpine proot-distro guest (started with proot-distro login alpine --isolated --bind "$HOME/.ollama/models:/root/.ollama/models") are not affected by this issue. The ollama package used inside the Alpine proot-distro is the one from Alpine's edge repository on aarch64, installed after following the "Upgrading to Edge" guide on the Alpine Linux Wiki.
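For reference, a minimal sketch of that control setup, assuming proot-distro from the Termux repos and the stock Alpine rootfs (the sed line follows the Alpine Wiki's edge-upgrade approach and may need adjusting for your mirror):

```sh
# In Termux: install proot-distro, set up Alpine, and enter it with the
# model store bind-mounted from Termux's $HOME (same flags as above).
pkg install proot-distro
proot-distro install alpine
proot-distro login alpine --isolated --bind "$HOME/.ollama/models:/root/.ollama/models"

# The following lines are typed inside the Alpine guest shell:
# switch the repositories to edge, upgrade, then install and start ollama.
sed -i 's|v3\.[0-9]*|edge|g' /etc/apk/repositories
apk update && apk upgrade --available
apk add ollama
ollama serve
```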
Logs
Tip
Search for total duration: to jump between test results quickly (or use the grep one-liner after the list below)
- ollama-repro-termux.note8.txt
- ollama-repro-termux.a04s.txt
- ollama-repro-linux.arch.txt: Same test, running on Arch Linux (x86_64). You can see that the generation speed is still consistent across different versions.
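To skim the attached logs without opening them one by one, a grep along these lines pulls out the summary lines; eval rate is the tokens/second figure printed by ollama run --verbose:

```sh
# Show the timing summaries from one of the attached logs.
grep -n -e "total duration:" -e "eval rate:" ollama-repro-termux.note8.txt
```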
Speculation
My initial suspicion is that this is a compiler optimization issue, since ollama builds from other sources (e.g. from Arch Linux Packages, and from Alpine Linux edge aarch64 packages, as demonstrated above) are not affected by this slowdown. The slowdown may be happening because some of the compiler flags used to build ollama's bundled llama.cpp are being overridden by the package's own flags. However, I'm not sure if this is really the case, as I haven't investigated the build process yet due to resource limitations on my end.
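For anyone with the resources to bisect, a cheap starting point might be to diff the termux-packages recipe between the last-good and first-bad versions (packages/ollama/build.sh is the standard repository layout; the commit hashes below are placeholders to fill in):

```sh
git clone --filter=blob:none https://github.com/termux/termux-packages
cd termux-packages
# Find the commits that bumped ollama from 0.11.4 to 0.11.5...
git log --oneline -- packages/ollama/
# ...then check whether any compiler/CMake flags for the bundled
# llama.cpp changed between them.
git diff <commit-for-0.11.4> <commit-for-0.11.5> -- packages/ollama/
```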
What steps will reproduce the bug?
Note
This test requires at least 1-2 GB of free storage space and at least 4 GB of memory (or about 1-2 GB of available memory)
Automated reproduction steps
1. Download the following zip archive and extract it to Android's /storage/emulated/0/Download directory. Do not extract it into a new directory.
ollama-old-builds-termux-aarch64.zip
The deb files were originally obtained from the following GitHub Actions runs from this repo. Please note that these binary builds might have already expired:
- 0.11.4: https://github.com/termux/termux-packages/actions/runs/16823401919
- 0.11.5: https://github.com/termux/termux-packages/actions/runs/17085251284
Note
Alternatively, you might want to use your own older cached version of ollama if you have never run pkg clean, apt clean, or apt autoclean. You can find the cached .deb files under the /data/data/com.termux/cache/apt/archives/ directory; run termux-setup-storage if you haven't done so already, and then copy the file into ~/storage/downloads/.
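A sketch of that recovery path, assuming a cached ollama_0.11.4_aarch64.deb is still present:

```sh
# Check apt's package cache for old ollama builds.
ls /data/data/com.termux/cache/apt/archives/ | grep '^ollama_'
termux-setup-storage    # grant shared-storage access (one-time)
cp /data/data/com.termux/cache/apt/archives/ollama_0.11.4_aarch64.deb ~/storage/downloads/
```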
2. Open Termux and run termux-setup-storage (if not done already). Make sure that both ollama_0.11.4_aarch64.deb and ollama_0.11.5_aarch64.deb are directly under the ~/storage/downloads directory by running ls ~/storage/downloads and checking that both files appear in the output.
3. Download the following bash script to automate the issue reproduction process:
4. Put the script into Termux's home directory with mv ~/storage/downloads/ollama-repro.sh ~
5. Inspect the script with a text editor to make sure it is not partially downloaded or corrupted. Then make it executable by running chmod +x ollama-repro.sh
6. Run the script with ./ollama-repro.sh. The script should install all of the necessary dependencies, pull the models from ollama, and test the issue automatically. Make sure to enable the "Keep screen on" option by long-pressing anywhere on the terminal text, tapping "More...", then enabling "Keep screen on". (Steps 4-6 are consolidated in the sketch below.)
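Put together (assuming the script landed in ~/storage/downloads as ollama-repro.sh):

```sh
mv ~/storage/downloads/ollama-repro.sh ~
cd ~
less ollama-repro.sh      # eyeball the script before running it
chmod +x ollama-repro.sh
./ollama-repro.sh
```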
Manual reproduction steps
For ollama
1. Install ollama version 0.11.4 with apt install --allow-downgrades ~/storage/downloads/ollama_0.11.4_aarch64.deb. See the "Automated reproduction steps" section above for more information about obtaining older versions of ollama.
2. Run ollama serve in a terminal session
3. Launch a new session
4. Run ollama run qwen2.5-coder:0.5b --verbose. This will pull the model if it isn't downloaded already
5. Send any message to the LLM. You'll see that the token generation speed is reasonably fast (around 8-12 tokens/second in my testing)
6. Exit the chat session with Ctrl-D
7. Stop the ollama serve instance launched in step 2 with Ctrl-C
8. Upgrade to any ollama version beyond 0.11.4 (e.g. 0.11.5, or 0.12.11)
9. Repeat steps 2 through 6. You'll see that the generation speed is about 5-10 times slower than on 0.11.4
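The per-version measurement can also be done non-interactively; a rough sketch, relying on ollama run accepting a prompt argument and printing its --verbose stats to stderr:

```sh
# Benchmark one version with a single one-shot prompt.
apt install --allow-downgrades ~/storage/downloads/ollama_0.11.4_aarch64.deb
ollama serve > serve.log 2>&1 &
SERVE_PID=$!
sleep 5        # give the server a moment to start
ollama run qwen2.5-coder:0.5b --verbose "Write a haiku about Termux." 2>&1 \
  | grep -e "total duration" -e "eval rate"
kill "$SERVE_PID"   # stop ollama serve
```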
For llama-cli (as a control)
1. Install llama-cpp for testing with llama-cli
2. Open the ~/.ollama/models/manifests/registry.ollama.ai/library/<model>/<variant> JSON file in a text editor. Replace <model>/<variant> with the model you want to test (in this case, qwen2.5-coder/0.5b)
3. Copy the digest value of the array entry whose mediaType is set to application/vnd.ollama.image.model. In this case, it's sha256:20693aeb02c63304e263a72453b6ab89e1c700a87c6948cac523ac1e6f7cade0. Paste it somewhere (e.g. Notes), then replace the colon (:) with a hyphen (-)
4. Prepend ~/.ollama/models/blobs/ to the string crafted in the previous step. In this case, the result is ~/.ollama/models/blobs/sha256-20693aeb02c63304e263a72453b6ab89e1c700a87c6948cac523ac1e6f7cade0
5. Run llama-cli -m <path>, where <path> is the model file path obtained in steps 2 through 4
6. Send any message to the LLM. You'll see that the token generation speed is the same as with ollama version 0.11.4
7. Exit the chat session with Ctrl-D
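Steps 2 through 4 can also be collapsed into a single command; a sketch assuming jq is installed (pkg install jq) and that the manifest stores the model entry under a layers array, as in the example digest above:

```sh
# Resolve the GGUF blob path for a model from its ollama manifest.
manifest=~/.ollama/models/manifests/registry.ollama.ai/library/qwen2.5-coder/0.5b
digest=$(jq -r '.layers[]
    | select(.mediaType == "application/vnd.ollama.image.model")
    | .digest' "$manifest")
llama-cli -m ~/.ollama/models/blobs/"${digest/:/-}"
```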
What is the expected behavior?
Ollama should still run as fast as 0.11.4 does.
System information
Termux Variables:
TERMUX_API_VERSION=0.53.0
TERMUX_APK_RELEASE=F_DROID
TERMUX_APP_PACKAGE_MANAGER=apt
TERMUX_APP_PID=27141
TERMUX_APP__DATA_DIR=/data/user/0/com.termux
TERMUX_APP__LEGACY_DATA_DIR=/data/data/com.termux
TERMUX_APP__SE_FILE_CONTEXT=u:object_r:app_data_file:s0:c30,c257,c512,c768
TERMUX_APP__SE_INFO=default:targetSdkVersion=28:complete
TERMUX_IS_DEBUGGABLE_BUILD=0
TERMUX_MAIN_PACKAGE_FORMAT=debian
TERMUX_VERSION=0.118.3
TERMUX__HOME=/data/data/com.termux/files/home
TERMUX__PREFIX=/data/data/com.termux/files/usr
TERMUX__ROOTFS_DIR=/data/data/com.termux/files
TERMUX__SE_PROCESS_CONTEXT=u:r:untrusted_app_27:s0:c30,c257,c512,c768
TERMUX__USER_ID=0
Packages CPU architecture:
aarch64
Subscribed repositories:
# sources.list
deb https://packages-cf.termux.dev/apt/termux-main stable main
# x11-repo (sources.list.d/x11.list)
deb https://packages-cf.termux.dev/apt/termux-x11 x11 main
# tur-repo (sources.list.d/tur.list)
deb https://tur.kcubeterm.com tur-packages tur tur-on-device tur-continuous
Updatable packages:
All packages up to date
termux-tools version:
1.45.0
Android version:
11
Kernel build information:
Linux localhost 4.14.190-perf-gceaf914a4ae1-dirty #2 SMP PREEMPT Wed Feb 16 22:34:35 WIB 2022 aarch64 Android
Device manufacturer:
Xiaomi
Device model:
Redmi Note 8
Supported ABIs:
SUPPORTED_ABIS: arm64-v8a,armeabi-v7a,armeabi
SUPPORTED_32_BIT_ABIS: armeabi-v7a,armeabi
SUPPORTED_64_BIT_ABIS: arm64-v8a
LD Variables:
LD_LIBRARY_PATH=
LD_PRELOAD=/data/data/com.termux/files/usr/lib/libtermux-exec-ld-preload.so
Installed termux plugins:
com.termux.api versionCode:1002
com.termux.x11 versionCode:15