OS: Windows Server 2025
GPU: Intel Arc Pro B60 Graphics (2x GPUs)
Version: ollama-ipex-llm-2.3.0b20250725
Ollama crashes with a panic (nil pointer dereference) when attempting to load vision/multimodal models. Every vision model I have tried so far (gemma3, qwen2.5vl, llava, llama3.2-vision, minicpm) crashes.
I am able to run non-vision models like deepseek-r1:70b just fine.
Vision models work as intended if I restrict Ollama to a single card.
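For anyone trying to reproduce the single-card workaround, one possible way to do it (this is just one mechanism, via the standard oneAPI device selector honored by the SYCL runtime; how the restriction is applied may differ in other setups) is to pin the runtime to the first Level Zero device before starting the server:

set ONEAPI_DEVICE_SELECTOR=level_zero:0
rem then start the Ollama server as usual for the IPEX-LLM bundle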
Ollama crashes as follows:
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x1 addr=0x50 pc=0x7ff638d2f571]
goroutine 14 [running]:
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve(0xc001e38100)
D:/ruonan/ollama-internal/ml/backend/ggml/ggml.go:698 +0x2b1
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getTensor(...)
D:/ruonan/ollama-internal/runner/ollamarunner/multimodal.go:98 +0x2a4
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getMultimodal(...)
D:/ruonan/ollama-internal/runner/ollamarunner/multimodal.go:56 +0xe5
github.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0xc00033a7e0)
D:/ruonan/ollama-internal/runner/ollamarunner/runner.go:796 +0x70e
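The fault address 0x50 is consistent with a field read through a pointer that was never initialized, rather than a dereference of a literal nil value. As a minimal, hypothetical sketch (not Ollama's actual code; the type and method names below are made up) of the failure pattern the trace points at, a method invoked during graph reservation that reads a field through a per-device buffer pointer which was never populated for the second GPU panics with exactly this message, and the reported fault address equals the field's offset inside the struct:

// Minimal sketch, not Ollama code: illustrates the class of panic above.
package main

type backendBuffer struct {
	_    [0x50]byte // pad so the field below lands at offset 0x50, matching addr=0x50 in the trace
	size int
}

func (b *backendBuffer) Size() int { return b.size } // reads through b even when b is nil

type reserveContext struct {
	buf *backendBuffer // imagine this only gets set for the first GPU
}

func (c *reserveContext) Reserve() int {
	return c.buf.Size() // nil dereference: "invalid memory address or nil pointer dereference"
}

func main() {
	c := &reserveContext{} // buf left nil, as if the second SYCL device got no buffer
	_ = c.Reserve()        // panics with a fault address of 0x50
}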
Full Log
time=2025-10-06T11:02:57.378-04:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY:localhost,127.0.0.1 OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:10m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:D:\\Models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:2 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-10-06T11:02:57.384-04:00 level=INFO source=images.go:476 msg="total blobs: 58"
time=2025-10-06T11:02:57.385-04:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-10-06T11:02:57.386-04:00 level=INFO source=routes.go:1288 msg="Listening on [::]:11434 (version 0.9.3)"
time=2025-10-06T11:02:57.386-04:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-10-06T11:02:57.386-04:00 level=INFO source=gpu.go:218 msg="using Intel GPU"
time=2025-10-06T11:02:57.386-04:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-10-06T11:02:57.386-04:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=16 efficiency=0 threads=32
time=2025-10-06T11:02:57.392-04:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="127.9 GiB" available="108.6 GiB"
[GIN] 2025/10/06 - 11:03:49 | 200 | 7.3018ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2025/10/06 - 11:03:49 | 200 | 0s | 127.0.0.1 | GET "/api/ps"
[GIN] 2025/10/06 - 11:03:50 | 200 | 0s | 127.0.0.1 | GET "/api/version"
time=2025-10-06T11:04:08.530-04:00 level=INFO source=server.go:135 msg="system memory" total="127.9 GiB" free="108.3 GiB" free_swap="126.6 GiB"
time=2025-10-06T11:04:08.532-04:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=49 layers.offload=0 layers.split="" memory.available="[108.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.3 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[10.3 GiB]" memory.weights.total="6.8 GiB" memory.weights.repeating="6.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="519.5 MiB" memory.graph.partial="1.3 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-10-06T11:04:08.632-04:00 level=INFO source=server.go:458 msg="starting llama server" cmd="D:\\IPEX-LLM\\ollama-lib.exe runner --ollama-engine --model D:\\Models\\blobs\\sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 16 --no-mmap --parallel 2 --port 52405"
time=2025-10-06T11:04:08.636-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-10-06T11:04:08.636-04:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-10-06T11:04:08.637-04:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server error"
time=2025-10-06T11:04:08.820-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-10-06T11:04:08.822-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:52405"
time=2025-10-06T11:04:08.889-04:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
time=2025-10-06T11:04:08.917-04:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=37
time=2025-10-06T11:04:08.930-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
Running with Environment Variables:
GGML_SYCL_DEBUG: 0
GGML_SYCL_DISABLE_OPT: 1
GGML_SYCL_DISABLE_GRAPH: 1
GGML_SYCL_PRIORITIZE_DMMV: 0
Build with Macros:
GGML_SYCL_FORCE_MMQ: no
GGML_SYCL_F16: no
Found 2 SYCL devices:
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Arc Pro B60 Graphics| 20.1| 160| 1024| 32| 25048M| 1.6.34177|
| 1| [level_zero:gpu:1]| Intel Arc Pro B60 Graphics| 20.1| 160| 1024| 32| 25048M| 1.6.34177|
SYCL Optimization Feature:
|ID| Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]| Y|
| 1| [level_zero:gpu:1]| Y|
time=2025-10-06T11:04:09.765-04:00 level=INFO source=ggml.go:390 msg="model weights" buffer=SYCL0 size="7.6 GiB"
time=2025-10-06T11:04:09.765-04:00 level=INFO source=ggml.go:390 msg="model weights" buffer=CPU size="787.5 MiB"
time=2025-10-06T11:04:09.765-04:00 level=INFO source=ggml.go:419 msg="Sycl device count" count=2
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x1 addr=0x50 pc=0x7ff638d2f571]
goroutine 14 [running]:
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve(0xc001e38100)
D:/ruonan/ollama-internal/ml/backend/ggml/ggml.go:698 +0x2b1
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getTensor(0xc0014e59e8?, {0x7ff639c37390, 0xc001d3c000}, {0x7ff639c3ba30, 0xc001e39180}, {0x7ff639c48b40, 0xc000008480}, 0x1)
D:/ruonan/ollama-internal/runner/ollamarunner/multimodal.go:98 +0x2a4
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getMultimodal(0xc000047cc8, {0x7ff639c37390, 0xc001d3c000}, {0x7ff639c3ba30, 0xc001e39180}, {0xc001e30020, 0x1, 0x30?}, 0x1)
D:/ruonan/ollama-internal/runner/ollamarunner/multimodal.go:56 +0xe5
github.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0xc00033a7e0)
D:/ruonan/ollama-internal/runner/ollamarunner/runner.go:796 +0x70e
github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc00033a7e0, {0xc000134000?, 0x0?}, {0x10, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0}, ...)
D:/ruonan/ollama-internal/runner/ollamarunner/runner.go:865 +0x270
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc00033a7e0, {0x7ff639c33270, 0xc00061b630}, {0xc000134000?, 0x0?}, {0x10, 0x0, 0x3e7, {0x0, 0x0, ...}, ...}, ...)
D:/ruonan/ollama-internal/runner/ollamarunner/runner.go:878 +0xb8
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
D:/ruonan/ollama-internal/runner/ollamarunner/runner.go:959 +0xa11
time=2025-10-06T11:04:10.342-04:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
time=2025-10-06T11:04:10.676-04:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server error"
time=2025-10-06T11:04:10.705-04:00 level=ERROR source=server.go:484 msg="llama runner terminated" error="exit status 2"
time=2025-10-06T11:04:10.926-04:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: error:invalid memory address or nil pointer dereference\n[signal 0xc0000005 code=0x1 addr=0x50 pc=0x7ff638d2f571]\n\ngoroutine 14 [running]:"
[GIN] 2025/10/06 - 11:04:10 | 500 | 2.7093993s | 127.0.0.1 | POST "/api/chat"