
Ollama crashes with nil pointer dereference when loading vision models on Intel Arc B60s #13318

@S1ntaxErr0r

Description

OS: Windows Server 2025
GPU: Intel Arc Pro B60 Graphics (2x GPUs)
Version: ollama-ipex-llm-2.3.0b20250725

Ollama crashes with a panic (nil pointer dereference) when attempting to load vision/multimodal models. Every vision model I have tried so far (gemma3, qwen2.5-vl, llava, llama3.2-vision, minicpm) crashes.

I am able to run non-vision models like deepseek-r1:70b just fine.

Vision models work as intended if I restrict Ollama to a single card.
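
For anyone trying to reproduce the single-card case: hiding the second GPU from the SYCL runtime before launching Ollama should be enough, assuming this IPEX-LLM build honors the standard oneAPI device selector (a sketch, not verified against this exact build):

set ONEAPI_DEVICE_SELECTOR=level_zero:0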

Ollama crashes as follows (truncated stack trace; full log below):

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x1 addr=0x50 pc=0x7ff638d2f571]

goroutine 14 [running]:
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve(0xc001e38100)
	D:/ruonan/ollama-internal/ml/backend/ggml/ggml.go:698 +0x2b1
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getTensor(...)
	D:/ruonan/ollama-internal/runner/ollamarunner/multimodal.go:98 +0x2a4
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getMultimodal(...)
	D:/ruonan/ollama-internal/runner/ollamarunner/multimodal.go:56 +0xe5
github.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0xc00033a7e0)
	D:/ruonan/ollama-internal/runner/ollamarunner/runner.go:796 +0x70e
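
The trace points at Context.Reserve dereferencing something that was never set up for the second device while the worst-case multimodal graph is reserved. Below is a minimal Go sketch of that kind of failure, purely illustrative: the type and function names are hypothetical stand-ins, not the actual ollama/ggml code, and the idea that a per-device buffer entry is missing is my guess at the pattern, not a confirmed diagnosis.

package main

import "fmt"

// deviceBuffer stands in for a backend buffer type bound to one GPU.
type deviceBuffer struct {
	name string
}

// alloc touches the receiver's fields, so calling it through a nil
// *deviceBuffer panics with "invalid memory address or nil pointer
// dereference", the same signal as in the report.
func (b *deviceBuffer) alloc(n int) {
	fmt.Printf("reserving %d bytes on %s\n", n, b.name)
}

// reserveWorstCase mimics reserving a worst-case graph across every device
// the SYCL runtime reported.
func reserveWorstCase(buffers map[int]*deviceBuffer, devices []int) {
	for _, id := range devices {
		buffers[id].alloc(1 << 20) // buffers[1] is nil here -> panic
	}
}

func main() {
	// Only device 0 ever got a buffer; device 1 was discovered but never
	// registered. With a single visible card the loop never reaches the
	// missing entry, which would match "works on one GPU, crashes on two".
	buffers := map[int]*deviceBuffer{0: {name: "SYCL0"}}
	reserveWorstCase(buffers, []int{0, 1})
}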

Full Log

time=2025-10-06T11:02:57.378-04:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY:localhost,127.0.0.1 OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:10m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:D:\\Models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:2 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-10-06T11:02:57.384-04:00 level=INFO source=images.go:476 msg="total blobs: 58"
time=2025-10-06T11:02:57.385-04:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embed                --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-10-06T11:02:57.386-04:00 level=INFO source=routes.go:1288 msg="Listening on [::]:11434 (version 0.9.3)"
time=2025-10-06T11:02:57.386-04:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-10-06T11:02:57.386-04:00 level=INFO source=gpu.go:218 msg="using Intel GPU"
time=2025-10-06T11:02:57.386-04:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-10-06T11:02:57.386-04:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=16 efficiency=0 threads=32
time=2025-10-06T11:02:57.392-04:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="127.9 GiB" available="108.6 GiB"
[GIN] 2025/10/06 - 11:03:49 | 200 |      7.3018ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/10/06 - 11:03:49 | 200 |            0s |       127.0.0.1 | GET      "/api/ps"
[GIN] 2025/10/06 - 11:03:50 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
time=2025-10-06T11:04:08.530-04:00 level=INFO source=server.go:135 msg="system memory" total="127.9 GiB" free="108.3 GiB" free_swap="126.6 GiB"
time=2025-10-06T11:04:08.532-04:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=49 layers.offload=0 layers.split="" memory.available="[108.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.3 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[10.3 GiB]" memory.weights.total="6.8 GiB" memory.weights.repeating="6.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="519.5 MiB" memory.graph.partial="1.3 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-10-06T11:04:08.632-04:00 level=INFO source=server.go:458 msg="starting llama server" cmd="D:\\IPEX-LLM\\ollama-lib.exe runner --ollama-engine --model D:\\Models\\blobs\\sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 16 --no-mmap --parallel 2 --port 52405"
time=2025-10-06T11:04:08.636-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-10-06T11:04:08.636-04:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-10-06T11:04:08.637-04:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server error"
time=2025-10-06T11:04:08.820-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-10-06T11:04:08.822-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:52405"
time=2025-10-06T11:04:08.889-04:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
time=2025-10-06T11:04:08.917-04:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=37
time=2025-10-06T11:04:08.930-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
Running with Environment Variables:
  GGML_SYCL_DEBUG: 0
  GGML_SYCL_DISABLE_OPT: 1
  GGML_SYCL_DISABLE_GRAPH: 1
  GGML_SYCL_PRIORITIZE_DMMV: 0
Build with Macros:
  GGML_SYCL_FORCE_MMQ: no
  GGML_SYCL_F16: no
Found 2 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|             Intel Arc Pro B60 Graphics|   20.1|    160|    1024|   32| 25048M|            1.6.34177|
| 1| [level_zero:gpu:1]|             Intel Arc Pro B60 Graphics|   20.1|    160|    1024|   32| 25048M|            1.6.34177|
SYCL Optimization Feature:
|ID|        Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]|      Y|
| 1| [level_zero:gpu:1]|      Y|
time=2025-10-06T11:04:09.765-04:00 level=INFO source=ggml.go:390 msg="model weights" buffer=SYCL0 size="7.6 GiB"
time=2025-10-06T11:04:09.765-04:00 level=INFO source=ggml.go:390 msg="model weights" buffer=CPU size="787.5 MiB"
time=2025-10-06T11:04:09.765-04:00 level=INFO source=ggml.go:419 msg="Sycl device count" count=2
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x1 addr=0x50 pc=0x7ff638d2f571]

goroutine 14 [running]:
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve(0xc001e38100)
	D:/ruonan/ollama-internal/ml/backend/ggml/ggml.go:698 +0x2b1
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getTensor(0xc0014e59e8?, {0x7ff639c37390, 0xc001d3c000}, {0x7ff639c3ba30, 0xc001e39180}, {0x7ff639c48b40, 0xc000008480}, 0x1)
	D:/ruonan/ollama-internal/runner/ollamarunner/multimodal.go:98 +0x2a4
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getMultimodal(0xc000047cc8, {0x7ff639c37390, 0xc001d3c000}, {0x7ff639c3ba30, 0xc001e39180}, {0xc001e30020, 0x1, 0x30?}, 0x1)
	D:/ruonan/ollama-internal/runner/ollamarunner/multimodal.go:56 +0xe5
github.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0xc00033a7e0)
	D:/ruonan/ollama-internal/runner/ollamarunner/runner.go:796 +0x70e
github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc00033a7e0, {0xc000134000?, 0x0?}, {0x10, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0}, ...)
	D:/ruonan/ollama-internal/runner/ollamarunner/runner.go:865 +0x270
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc00033a7e0, {0x7ff639c33270, 0xc00061b630}, {0xc000134000?, 0x0?}, {0x10, 0x0, 0x3e7, {0x0, 0x0, ...}, ...}, ...)
	D:/ruonan/ollama-internal/runner/ollamarunner/runner.go:878 +0xb8
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	D:/ruonan/ollama-internal/runner/ollamarunner/runner.go:959 +0xa11
time=2025-10-06T11:04:10.342-04:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
time=2025-10-06T11:04:10.676-04:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server error"
time=2025-10-06T11:04:10.705-04:00 level=ERROR source=server.go:484 msg="llama runner terminated" error="exit status 2"
time=2025-10-06T11:04:10.926-04:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: error:invalid memory address or nil pointer dereference\n[signal 0xc0000005 code=0x1 addr=0x50 pc=0x7ff638d2f571]\n\ngoroutine 14 [running]:"
[GIN] 2025/10/06 - 11:04:10 | 500 |    2.7093993s |       127.0.0.1 | POST     "/api/chat"
