Hello,
the latest ipex-llm (ipex-llm==2.3.0b20251104) fails to load the following models:
gpt-oss 120B, gpt-oss 20B
ENV
(ipex-llm) arc@xpu:~/llm/ipex-llm/llama-cpp$ uv pip install --pre --upgrade ipex-llm[cpp]
Using Python 3.11.14 environment at: /home/arc/miniconda3/envs/ipex-llm
Resolved 45 packages in 1.12s
Prepared 9 packages in 1.84s
Uninstalled 9 packages in 244ms
Installed 9 packages in 218ms
- bigdl-core-cpp==2.7.0b20251022
+ bigdl-core-cpp==2.7.0b20251104
- fsspec==2025.9.0
+ fsspec==2025.10.0
- hf-xet==1.1.11b1
+ hf-xet==1.2.0
- huggingface-hub==0.36.0rc0
+ huggingface-hub==0.36.0
- ipex-llm==2.3.0b20251022
+ ipex-llm==2.3.0b20251104
- networkx==3.5
+ networkx==3.6rc0
- psutil==7.1.1
+ psutil==7.1.3
- regex==2025.10.23
+ regex==2025.11.3
- safetensors==0.6.2
+ safetensors==0.7.0rc0
llama-server
tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
(ipex-llm) arc@xpu:~/llm/ipex-llm/llama-cpp$ ./llama-server -m ~/llm/models/UD-Q8_K_XL/gpt-oss-120b-UD-Q8_K_XL-00001-of-00002.gguf -ngl 99 -c 8192 --jinja
build: 1 (98abe88) with Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4 (2025.0.4.20241205) for x86_64-unknown-linux-gnu
system info: n_threads = 36, n_threads_batch = 36, total_threads = 72
system_info: n_threads = 36 (n_threads_batch = 36) / 72 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
main: binding port with default address family
main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 71
main: loading model
srv load_model: loading model '/home/arc/llm/models/UD-Q8_K_XL/gpt-oss-120b-UD-Q8_K_XL-00001-of-00002.gguf'
llama_model_load_from_file_impl: using device SYCL0 (Intel(R) Arc(TM) A770 Graphics) - 15473 MiB free
llama_model_load_from_file_impl: using device SYCL1 (Intel(R) Arc(TM) A770 Graphics) - 15473 MiB free
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from /home/arc/llm/models/UD-Q8_K_XL/gpt-oss-120b-UD-Q8_K_XL-00001-of-00002.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/arc/llm/models/UD-Q8_K_XL/gpt-oss-120b-UD-Q8_K_XL-00001-of-00002.gguf'
srv load_model: failed to load model, '/home/arc/llm/models/UD-Q8_K_XL/gpt-oss-120b-UD-Q8_K_XL-00001-of-00002.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
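For context: in recent upstream llama.cpp/ggml headers, type id 39 is GGML_TYPE_MXFP4, the quantization format the gpt-oss GGUFs ship with, so the "invalid ggml type 39 (NONE)" error suggests the bundled loader predates that type — that is an assumption on my part, not confirmed. The file itself can be sanity-checked independently of llama-server. Below is a minimal sketch (standard library only, assuming the documented GGUF v3 header layout: 4-byte magic, uint32 version, uint64 tensor count, uint64 metadata KV count, all little endian) that reads just the header:

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(data: bytes):
    """Parse the fixed 24-byte GGUF header: magic, version (uint32),
    tensor count and metadata KV count (both uint64, little endian)."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    return version, n_tensors, n_kv

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        header = f.read(24)
    version, n_tensors, n_kv = read_gguf_header(header)
    print(f"GGUF v{version}: {n_tensors} tensors, {n_kv} metadata keys")
```

If the header parses cleanly, the file is likely fine and the problem is on the loader side. To enumerate per-tensor type ids, the `gguf` Python package that ships with upstream llama.cpp is easier (`GGUFReader(path).tensors`, each entry exposing `.tensor_type`); any id the installed ipex-llm build does not recognize would be the one reported as NONE.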