
vllm 0.11 and A770 #13323

@savvadesogle

Description


Since Intel has so far abandoned ipex-llm and the Arc cards...

vllm v0.11.1rc2.dev221+g49c00fe30 works with 4x A770.


You can build a Docker image from the vLLM repository sources (docker/Dockerfile.xpu):
https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.xpu

docker build -f docker/Dockerfile.xpu -t vllm-xpu-0110 --shm-size=32g .
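
To actually run the image, something like the following should work. This is a sketch, not a verified recipe: mapping /dev/dri is the usual way to expose Intel GPUs to a container, and the mount path mirrors the model directory used in the benchmark below; adjust both to your setup.

# Sketch: run the image built above with the Arc cards exposed.
# /dev/dri exposes the Intel GPUs; the model mount path is an assumption.
docker run -it --rm \
    --device /dev/dri \
    --shm-size=32g \
    -v /path/to/models:/llm/models \
    -p 8000:8000 \
    vllm-xpu-0110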

But I do not know how to configure it properly for the 4x A770, and I am sure the performance could be higher: 2 req/s -> 10+ req/s. A possible multi-GPU launch is sketched after the flags below.


Llama 3.1 8B Instruct FP8.
Sometimes the request processing speed reaches 12 req/s, but the process periodically "hangs" and then speeds up again; I haven't figured out the reason yet.
The configuration targets 1024 tokens in, 512 tokens out:

--max-model-len "2000" 
--max-num-batched-tokens "3000"
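
For reference, here is a sketch of how these flags might combine with tensor parallelism across the four cards. --tensor-parallel-size 4 is the standard vLLM way to shard a model over 4 GPUs, but I have not verified the right multi-GPU settings for XPU, so treat this as a starting point rather than a known-good config:

# Sketch only: standard vLLM flags, not validated on 4x A770.
vllm serve /llm/models/LLM-Research/Meta-Llama-3.1-8B-Instruct \
    --served-model-name Meta-Llama-3.1-8B-Instruct \
    --tensor-parallel-size 4 \
    --max-model-len 2000 \
    --max-num-batched-tokens 3000 \
    --port 8000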

Test command:

vllm bench serve \
    --model /llm/models/LLM-Research/Meta-Llama-3.1-8B-Instruct \
    --served-model-name Meta-Llama-3.1-8B-Instruct \
    --dataset-name random \
    --random-input-len 1024 \
    --random-output-len 512 \
    --ignore-eos \
    --num-prompts 1500 \
    --trust-remote-code \
    --request-rate inf \
    --backend vllm \
    --port 8000
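
Separately, a quick sanity check against the OpenAI-compatible endpoint can help rule out server-side problems before a long run. This assumes the server is already up on port 8000 with the served model name above:

# Sketch: single completion request against the running server.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Meta-Llama-3.1-8B-Instruct", "prompt": "Hello", "max_tokens": 16}'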

Ubuntu 25.10, kernel 6.17.3.
My numbers for 4x A770 + 2x Xeon 2699 v3 are:

115 requests: [screenshot of benchmark results]

1500 requests: [screenshot of benchmark results]
