
[Badcase]: Performance drop using rope scaling with Qwen3-8b in vllm #1567

@mungg


Model Series

Qwen3

What are the models used?

Qwen3-8b

What is the scenario where the problem happened?

Applying RoPE scaling in the config.json file causes performance degradation in vLLM.

Is this badcase known and can it be solved using available techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

vllm 0.9.2rc2.dev39+gc18b3b8e
transformers 4.52.4

Description

Hi Qwen team,

I observed a performance difference on the Arena-Hard-v2.0 benchmark when running Qwen3-8B on vLLM with and without RoPE scaling. Specifically, the model performs worse when RoPE scaling is applied (the score drops from 39 to 26.5).
I added the following rope_scaling entry to config.json as per the instructions:

"rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
}

According to the logs, vLLM does appear to apply the scaling correctly, unlike in the similar issue #1424 (comment):

INFO 07-24 03:32:48 [config.py:1472] Using max model len 131072

The performance drop is unexpected because the benchmark uses only short-context prompts, so RoPE scaling should not have any impact.
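
For reference, the short-prompt comparison I am describing boils down to something like the following sketch (again assuming the hf_overrides argument mentioned above; the prompt is illustrative, not the actual Arena-Hard-v2.0 harness):

import sys
from vllm import LLM, SamplingParams

YARN = {
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    }
}

def run(use_yarn: bool) -> str:
    # Load the model once per process: either with the YaRN override
    # (131072-token context) or with the stock 32768-token config.
    llm = LLM(
        model="Qwen/Qwen3-8B",
        max_model_len=131072 if use_yarn else 32768,
        hf_overrides=YARN if use_yarn else None,
    )
    params = SamplingParams(temperature=0.0, max_tokens=256)
    prompt = "Explain the difference between a list and a tuple in Python."
    return llm.generate([prompt], params)[0].outputs[0].text

if __name__ == "__main__":
    # Run once with and once without "--yarn" and diff the outputs.
    print(run(use_yarn="--yarn" in sys.argv))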

Is there something I might be overlooking? I'd really appreciate any guidance or help on this!

Best,
