
Performance of Baseline Qwen 2.5 VL-7B on V* Benchmark #91

@zwyang6

Description


Hi, thanks for this brilliant open-source project!

When I tried to evaluate Qwen 2.5 VL-7B on the V* Benchmark, I found that my results are considerably lower than the ones reported in Table 1.

  • Mine are
    "direct_attributes": 62.60869565217392,
    "relative_position": 64.47368421052632,
    "overall": 63.35078534031413,

  • While yours are 73.9 (direct_attributes), 67.1 (relative_position), and 71.2 (overall).
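
For reference, my overall number is just the sample-weighted mean of the two category scores; a quick check below, assuming the usual V* split of 115 direct_attributes and 76 relative_position questions:

    # Consistency check for the overall score (assumes the 115/76 V* category split).
    n_direct, n_relative = 115, 76
    acc_direct, acc_relative = 62.60869565217392, 64.47368421052632
    overall = (acc_direct * n_direct + acc_relative * n_relative) / (n_direct + n_relative)
    print(overall)  # ~63.3508, matching my reported overall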

Do you know of any reasons that could lead to this discrepancy?

Here are my evaluation scripts:

evaluate models

  1. deploy Qwen (a sanity-check sketch follows after this list)

    CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
        --port 18901 \
        --gpu-memory-utilization 0.8 \
        --max-model-len 32768 \
        --tensor-parallel-size 4 \
        --served-model-name "baseline_qwen" \
        --trust-remote-code \
        --disable-log-requests

  2. evaluate Qwen

    MODEL_NAME=Qwen2.5-VL-7B-Instruct
    API_KEY="EMPTY"
    API_URL="http://xxxxx:18901/v1"
    PATH_TO_SAVE_DIR="/codebase/DeepEyes/eval_results"
    MODEL_NAME_VLLM=baseline_qwen
    PATH_TO_VSTAR="/data_public/Vstar"
    CUDA_VISIBLE_DEVICES=4,5,6,7 python /codebase/DeepEyes/eval/eval_vstar.py \
        --model_name $MODEL_NAME \
        --api_key $API_KEY \
        --api_url $API_URL \
        --vstar_bench_path $PATH_TO_VSTAR \
        --save_path $PATH_TO_SAVE_DIR \
        --eval_model_name $MODEL_NAME_VLLM \
        --num_workers 4
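
After step 1 (referenced above), I run a quick sanity check against the OpenAI-compatible endpoint that vLLM exposes before kicking off the full evaluation. This is only a minimal sketch with a placeholder image path and question, not part of the repo's eval code:

    import base64
    from openai import OpenAI

    # vLLM exposes an OpenAI-compatible API on the port used in step 1.
    client = OpenAI(base_url="http://localhost:18901/v1", api_key="EMPTY")

    # Placeholder image; substitute any V* benchmark image.
    with open("sample.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="baseline_qwen",  # must match --served-model-name from step 1
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                {"type": "text", "text": "What color is the umbrella in the image?"},
            ],
        }],
        temperature=0.0,
    )
    print(resp.choices[0].message.content)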

calculate scores

  1. deploy Qwen-72B as judge (a sanity-check sketch follows after this list)

    CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve /codebase/DeepEyes/pretrained_models/Qwen2.5-72B-Instruct \
        --port 18901 \
        --gpu-memory-utilization 0.8 \
        --max-model-len 32768 \
        --tensor-parallel-size 4 \
        --served-model-name "judge" \
        --trust-remote-code \
        --disable-log-requests

  2. calculate

    MODEL_NAME=Qwen2.5-VL-7B-Instruct
    API_KEY="EMPTY"
    API_URL="http://xxxxx:18901/v1"
    PATH_TO_SAVE_DIR="/codebase/DeepEyes/eval_results"
    MODEL_NAME_VLLM=judge
    PATH_TO_VSTAR="/data_public/Vstar"
    CUDA_VISIBLE_DEVICES=4,5,6,7 python judge_result.py \
        --model_name $MODEL_NAME \
        --api_key $API_KEY \
        --api_url $API_URL \
        --vstar_bench_path $PATH_TO_VSTAR \
        --save_path $PATH_TO_SAVE_DIR \
        --eval_model_name $MODEL_NAME_VLLM \
        --num_workers 4
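
Before running step 2, I also confirm the 72B judge responds under its served name. This is just a minimal text-only sketch against the same OpenAI-compatible vLLM endpoint, not code from the repo:

    from openai import OpenAI

    # Assumes the judge from step 1 is being served locally on port 18901.
    client = OpenAI(base_url="http://localhost:18901/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="judge",  # must match --served-model-name from step 1
        messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
        temperature=0.0,
    )
    print(resp.choices[0].message.content)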
