Closed
Hi, thanks for your brilliant open-source project!
When I evaluated Qwen2.5-VL-7B on the V* Benchmark, I found my results were considerably lower than those you report in Table 1.
My results are:
"direct_attributes": 62.60869565217392,
"relative_position": 64.47368421052632,
"overall": 63.35078534031413
while yours are 73.9 (direct_attributes), 67.1 (relative_position), and 71.2 (overall).
Do you know any reasons that could lead to these discrepancies?
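To put the gap in concrete terms, the percentages can be converted back into numbers of correctly answered questions. This is a minimal sketch, not part of the evaluation code. The split sizes (115 direct_attributes and 76 relative_position questions) are an assumption: they are not stated in this issue and were chosen because they reproduce the percentages above.

```python
# Assumed V* Bench split sizes (not stated in this issue; chosen because
# they reproduce the percentages quoted above).
SPLIT_SIZES = {"direct_attributes": 115, "relative_position": 76}

# Accuracies (%) quoted in this issue.
mine = {"direct_attributes": 62.60869565217392, "relative_position": 64.47368421052632}
reported = {"direct_attributes": 73.9, "relative_position": 67.1}


def to_counts(acc_percent):
    """Convert per-split accuracy (%) to the nearest whole number of correct answers."""
    return {k: round(acc_percent[k] * SPLIT_SIZES[k] / 100) for k in SPLIT_SIZES}


def overall(counts):
    """Overall accuracy (%) pooled over all questions."""
    return 100 * sum(counts.values()) / sum(SPLIT_SIZES.values())


mine_counts = to_counts(mine)
reported_counts = to_counts(reported)
# Number of additional questions the reported run answers correctly, per split.
gap = {k: reported_counts[k] - mine_counts[k] for k in SPLIT_SIZES}

print(mine_counts, round(overall(mine_counts), 2))
print(reported_counts, round(overall(reported_counts), 2))
print(gap)
```

Under that assumption, most of the difference comes from the direct_attributes split, so comparing the raw answers on those questions would be a reasonable first place to look.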
Here are my evaluation scripts:
evaluate models
- deploy qwen
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 18901 \
    --gpu-memory-utilization 0.8 \
    --max-model-len 32768 \
    --tensor-parallel-size 4 \
    --served-model-name "baseline_qwen" \
    --trust-remote-code \
    --disable-log-requests
- evaluate qwen
MODEL_NAME=Qwen2.5-VL-7B-Instruct
API_KEY="EMPTY"
API_URL="http://xxxxx:18901/v1"
PATH_TO_SAVE_DIR="/codebase/DeepEyes/eval_results"
MODEL_NAME_VLLM=baseline_qwen
PATH_TO_VSTAR="/data_public/Vstar"
CUDA_VISIBLE_DEVICES=4,5,6,7 python /codebase/DeepEyes/eval/eval_vstar.py \
    --model_name $MODEL_NAME \
    --api_key $API_KEY \
    --api_url $API_URL \
    --vstar_bench_path $PATH_TO_VSTAR \
    --save_path $PATH_TO_SAVE_DIR \
    --eval_model_name $MODEL_NAME_VLLM \
    --num_workers 4
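Because the evaluation script addresses the server by its served name, it may be worth confirming that the endpoint actually exposes "baseline_qwen" before running the benchmark. The sketch below relies only on the OpenAI-compatible `/v1/models` route that vLLM serves. The helper names `served_model_ids` and `fetch_model_ids` are hypothetical and not part of the DeepEyes code.

```python
import json
from urllib.request import urlopen


def served_model_ids(models_payload):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_payload.get("data", [])]


def fetch_model_ids(api_url, timeout=10):
    """GET {api_url}/models from a running server; api_url is expected to end in /v1."""
    with urlopen(f"{api_url.rstrip('/')}/models", timeout=timeout) as resp:
        return served_model_ids(json.load(resp))


# Offline example with a hand-written payload in the OpenAI list format.
example = {"object": "list", "data": [{"id": "baseline_qwen", "object": "model"}]}
print(served_model_ids(example))
```

Against the live server this would be called as `fetch_model_ids(API_URL)` with the same `API_URL` value as in the command above.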
calculate scores
- deploy Qwen-72B as Judge
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve /codebase/DeepEyes/pretrained_models/Qwen2.5-72B-Instruct \
    --port 18901 \
    --gpu-memory-utilization 0.8 \
    --max-model-len 32768 \
    --tensor-parallel-size 4 \
    --served-model-name "judge" \
    --trust-remote-code \
    --disable-log-requests
- calculate
MODEL_NAME=Qwen2.5-VL-7B-Instruct
API_KEY="EMPTY"
API_URL="http://xxxxx:18901/v1"
PATH_TO_SAVE_DIR="/codebase/DeepEyes/eval_results"
MODEL_NAME_VLLM=judge
PATH_TO_VSTAR="/data_public/Vstar"
CUDA_VISIBLE_DEVICES=4,5,6,7 python judge_result.py \
    --model_name $MODEL_NAME \
    --api_key $API_KEY \
    --api_url $API_URL \
    --vstar_bench_path $PATH_TO_VSTAR \
    --save_path $PATH_TO_SAVE_DIR \
    --eval_model_name $MODEL_NAME_VLLM \
    --num_workers 4