Conversation

@yao-matrix
Contributor

@BenjaminBossan, please help review, thanks very much.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for updating the text generation benchmark, this was definitely on our list to do soon. Unfortunately, I ran into an issue when trying it out. Could you please check?

Also, once you're finished with your changes, please run `ruff` on `method_comparison/text_generation_benchmark/` (this directory is not automatically checked).
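(For reference, a typical invocation, assuming `ruff` is installed in the environment, would look something like the following; `--fix` is optional.)

```sh
ruff check --fix method_comparison/text_generation_benchmark/
ruff format method_comparison/text_generation_benchmark/
```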

    accelerator_reserved_mb = 0.0

-   return ram_usage_mb, gpu_allocated_mb, gpu_reserved_mb
+   return ram_usage_mb, accelerator_allocated_mb, accelerator_reserved_mb
Member

For some reason, I'm getting incorrect results. When I run `python run_base.py -v --force` on this branch on a 4090, I get 0 MB memory usage in the report. To debug this, you can set a breakpoint at this line and run the aforementioned command. The first time we get here, `accelerator_allocated_mb` and `accelerator_reserved_mb` are 0, which is expected. After this, the model is loaded onto the accelerator. Thus, when we get here the second time, these values should be > 0, but I still get 0. When running the same command on the main branch, I get correct reports.

I could not determine where this difference comes from: `nvidia-smi` shows the same output for the main branch and this branch, `torch.cuda.is_available()` always returns `True`, and the `model_kwargs` are also identical. Can you replicate this, and do you have an idea what the issue could be?

Contributor Author

@yao-matrix yao-matrix Aug 8, 2025

@BenjaminBossan oh, my fault, the chain should be a single if-elif-else, but I broke it into two separate checks (an if, followed by an if-else), so a CUDA GPU takes the first if and then the second else (which sets the values to 0). Sorry, and thanks for checking.
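For anyone reading along, here is a minimal sketch of that control-flow bug and its fix. This is illustrative only, not the benchmark's actual code: the function names are made up, and it assumes a recent PyTorch build that exposes `torch.xpu` memory statistics.

```python
import torch

# Broken variant: two independent checks instead of one chain.
def accelerator_memory_mb_broken():
    allocated_mb = reserved_mb = 0.0
    if torch.cuda.is_available():
        # The CUDA branch fills in the values ...
        allocated_mb = torch.cuda.memory_allocated() / 1024**2
        reserved_mb = torch.cuda.memory_reserved() / 1024**2
    if torch.xpu.is_available():
        allocated_mb = torch.xpu.memory_allocated() / 1024**2
        reserved_mb = torch.xpu.memory_reserved() / 1024**2
    else:
        # ... but on a CUDA machine torch.xpu.is_available() is False,
        # so this else runs and overwrites them with 0.
        allocated_mb = reserved_mb = 0.0
    return allocated_mb, reserved_mb

# Fixed variant: a single if-elif-else, so exactly one branch runs.
def accelerator_memory_mb_fixed():
    if torch.cuda.is_available():
        allocated_mb = torch.cuda.memory_allocated() / 1024**2
        reserved_mb = torch.cuda.memory_reserved() / 1024**2
    elif torch.xpu.is_available():
        allocated_mb = torch.xpu.memory_allocated() / 1024**2
        reserved_mb = torch.xpu.memory_reserved() / 1024**2
    else:
        allocated_mb = reserved_mb = 0.0
    return allocated_mb, reserved_mb
```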

Member

Ah yes, makes sense.

yao-matrix and others added 3 commits August 8, 2025 17:23
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for fixing the issue with CUDA. I think the script still has a small error for XPU, which I flagged, please check.

    accelerator_reserved_mb = 0.0

-   return ram_usage_mb, gpu_allocated_mb, gpu_reserved_mb
+   return ram_usage_mb, accelerator_allocated_mb, accelerator_reserved_mb
Member

Ah yes, makes sense.

-       return gpu_allocated, gpu_reserved
+       _, accelerator_allocated, accelerator_reserved = get_memory_usage()
+       return accelerator_allocated, accelerator_reserved
+   elif torch.xpu.is_available():
Member

Here, elif is not necessary because the previous branch returns, but it also doesn't hurt.
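A tiny illustration of that point (hypothetical function, not taken from the PR): since the first branch returns, control can never fall through to the next check, so `elif` and a plain `if` are equivalent here.

```python
import torch

def pick_backend() -> str:
    if torch.cuda.is_available():
        return "cuda"  # the function exits here, so nothing falls through
    elif torch.xpu.is_available():  # a plain `if` would behave identically
        return "xpu"
    return "cpu"
```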

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for making the text generation benchmark XPU compatible.

@BenjaminBossan BenjaminBossan merged commit 95df499 into huggingface:main Aug 12, 2025
2 of 14 checks passed
@yao-matrix yao-matrix deleted the tgb-xpu branch August 12, 2025 20:57