
Conversation

@BenjaminBossan
Member

The shared buffers vera_A and vera_B could be on the wrong device when using multiple GPUs, resulting in an error. This PR moves them to the correct device to fix the error.

Example of a failing run: https://github.com/huggingface/peft/actions/runs/11396317278/job/31709933958

Since these buffers are shared, I chose not to move the whole buffer to the device. Instead, when we create the slices from those buffers during forward, I move only those slices to the correct device. This could be inefficient in terms of runtime, but IIUC, the alternative would be to create new copies of these buffers per device, using more memory.
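
For illustration, here is a minimal sketch of this slice-then-move idea. This is not the actual PEFT implementation; the class, buffer shapes, and names (`VeraLikeLinear`, `sliced_A`, etc.) are made up for the example.

```python
import torch
import torch.nn as nn


class VeraLikeLinear(nn.Module):
    """Simplified stand-in for a VeRA-style linear layer using shared projections."""

    def __init__(self, base: nn.Linear, vera_A: torch.Tensor, vera_B: torch.Tensor, r: int):
        super().__init__()
        self.base = base
        # Shared, frozen random projections; they may live on a different device
        # than this layer when the model is split across multiple GPUs.
        self.vera_A = vera_A  # shape: (max_rank, max_in_features)
        self.vera_B = vera_B  # shape: (max_out_features, max_rank)
        self.r = r
        self.lambda_d = nn.Parameter(torch.ones(r))
        self.lambda_b = nn.Parameter(torch.ones(base.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        result = self.base(x)
        # Slice the shared buffers first, then move only the small slices to the
        # device of the current layer; the shared buffers themselves stay where
        # they are, so no full per-device copies are created.
        sliced_A = self.vera_A[: self.r, : self.base.in_features].to(x.device)
        sliced_B = self.vera_B[: self.base.out_features, : self.r].to(x.device)
        delta = (self.lambda_d * (x @ sliced_A.T)) @ sliced_B.T * self.lambda_b
        return result + delta
```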

The failing tests were introduced in #2076, but the underlying error already existed before that.

I did not discover these failing tests earlier because we had a concurrent error, caused by a transformers issue, that looked very similar, and I wrongly assumed that the VeRA error had the same cause. Now that the transformers issue has been fixed, the VeRA error still persists, which prompted me to investigate.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan
Member Author

@dkopi @vvvm23 It would be great if you could double check if this is the best solution or if there is a better way.

Member

@SunMarc SunMarc left a comment


LGTM ! Thanks for fixing !

@BenjaminBossan BenjaminBossan merged commit 28a5ba1 into huggingface:main Oct 25, 2024
14 checks passed
@BenjaminBossan BenjaminBossan deleted the fix-vera-multi-gpu-device-error branch October 25, 2024 13:08
@dkopi
Contributor

dkopi commented Oct 25, 2024

Sorry for the (too) late reply; I was really busy when I received the notification and forgot about the PR later.
If it's about model parallelism (weights of the first N layers on GPU 1, the next layers on GPU 2, etc.), then a slightly better solution would probably be to move the whole buffers between GPUs, since with an optimal split there would only be as many transfers as there are GPUs minus one. I'm just not sure how the transformers library splits the weights by default.
And the difference in runtime should be negligible imo, so the current solution is also good.

In the case of other parallelism strategies, I'm not sure what the best solution would be, but the current one should be ok too.
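
For illustration, a minimal sketch of the whole-buffer alternative described above. It assumes the shared projections live in a plain dict keyed by adapter name; this is not how PEFT stores them, and `move_shared_buffer` is a made-up helper, not PEFT API.

```python
import torch


def move_shared_buffer(shared_buffers: dict[str, torch.Tensor], key: str, device: torch.device) -> torch.Tensor:
    # Move the whole shared buffer the first time a layer on `device` needs it;
    # subsequent layers on the same device then use it without any transfer.
    # With a layer-wise split over N GPUs this costs roughly N - 1 transfers per
    # forward pass, at the price of relocating the buffer that all layers share,
    # which may interact badly with other parallelization strategies.
    buf = shared_buffers[key]
    if buf.device != device:
        buf = buf.to(device)
        shared_buffers[key] = buf
    return buf
```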

@BenjaminBossan
Member Author

Thanks for your comment @dkopi. I agree that moving the whole buffer once should in theory be faster. However, I was unsure how robust that would be across different parallelization strategies, so I went with this approach, which should be more robust.

Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
cyyever pushed a commit to cyyever/peft that referenced this pull request Sep 4, 2025