Update distill.py to include device agnostic code for `distill_mlp` head and `distillation_token` #324

vivekh2000 · 2024-07-25T16:34:34Z

Because `distillation_token` and the `distill_mlp` head are defined in the `DistillWrapper` class rather than in the model, moving an instance of `DistillableViT` to the GPU does not move them. While training a model with this code, I hit a device-mismatch error whose source was hard to track down. The culprits turned out to be `distillation_token` and `distill_mlp`: they belong to `DistillWrapper`, which wraps the loss computation, not to the model class. I therefore suggest the following change for GPU training: the training code should set `device = "cuda" if torch.cuda.is_available() else "cpu"`, or the same logic can be moved into the constructor of `DistillWrapper`.
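A minimal PyTorch sketch of the ownership problem and two ways around it. `TinyStudent`, `TinyDistillWrapper`, and `param_devices` are hypothetical stand-ins, not the vit-pytorch classes. The sketch assumes the wrapper is an `nn.Module` that registers the token and head as its own parameter and submodule. Calling `.to(device)` on the wrapper moves every registered parameter, the student's included, whereas calling it on the student alone leaves the token and head behind. The optional `device` constructor argument illustrates the second suggestion.

```python
import torch
from torch import nn


class TinyStudent(nn.Module):
    """Hypothetical stand-in for a student ViT (not vit-pytorch's DistillableViT)."""

    def __init__(self, dim=8, num_classes=3):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, num_classes)


class TinyDistillWrapper(nn.Module):
    """Hypothetical wrapper that, like DistillWrapper, owns its own token and head."""

    def __init__(self, student, dim=8, num_classes=3, device=None):
        super().__init__()
        self.student = student  # registered as a submodule of the wrapper
        # Registered on the wrapper, so student.to(device) alone does not move these.
        self.distillation_token = nn.Parameter(torch.randn(1, 1, dim))
        self.distill_mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_classes))
        # Hypothetical constructor-level option: move everything at build time.
        if device is not None:
            self.to(device)


def param_devices(module):
    """Return the set of devices that hold the module's parameters."""
    return {p.device for p in module.parameters()}


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Option 1: build on CPU, then move the whole wrapper (not just the student).
wrapper = TinyDistillWrapper(TinyStudent())
wrapper.to(device)

# Option 2: let the constructor perform the move.
wrapper2 = TinyDistillWrapper(TinyStudent(), device=device)

print(param_devices(wrapper), param_devices(wrapper2))
```

Moving the whole wrapper is usually the simpler fix, since `nn.Module.to` already recurses into registered submodules and parameters; a `device` constructor argument only saves one call.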


lucidrains force-pushed the main branch 3 times, most recently from 19eb6d4 to 5e808f4 (August 21, 2024 14:23)

lucidrains force-pushed the main branch from 43cbcad to f50d7d1 (October 9, 2024 14:32)

lucidrains force-pushed the main branch from 1de866d to db05a14 (March 5, 2025 18:50)

lucidrains force-pushed the main branch from 0b273a2 to 3becf08 (September 25, 2025 13:21)

lucidrains force-pushed the main branch 5 times, most recently from cbf6723 to 5cf8384 (October 28, 2025 19:17)
