How to load a pre-trained model using horovod #3257
Unanswered
ForawardStar asked this question in Q&A
Replies: 1 comment
-
You could restore the checkpoint on rank 0, then broadcast the variables to the other workers, similarly to what you would do after initialization.
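As a rough sketch of that suggestion in PyTorch (the model class, checkpoint path, and checkpoint keys below are hypothetical placeholders, not from the thread), the pattern is: only rank 0 reads the file, then `hvd.broadcast_parameters` and `hvd.broadcast_optimizer_state` push that state to every other worker, just as Horovod's examples do after random initialization:

```python
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())

model = nn.Linear(10, 2)  # stand-in for your real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Restore the checkpoint on rank 0 only; the other ranks keep their
# freshly initialized weights until the broadcast below.
if hvd.rank() == 0:
    checkpoint = torch.load("checkpoint.pt")  # hypothetical path/keys
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])

# Broadcast rank 0's parameters and optimizer state to all workers,
# so every rank starts from the same restored state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```

This mirrors the usual Horovod startup sequence; the only change from a fresh training run is the `hvd.rank() == 0` guard around loading the checkpoint. (Run under `horovodrun`/`mpirun`; it is not meaningful as a single process.)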
Question
I can successfully train my model with Horovod in a multi-machine, multi-GPU setting, and I save checkpoints only on worker 0 so that the other workers do not corrupt them. My question is how to load such a pre-trained model with Horovod in the same multi-machine, multi-GPU setting. Do I need to load the pre-trained model only on worker 0?