
Using multiple GPUs to accomplish distributed training #30

@EddieEduardo

Hello! Thanks for sharing this excellent work!

When I run the code on multiple GPUs with nn.parallel.DistributedDataParallel, it always raises the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

However, when I run on a single GPU, no error is raised. I am confused...
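
For anyone hitting the same thing, a minimal sketch of how one might debug and work around it, assuming a typical DDP training loop; the tiny model, random data, and `worker` function below are illustrative stand-ins, not code from this repository:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # As the error hint suggests: with anomaly detection on, the
    # traceback points at the forward op whose output was later
    # modified in place, instead of the opaque backward-pass error.
    torch.autograd.set_detect_anomaly(True)

    model = nn.Sequential(nn.Linear(8, 2), nn.BatchNorm1d(2)).to(rank)
    model = DDP(
        model,
        device_ids=[rank],
        # A common workaround for this class of error: by default DDP
        # broadcasts module buffers (e.g. BatchNorm running stats) at
        # every forward with in-place copies, which bumps their version
        # counters -- so a graph that is fine on one GPU can fail under
        # DDP. Disabling the broadcast often avoids it; whether that is
        # acceptable depends on your model.
        broadcast_buffers=False,
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(16, 8, device=rank)
    y = torch.randint(0, 2, (16,), device=rank)
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

Other usual suspects for "is at version N; expected version N-1" appearing only under DDP are running the model's forward more than once before a single backward, and in-place ops (`x += ...`, `tensor.copy_(...)`, in-place activations) on tensors the autograd graph still needs; the anomaly-detection traceback should narrow down which one it is here.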
