
Using multiple GPUs to accomplish distributed training #30

@EddieEduardo

Hello! Thanks for sharing this excellent work!

When I run the code on multiple GPUs with nn.parallel.DistributedDataParallel, it always raises the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

However, when I run on a single GPU, no error is raised. I am confused...
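
For anyone hitting the same thing, a minimal sketch of how one might debug and work around it, assuming a typical DDP training loop; the tiny model, random data, and `worker` function below are illustrative stand-ins, not code from this repository:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # As the error hint suggests: with anomaly detection on, the
    # traceback points at the forward op whose output was later
    # modified in place, instead of the opaque backward-pass error.
    torch.autograd.set_detect_anomaly(True)

    model = nn.Sequential(nn.Linear(8, 2), nn.BatchNorm1d(2)).to(rank)
    model = DDP(
        model,
        device_ids=[rank],
        # A common workaround for this class of error: by default DDP
        # broadcasts module buffers (e.g. BatchNorm running stats) at
        # every forward with in-place copies, which bumps their version
        # counters -- so a graph that is fine on one GPU can fail under
        # DDP. Disabling the broadcast often avoids it; whether that is
        # acceptable depends on your model.
        broadcast_buffers=False,
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(16, 8, device=rank)
    y = torch.randint(0, 2, (16,), device=rank)
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

Other usual suspects for "is at version N; expected version N-1" appearing only under DDP are running the model's forward more than once before a single backward, and in-place ops (`x += ...`, `tensor.copy_(...)`, in-place activations) on tensors the autograd graph still needs; the anomaly-detection traceback should narrow down which one it is here.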
