
[Issue]: Multi-GPU Training with HuggingFace Trainer in NVFlare #3514

@farhatkevin

Description


What I’m Trying to Do

I'm adapting the /examples/advanced/llm_hf SFT example in NVFlare to continue pretraining or fine-tuning a model using HuggingFace Trainer, with support for DDP or FSDP across multiple GPUs per client.
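
For context, here is a stripped-down sketch of the client training script I'm working from. The model name, dummy dataset, and hyperparameters are placeholders; the point is only the structure of HuggingFace Trainer wrapped with the NVFlare client API:

```python
# Stripped-down sketch of my client training script (adapted from examples/advanced/llm_hf).
# Model name, dummy dataset, and hyperparameters are placeholders, not the real values.
import nvflare.client as flare
from datasets import Dataset
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments


def main():
    flare.init()
    # Open question: when the script is launched with torchrun, should every rank call
    # flare.init()/receive()/send(), or only rank 0?

    model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
    # Stand-in for the real tokenized SFT dataset
    train_dataset = Dataset.from_dict({"input_ids": [[1, 2, 3]], "labels": [[1, 2, 3]]})

    training_args = TrainingArguments(
        output_dir="./out",
        num_train_epochs=1,
        per_device_train_batch_size=1,
        # When launched via torchrun, Trainer should pick up LOCAL_RANK/WORLD_SIZE and run DDP
        # on its own; I only tried wrapping in nn.DataParallel because that never happened here.
    )

    while flare.is_running():
        input_model = flare.receive()
        model.load_state_dict(input_model.params)

        trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
        trainer.train()

        flare.send(
            flare.FLModel(
                params=trainer.model.cpu().state_dict(),
                meta={"NUM_STEPS_CURRENT_ROUND": trainer.state.global_step},
            )
        )


if __name__ == "__main__":
    main()
```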

What I’ve Tried

  • Set gpu=[0,1,...,n] in the client config
  • Wrapped the model with nn.DataParallel and attempted DistributedDataParallel
  • Enabled launch_external_process=True in ScriptRunner (roughly as in the job-config sketch after this list)
  • Verified that a similar training script works with multi-GPU when run outside of NVFlare
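
And this is roughly what the job configuration looks like. The import paths and the `command` argument are my best reconstruction from the Job API examples (so treat them as assumptions); the script name and args are placeholders:

```python
# Rough shape of my job script. Import paths and parameter names are recalled from the
# hello-pt Job API example and may be off; "command" in particular is a guess at how the
# launch could be handed over to torchrun.
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

job = FedJob(name="llm_hf_sft_multi_gpu")

# ... server-side FedAvg controller added here exactly as in the llm_hf example ...

runner = ScriptRunner(
    script="src/hf_sft_script.py",            # the client script sketched above (hypothetical name)
    script_args="--model_name_or_path gpt2",  # placeholder args
    launch_external_process=True,
    # Is something like this the supported way to get a DDP launch with ScriptRunner,
    # or is PTMultiProcessExecutor the intended path?
    command="torchrun --nnodes 1 --nproc_per_node 2",
)
job.to(runner, "site-1")

# Outside of NVFlare, the same script trains fine on two GPUs when launched directly:
#   torchrun --nnodes 1 --nproc_per_node 2 src/hf_sft_script.py --model_name_or_path gpt2
```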

Issue

The main issue I'm seeing is that the training workload never gets distributed across more than one GPU per client, regardless of the settings above.

Questions

  1. What’s the proper way to enable multi-GPU training in NVFlare with HuggingFace Trainer using accelerate or native PyTorch? Are there any examples of multi-GPU setups that don’t use PyTorch Lightning?

  2. Should I be using PTMultiProcessExecutor instead of ScriptRunner?
    Are there any sample configs or documentation for this, specifically with HuggingFace Trainer?

  3. Does ScriptRunner with launch_external_process=True support DDP workloads?
    Or is it limited to single-process training?

  4. Can the NVFlare simulator scale to 8 clients × 8 GPUs (64 GPUs total)? Eventually I'd like to scale up to 16 GPUs per client. What's the difference between the following, both in general and in terms of distributed compute support?

    • Simulator
    • POC
    • Actual NVFlare production platform

    My current simulator invocation is sketched below for reference.
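
As referenced in question 4, this is roughly how I'm launching the simulator and assigning GPUs today. The flag names are recalled from the simulator docs and CLI help, so treat them as assumptions:

```python
# How I currently launch the simulator and map GPUs to clients. Today this gives each of the
# 8 simulated clients a single GPU; it's unclear to me how each client would get 8 (or 16).
import subprocess

subprocess.run(
    [
        "nvflare", "simulator", "jobs/llm_hf_sft",  # hypothetical exported job folder
        "-w", "/tmp/nvflare/llm_hf_sft",            # simulator workspace
        "-n", "8",                                  # 8 simulated clients
        "-t", "8",                                  # 8 parallel client threads
        "-gpu", "0,1,2,3,4,5,6,7",                  # GPU-to-client assignment
    ],
    check=True,
)
```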

Any guidance or references would be really appreciated. Thanks!
