Enable XPU regression tests with deterministic mode #2600
Conversation
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@jiqing-feng Thanks for the PR. Could you please run
Done.
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
BenjaminBossan left a comment
Thanks for the fixes, LGTM. Failing CI is unrelated.
Hi @BenjaminBossan, this PR enables the XPU regression tests to run with deterministic mode.
Please review this PR. Thanks!
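For readers unfamiliar with the setup, here is a minimal sketch of what a deterministic regression test gated on an XPU device can look like. This is an illustration only, not the code from this PR: the test name, tensor shapes, and skip marker are hypothetical, and it assumes a PyTorch build where `torch.xpu` is available.

```python
import pytest
import torch

# Request deterministic algorithms before any ops run, so repeated
# runs of the same computation produce identical results.
torch.use_deterministic_algorithms(True)

# Hypothetical marker: skip unless an Intel XPU device is present.
requires_xpu = pytest.mark.skipif(
    not (hasattr(torch, "xpu") and torch.xpu.is_available()),
    reason="test requires an XPU device",
)

@requires_xpu
def test_matmul_is_deterministic():
    torch.manual_seed(0)
    a = torch.randn(64, 64, device="xpu")
    b = torch.randn(64, 64, device="xpu")
    out1 = a @ b
    out2 = a @ b
    # With deterministic mode enabled, repeated runs on the same
    # device should be bitwise identical.
    assert torch.equal(out1, out2)
```

Seeding plus `torch.use_deterministic_algorithms(True)` is what makes regression outputs comparable across runs; the skip marker keeps the test from failing on machines without an XPU.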