
danielvegamyhre
Contributor

  • Add a bench script to benchmark the MoE layer computation in isolation, without any distributed/comms aspects.
  • Configurable options:
    • num_experts
    • dim
    • hidden_dim
    • seq_len
    • local_batch_size
  • This is useful for benchmarking the computation portion of the MoE specifically, so we can iterate quickly without having to inspect a trace to exclude all2all comms, etc.
  • Profiling is included, though, so the developer can quickly break down time spent in specific quantization kernels, GEMMs, etc.
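For readers who want the rough shape of such a harness, here is a minimal CPU-only sketch. It assumes argparse flags named after the options above: `--recipe`, `--local_batch_size`, `--dim`, `--hidden_dim` and `--local_num_experts` appear in the commands below, while `--seq_len` and every default value are hypothetical. It is not the PR's implementation; a real GPU benchmark would time the bf16 and mxfp8 MoE layers with device synchronization, whereas the hypothetical `bench_ms` helper here just times a toy Python workload.

```python
import argparse
import statistics
import time
from typing import Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Flag names mirror this PR description; defaults are hypothetical placeholders.
    p = argparse.ArgumentParser(description="single-device MoE layer benchmark (sketch)")
    p.add_argument("--recipe", type=str, default="mxfp8")
    p.add_argument("--local_num_experts", type=int, default=8)
    p.add_argument("--dim", type=int, default=5120)
    p.add_argument("--hidden_dim", type=int, default=8192)
    p.add_argument("--seq_len", type=int, default=8192)  # hypothetical flag name
    p.add_argument("--local_batch_size", type=int, default=16)
    return p.parse_args(argv)


def bench_ms(fn: Callable[[], object], warmup: int = 3, iters: int = 10) -> float:
    """Median wall-clock time of fn() in milliseconds.

    CPU-only stand-in: on a GPU you must synchronize the device (or use CUDA
    events) before reading the clock, since kernel launches are asynchronous.
    """
    for _ in range(warmup):
        fn()  # discard warmup runs (caching, lazy init, autotuning)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


if __name__ == "__main__":
    args = parse_args(["--local_batch_size=16", "--dim=5120", "--hidden_dim=8192"])
    # Assumption: total_M is the number of tokens, i.e. batch size * sequence length.
    total_m = args.local_batch_size * args.seq_len
    print(f"total_M: {total_m}, N: {args.hidden_dim}, K: {args.dim}")
    t = bench_ms(lambda: sum(i * i for i in range(100_000)))
    print(f"toy workload time: {t:.3f} ms")
```

Taking the median over several iterations after a warmup keeps one-off costs (first-call compilation, allocator growth) out of the reported number.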

Llama4 17bx16e shapes

CUDA_VISIBLE_DEVICES=6 python benchmarks/prototype/moe_training/bench_moe_layer.py --recipe mxfp8 --local_batch_size=16 --dim=5120 --hidden_dim=8192 --local_num_experts=8
total_M: 131072, N: 8192, K: 5120
bf16 time: 275.270 ms
mxfp8 time: 192.420 ms
speedup: 1.431x
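The printed numbers can be checked by hand. This sketch assumes `seq_len` was left at a default of 8192 (it is not on the command line, but 16 × 8192 = 131072 matches the printed total_M) and that speedup is simply the bf16 time divided by the mxfp8 time.

```python
# Values copied from the Llama4 17bx16e run above.
local_batch_size = 16
seq_len = 8192  # assumption: inferred from total_M, not passed on the command line
bf16_ms = 275.270
mxfp8_ms = 192.420

total_m = local_batch_size * seq_len  # tokens fed through the MoE layer
speedup = bf16_ms / mxfp8_ms          # >1 means mxfp8 is faster
saved_ms = bf16_ms - mxfp8_ms

print(f"total_M: {total_m}")             # 131072
print(f"speedup: {speedup:.3f}x")        # 1.431x
print(f"time saved: {saved_ms:.3f} ms")  # 82.850 ms
```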

DeepSeekV3 671b shapes

CUDA_VISIBLE_DEVICES=6 python benchmarks/prototype/moe_training/bench_moe_layer.py --recipe mxfp8 --local_batch_size=16 --dim=7168 --hidden_dim=2048 --local_num_experts=8
total_M: 131072, N: 2048, K: 7168
bf16 time: 92.032 ms
mxfp8 time: 80.182 ms
speedup: 1.148x
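One plausible (unverified) reading of why the DeepSeekV3 shapes gain less is arithmetic intensity. The sketch below counts FLOPs for a single forward M×K by K×N GEMM at each printed shape; the layer runs several such GEMMs, so this is only a relative measure. It also computes activation elements per GEMM FLOP, which equals 1/(2N). With N = 2048 instead of 8192, quantizing the activations is about 4x more work relative to the matmul, which may leave less room for the mxfp8 GEMM speedup to show.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    # An (m x k) @ (k x n) matmul performs m*n*k multiply-adds, i.e. 2*m*n*k FLOPs.
    return 2 * m * n * k


# Shapes and timings as printed by the two runs in this PR description.
RUNS = {
    "Llama4 17bx16e": dict(m=131072, n=8192, k=5120, bf16_ms=275.270, mxfp8_ms=192.420),
    "DeepSeekV3 671b": dict(m=131072, n=2048, k=7168, bf16_ms=92.032, mxfp8_ms=80.182),
}

for name, r in RUNS.items():
    flops = gemm_flops(r["m"], r["n"], r["k"])
    # Elements of the (m x k) activation per GEMM FLOP; this equals 1 / (2 * n),
    # so a smaller n means more activation-quantization work per unit of matmul work.
    act_per_flop = (r["m"] * r["k"]) / flops
    speedup = r["bf16_ms"] / r["mxfp8_ms"]
    print(f"{name}: {flops / 1e12:.2f} TFLOP/GEMM, "
          f"act elems per FLOP {act_per_flop:.2e}, speedup {speedup:.3f}x")
```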

danielvegamyhre added the "topic: not user facing" and "moe" labels on Oct 6, 2025

pytorch-bot bot commented Oct 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3126

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 787aaee with merge base cd21d0e:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the "CLA Signed" label on Oct 6, 2025
danielvegamyhre changed the title from "[moe training] bench script for single device moe layer" to "[BE] [moe training] bench script for single device moe layer" on Oct 11, 2025