A question about the MoE code

<img width="921" height="425" alt="Image" src="https://github.com/user-attachments/assets/40efc7c5-cb7e-4249-aa9d-7dec0c2e6801" />

class MoEFeedForward:
    ....
        if self.training:
            x = x.repeat_interleave(self.config.num_experts_per_tok, dim=0)
            y = torch.empty_like(x, dtype=torch.float16)
            for i, expert in enumerate(self.experts):
                y[flat_topk_idx == i] = expert(x[flat_topk_idx == i]).to(y.dtype)
            y = (y.view(*topk_weight.shape, -1) * topk_weight.unsqueeze(-1)).sum(dim=1)  # Should this be dim=1 or dim=2?
            y = y.view(*orig_shape)

thanks
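
One way to check the `dim` question is to trace the shapes with small tensors. The sketch below is not the repository's code: it assumes `topk_weight` has shape `(num_tokens, k)` with `k = num_experts_per_tok`, and it replaces the expert outputs with random data. The sizes and the names `out_dim1`, `out_dim2`, and `reference` are made up for illustration. Under that assumption, `y.view(*topk_weight.shape, -1)` has shape `(num_tokens, k, hidden)`, so `dim=1` sums over the selected experts, while `dim=2` would sum over the hidden dimension.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
bsz, seq_len, hidden, k = 2, 3, 8, 2
num_tokens = bsz * seq_len
orig_shape = (bsz, seq_len, hidden)

# Assumption: the gate yields one weight per (token, selected expert).
topk_weight = torch.softmax(torch.randn(num_tokens, k), dim=-1)

# Stand-in for the expert outputs: one row per (token, expert) pair, in the
# order produced by x.repeat_interleave(k, dim=0).
y = torch.randn(num_tokens * k, hidden)

weighted = y.view(*topk_weight.shape, -1) * topk_weight.unsqueeze(-1)
# weighted has shape (num_tokens, k, hidden)

out_dim1 = weighted.sum(dim=1)  # sums over experts -> (num_tokens, hidden)
out_dim2 = weighted.sum(dim=2)  # sums over hidden  -> (num_tokens, k)

# Reference: explicit weighted sum of each token's k expert outputs.
reference = torch.zeros(num_tokens, hidden)
for t in range(num_tokens):
    for j in range(k):
        reference[t] += topk_weight[t, j] * y[t * k + j]

result = out_dim1.view(*orig_shape)  # reshapes cleanly back to the input shape
print(out_dim1.shape, out_dim2.shape, result.shape)
```

With these shapes, only the `dim=1` result can be reshaped to `orig_shape`. The `dim=2` result keeps `num_tokens * k` elements, so `view(*orig_shape)` would fail whenever `k != hidden`.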

A question about the MoE code #460