
A question about the MoE code #460

@xiaomayi-ant


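```python
class MoEFeedForward:
    # ....
        if self.training:
            # Duplicate each token once per selected expert (k consecutive copies).
            x = x.repeat_interleave(self.config.num_experts_per_tok, dim=0)
            y = torch.empty_like(x, dtype=torch.float16)
            # Run each expert on the copies routed to it.
            for i, expert in enumerate(self.experts):
                y[flat_topk_idx == i] = expert(x[flat_topk_idx == i]).to(y.dtype)
            # Weight each expert output by its gate score, then reduce.
            y = (y.view(*topk_weight.shape, -1) * topk_weight.unsqueeze(-1)).sum(dim=1)  # Should this be dim=1 or dim=2?
            y = y.view(*orig_shape)
```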

thanks
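The snippet elides how `topk_weight` is produced, so the answer depends on its shape. Assume the gate flattens batch and sequence before taking the top-k. In that case `topk_weight` is `(N, k)`, with `N = bsz * seq_len` and `k = num_experts_per_tok`, and `y.view(*topk_weight.shape, -1)` is `(N, k, H)`. Here `dim=1` is the expert axis: summing over it gives one `(N, H)` row per token, which `y.view(*orig_shape)` can restore. `dim=2` would instead sum over the hidden features and leave `(N, k)`. If `topk_weight` were `(bsz, seq_len, k)` instead, the view would be 4-D and the expert axis would be `dim=2`. The minimal pure-Python sketch below mimics both reductions on a toy example so the indexing is explicit. The function names are hypothetical, and plain lists stand in for torch tensors.

```python
# Hypothetical pure-Python sketch of the weighted-combine step (no torch needed).
# Assumption: topk_weight has shape (N, k), N = bsz * seq_len, k = num_experts_per_tok,
# and y_flat has N * k rows in repeat_interleave order, i.e. row t * k + j holds
# the output of token t's j-th selected expert.

def view_as_tokens_experts(y_flat, k):
    """Mimic y.view(N, k, H): group consecutive rows into blocks of k per token."""
    assert len(y_flat) % k == 0
    return [y_flat[t * k:(t + 1) * k] for t in range(len(y_flat) // k)]


def combine_sum_experts(y_flat, topk_weight):
    """(y.view(N, k, H) * w.unsqueeze(-1)).sum(dim=1): weighted sum over the k experts -> (N, H)."""
    k = len(topk_weight[0])
    out = []
    for rows, weights in zip(view_as_tokens_experts(y_flat, k), topk_weight):
        hidden = len(rows[0])
        acc = [0.0] * hidden
        for row, w in zip(rows, weights):
            for c in range(hidden):
                acc[c] += w * row[c]
        out.append(acc)
    return out


def combine_sum_hidden(y_flat, topk_weight):
    """Same product but .sum(dim=2): collapses the hidden axis instead -> (N, k)."""
    k = len(topk_weight[0])
    return [
        [sum(w * v for v in row) for row, w in zip(rows, weights)]
        for rows, weights in zip(view_as_tokens_experts(y_flat, k), topk_weight)
    ]


# Toy example: N = 2 tokens, k = 2 experts per token, H = 3 hidden units.
y_flat = [
    [1.0, 2.0, 3.0],     # token 0, expert slot 0
    [10.0, 20.0, 30.0],  # token 0, expert slot 1
    [4.0, 5.0, 6.0],     # token 1, expert slot 0
    [40.0, 50.0, 60.0],  # token 1, expert slot 1
]
topk_weight = [[0.75, 0.25], [0.5, 0.5]]

print(combine_sum_experts(y_flat, topk_weight))  # [[3.25, 6.5, 9.75], [22.0, 27.5, 33.0]]
print(combine_sum_hidden(y_flat, topk_weight))   # [[4.5, 15.0], [7.5, 75.0]]
```

Under the `(N, k)` assumption, only the `dim=1` result has the `(N, H)` shape that `y.view(*orig_shape)` expects. The `dim=2` result has `N * k` elements instead of `N * H`.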
