
Add comments explaining Attention and Trainer details #478


Open

zhenyu-02 wants to merge 1 commit into jingyaogong:master from zhenyu-02:master

zhenyu-02 commented Aug 15, 2025

This commit adds Chinese comments to the core components of the MiniMind model.

Main changes

Model (`model/model_minimind.py`)

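Attention mechanism: added detailed comments to the Attention class explaining:
- how the weights of all heads are mapped into one unified dimension so that the heads are computed in parallel
- how the Q, K, and V matrices are split and reassembled
- how multiple Q heads share the same K/V matrices in GQA (grouped-query attention)
- how attention scores are computed and what the triangular (causal) mask does
- how the attention mask handles pad tokens
Positional encoding: commented the design in which positional encodings are computed once at the top of the model and passed down to each layer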
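The attention points listed above can be sketched in a few lines. The block below is an illustration only, written in NumPy so it is self-contained; it is not the code from `model/model_minimind.py`. The names `gqa_attention`, `repeat_kv`, `wq`, `wk`, `wv`, and `pad_mask` are assumptions made for this sketch, and positional encoding is left out.

```python
import numpy as np

def repeat_kv(x, n_rep):
    """Repeat each KV head n_rep times along the head axis.

    (batch, seq, n_kv_heads, head_dim) -> (batch, seq, n_kv_heads * n_rep, head_dim)
    """
    return np.repeat(x, n_rep, axis=2)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gqa_attention(x, wq, wk, wv, n_heads, n_kv_heads, pad_mask=None):
    """Hypothetical grouped-query attention; pad_mask is (batch, seq), 1 = real token."""
    bsz, seq_len, dim = x.shape
    head_dim = dim // n_heads
    n_rep = n_heads // n_kv_heads  # number of Q heads sharing one KV head

    # One matmul projects every head at once; the reshape splits the result per head.
    q = (x @ wq).reshape(bsz, seq_len, n_heads, head_dim)
    k = (x @ wk).reshape(bsz, seq_len, n_kv_heads, head_dim)
    v = (x @ wv).reshape(bsz, seq_len, n_kv_heads, head_dim)

    # GQA: duplicate K/V so that each group of n_rep Q heads sees the same K/V.
    k, v = repeat_kv(k, n_rep), repeat_kv(v, n_rep)

    # Move heads in front of the sequence axis: (batch, heads, seq, head_dim).
    q, k, v = (t.transpose(0, 2, 1, 3) for t in (q, k, v))
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)

    # Causal mask: -inf strictly above the diagonal hides future positions.
    scores = scores + np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

    if pad_mask is not None:
        # Push padded key positions to a very negative score for every query.
        scores = scores + (1.0 - pad_mask)[:, None, None, :] * -1e9

    probs = softmax(scores)
    out = probs @ v  # (batch, heads, seq, head_dim)
    # Reassemble the heads back into one (batch, seq, dim) tensor.
    return out.transpose(0, 2, 1, 3).reshape(bsz, seq_len, dim), probs
```

With `n_heads=4` and `n_kv_heads=2`, `repeat_kv` lays the heads out as `[kv0, kv0, kv1, kv1]`, so Q heads 0 and 1 attend over the first K/V head and Q heads 2 and 3 over the second. The padding term uses a large finite negative value rather than `-inf` so that a row whose only visible key is padded still yields finite probabilities.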

Training (`trainer/train_pretrain.py`)

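Loss computation: explained in detail the role of the loss mask when handling variable-length sequences
Auxiliary loss: described the auxiliary loss mechanism used to train expert selection
Gradient handling: fully commented the gradient scaling, clipping, and update flow in mixed-precision training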
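The training points above fit together roughly as follows. This is a minimal PyTorch sketch under stated assumptions, not the code in `trainer/train_pretrain.py`: `masked_lm_loss` and `train_step` are hypothetical helpers, the model is assumed to return raw logits, and the auxiliary loss is assumed to arrive as a ready-made scalar tensor (how it is computed is not shown in this PR description).

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, targets, loss_mask):
    """Mean cross-entropy over positions where loss_mask == 1.

    Padded positions (loss_mask == 0) contribute nothing, so sequences of
    different lengths in one batch are averaged over real tokens only.
    """
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    return (per_token * loss_mask).sum() / loss_mask.sum()

def train_step(model, optimizer, scaler, x, y, loss_mask,
               aux_loss=None, grad_clip=1.0, use_amp=False, device_type="cpu"):
    """One hypothetical optimizer step with optional mixed precision."""
    with torch.autocast(device_type=device_type, enabled=use_amp):
        logits = model(x)
        loss = masked_lm_loss(logits, y, loss_mask)
        if aux_loss is not None:
            # Assumption: a scalar auxiliary term (e.g. from expert routing)
            # is simply added to the language-modelling loss.
            loss = loss + aux_loss

    # 1. Scale the loss so small fp16 gradients do not underflow.
    scaler.scale(loss).backward()
    # 2. Unscale before clipping so the threshold applies to true gradients.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
    # 3. Step (skipped internally on inf/NaN gradients) and adjust the scale.
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()
```

A scaler can be created with `torch.cuda.amp.GradScaler(enabled=use_amp)`; recent PyTorch releases also expose it as `torch.amp.GradScaler`. With `enabled=False` the scaler calls become pass-throughs, so the same step runs unchanged in full precision on CPU.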


feat: add explanation

fd6448c
