fix: Loading LoRA parameters which saved from multi-card training #523

yuyu5333 · 2025-11-06T11:11:54Z

I trained a LoRA on 8 GPUs, launching training as follows:

torchrun --nproc_per_node 8 train_lora.py --lora_name lora

When loading the LoRA parameters for inference, I ran into the following error:

Traceback (most recent call last):
  File "/file_system/wyz/minimind/eval_llm.py", line 89, in <module>
    main()
  File "/file_system/wyz/minimind/eval_llm.py", line 61, in main
    model, tokenizer = init_model(args)
                       ^^^^^^^^^^^^^^^^
  File "/file_system/wyz/minimind/eval_llm.py", line 26, in init_model
    load_lora(model, f'./{args.save_dir}/lora/{args.lora_weight}_{args.hidden_size}.pth')
  File "/file_system/wyz/minimind/model/model_lora.py", line 40, in load_lora
    module.lora.load_state_dict(lora_state)
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2581, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for LoRA:
        Missing key(s) in state_dict: "A.weight", "B.weight". 
        Unexpected key(s) in state_dict: "module.A.weight", "module.B.weight".

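The cause is that multi-GPU training adds an extra "module." prefix to the keys of the saved state dict. After patching "load_lora" in model/model_lora.py, LoRA inference runs normally: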

python eval_llm.py  --weight full_sft --lora_weight lora_weight
MiniMind模型参数: 1610.50 M(illion)
[0] 自动测试
[1] 手动输入
0
👶: 你有什么特长？
🤖️: 我是一名大型语言模型，可以回答各种问题，提供信息和建议。我能够生成各种类型的文本，包括小说、文章、诗歌和对话等。我还能够与用户进行对话，并为他们提供有趣的观点和见解。



👶: 为什么天空是蓝色的
🤖️: 天空之所以呈现出蓝色，是因为日落和日出时太阳光经过大气层时被分散成不同颜色的光线，在大气层中分散成不同的颜色。当太阳在地平线附近时，光线经过大气层后被散射，蓝色光线在穿过大气层时被分散，而其他颜色的光线则被分散。因此，当太阳处于地平线附近时，光线需要经过更多的大气层才能到达我们的眼睛，而蓝色光线则被散射掉了。这就是为什么天空呈现出蓝色，这是因为太阳光在大气层中分散和散射，其中蓝色光线更容易被散射，而红色和黄色光线的波长较短，所以它们在我们的眼中被分散了。在日落和日出时，太阳光经过更长的路程，大气层变得更厚，因此更容易被散射，所以在日落时，太阳光需要经过更长的路程，大气层中更多的气体和尘埃会散射掉一些蓝色光线。这就是为什么天空呈现出蓝色，因为日落时太阳光经过更长的路程，更多的蓝色光线被散射掉了，而红色和黄色的光线则更容易被散射，因此天空呈现出橙红色或黄色的颜色。

......

I made the change in load_lora rather than in save_lora so that checkpoints which have already been trained stay loadable without having to be re-saved.
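
For illustration, the load-side fix can be sketched as a key-renaming step applied before `load_state_dict`. This is a minimal sketch, not the exact patch in this PR: `strip_module_prefix` is a hypothetical helper name, and the integer values stand in for real tensors.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from each key.

    Hypothetical helper: keys saved from a multi-GPU wrapped model carry a
    leading "module." prefix, while keys from single-GPU runs do not.
    Keys without the prefix are passed through unchanged, so checkpoints
    saved either way can be loaded.
    """
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Keys as they appear in the error message above (values are placeholders).
saved = {"module.A.weight": 1, "module.B.weight": 2}
cleaned = strip_module_prefix(saved)
print(sorted(cleaned))  # ['A.weight', 'B.weight']
```

In load_lora, a step like this would run on the per-module LoRA state just before `module.lora.load_state_dict(...)`. Only a leading prefix is stripped, so a key that merely contains "module." somewhere in the middle is left alone.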

Also: I added "trainer/__pycache__" to .gitignore; otherwise a lot of intermediate files get picked up by git.

yuyu5333 · 2025-11-06T11:13:44Z

@jingyaogong Looking forward to your review when you have a moment~ 🙏

fix: Loading LoRA parameters which saved from multi-card training

7de4b9e
