README.md — 4 changes: 2 additions & 2 deletions

@@ -178,7 +178,7 @@ print(torch.cuda.is_available())

**3.1 Pretraining (Learning image description)**

```bash
- python train_pretrain_vlm.py --epochs 4
+ bash scripts/pretrain_vlm.sh
```

> Run pretraining to get `pretrain_vlm_*.pth` as the pretrained output weights (* is the model dimension, 512 by default).

@@ -187,7 +187,7 @@

**3.2 Supervised Fine-Tuning (Learning image-caption dialogue style)**

```bash
- python train_sft_vlm.py --epochs 4
+ bash scripts/sft_vlm.sh
```

> Run supervised fine-tuning to get `sft_vlm_*.pth` as the instruction-tuned output weights.
README_en.md — 4 changes: 2 additions & 2 deletions

@@ -189,7 +189,7 @@ skipping the pretrain training step and proceed directly to SFT training.

**3.1 Pretraining (Learning image description)**

```bash
- python train_pretrain_vlm.py --epochs 4
+ bash scripts/pretrain_vlm.sh
```

> Run pretraining to get `pretrain_vlm_*.pth` as the pretrained model's output weights (* represents the model dimension, 512 by default).

@@ -198,7 +198,7 @@

**3.2 Supervised Fine-Tuning (Learning image-caption dialogue style)**

```bash
- python train_sft_vlm.py --epochs 4
+ bash scripts/sft_vlm.sh
```

> Perform supervised fine-tuning to get `sft_vlm_*.pth` as the output weights for the fine-tuned model.
scripts/pretrain_vlm.sh — new file, 27 additions

#!/bin/bash

# Launch VLM pretraining with the default hyperparameters.
# Optional flags (append them to the command below):
#   --use_wandb   enable wandb logging
#   --ddp         enable DDP distributed training (launch with torchrun)
# The last four arguments (--dim, --n_layers, --max_seq_len, --use_moe) are model parameters.
python train_pretrain_vlm.py \
  --out_dir="out" \
  --epochs=4 \
  --batch_size=16 \
  --learning_rate=0.0004 \
  --device="cuda:0" \
  --dtype="bfloat16" \
  --wandb_project="MiniMind-V" \
  --num_workers=8 \
  --data_path="./dataset/pretrain_data.jsonl" \
  --images_path="./dataset/pretrain_images" \
  --accumulation_steps=1 \
  --grad_clip=1.0 \
  --warmup_iters=0 \
  --log_interval=100 \
  --save_interval=100 \
  --local_rank=-1 \
  --dim=512 \
  --n_layers=8 \
  --max_seq_len=640 \
  --use_moe=False
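
The optional `--ddp` flag noted above is the hook for multi-GPU training. A minimal launch sketch, assuming `train_pretrain_vlm.py`'s `--ddp` path and `init_distributed_mode()` pick up the `RANK`/`LOCAL_RANK`/`WORLD_SIZE` variables that `torchrun` sets (the launch procedure itself is not part of this diff):

```bash
# Hypothetical 2-GPU DDP launch of the same pretraining run.
torchrun --nproc_per_node=2 train_pretrain_vlm.py \
  --ddp \
  --epochs=4 \
  --batch_size=16 \
  --learning_rate=0.0004 \
  --dtype="bfloat16" \
  --data_path="./dataset/pretrain_data.jsonl" \
  --images_path="./dataset/pretrain_images"
```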
scripts/sft_vlm.sh — new file, 27 additions

#!/bin/bash

# Launch VLM supervised fine-tuning with the default hyperparameters.
# Optional flags (append them to the command below):
#   --use_wandb   enable wandb logging
#   --ddp         enable DDP distributed training (launch with torchrun)
# The last four arguments (--dim, --n_layers, --max_seq_len, --use_moe) are model parameters.
python train_sft_vlm.py \
  --out_dir="out" \
  --epochs=6 \
  --batch_size=8 \
  --learning_rate=0.000001 \
  --device="cuda:0" \
  --dtype="bfloat16" \
  --wandb_project="MiniMind-V" \
  --num_workers=8 \
  --data_path="./dataset/sft_data.jsonl" \
  --images_path="./dataset/sft_images" \
  --accumulation_steps=1 \
  --grad_clip=1.0 \
  --warmup_iters=0 \
  --log_interval=10 \
  --save_interval=10 \
  --local_rank=-1 \
  --dim=512 \
  --n_layers=8 \
  --max_seq_len=1536 \
  --use_moe=False
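
Relative to pretraining, the SFT script uses a much lower learning rate (1e-6 vs 4e-4), a smaller batch size, and a longer `--max_seq_len` (1536 vs 640) to accommodate image-grounded dialogue samples. A sketch of the same run with wandb logging switched on, assuming `--use_wandb` and `--wandb_project` behave as declared in `train_sft_vlm.py`:

```bash
# Hypothetical single-GPU SFT run with wandb logging enabled.
python train_sft_vlm.py \
  --use_wandb \
  --wandb_project="MiniMind-V" \
  --epochs=6 \
  --batch_size=8 \
  --learning_rate=0.000001 \
  --data_path="./dataset/sft_data.jsonl" \
  --images_path="./dataset/sft_images"
```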
train_pretrain_vlm.py — 6 changes: 3 additions & 3 deletions

@@ -104,8 +104,8 @@ def init_model(model_config: VLMConfig):

  # Load the pure language-model weights
  ckp = f'./out/lm_{model_config.dim}{moe_path}.pth'
  model = MiniMindVLM(model_config)
- state_dict = torch.load(ckp, map_location=args.device)
- model.load_state_dict(state_dict, strict=False)
+ # state_dict = torch.load(ckp, map_location=args.device)
+ # model.load_state_dict(state_dict, strict=False)

  # Freeze all parameters except vision_proj
  for name, param in model.named_parameters():

@@ -147,7 +147,7 @@ def init_distributed_mode():

  parser.add_argument("--use_wandb", default=False, action="store_true")
  parser.add_argument("--wandb_project", type=str, default="MiniMind-V")
  parser.add_argument("--num_workers", type=int, default=8)
- parser.add_argument("--data_path", type=str, default="./dataset/pretrain_vlm_data.jsonl")
+ parser.add_argument("--data_path", type=str, default="./dataset/pretrain_data.jsonl")
  parser.add_argument("--images_path", type=str, default="./dataset/pretrain_images")
  parser.add_argument("--ddp", action="store_true")
  parser.add_argument("--accumulation_steps", type=int, default=1)
train_sft_vlm.py — 2 changes: 1 addition & 1 deletion

@@ -137,7 +137,7 @@ def init_distributed_mode():

  parser.add_argument("--use_wandb", default=False, action="store_true")
  parser.add_argument("--wandb_project", type=str, default="MiniMind-V")
  parser.add_argument("--num_workers", type=int, default=8)
- parser.add_argument("--data_path", type=str, default="./dataset/sft_vlm_data.jsonl")
+ parser.add_argument("--data_path", type=str, default="./dataset/sft_data.jsonl")
  parser.add_argument("--images_path", type=str, default="./dataset/sft_images")
  parser.add_argument("--ddp", action="store_true")
  parser.add_argument("--accumulation_steps", type=int, default=1)