
When using img_feats, a gradient explosion issue occurs. #80

Description

@Garibelhj

When training on the ScienceQA dataset with CLIP image features, I hit a gradient explosion. My run log follows, with a few diagnostic notes inline.
====Input Arguments====
{
"data_root": "data",
"output_dir": "experiments",
"model": "allenai/unifiedqa-t5-base",
"options": [
"A",
"B",
"C",
"D",
"E"
],
"epoch": 50,
"lr": 5e-05,
"bs": 4,
"input_len": 512,
"output_len": 512,
"eval_bs": 4,
"eval_acc": null,
"train_split": "train",
"val_split": "val",
"test_split": "test",
"use_generate": true,
"final_eval": false,
"user_msg": "rationale",
"img_type": "clip",
"eval_le": null,
"test_le": null,
"evaluate_dir": null,
"caption_file": "data/instruct_captions.json",
"use_caption": true,
"prompt_format": "QCM-E",
"seed": 42
}
img_features size: (11208, 49, 2048)
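
Before suspecting the model, it is worth sanity-checking the features themselves: the shape (11208, 49, 2048) is 49 grid tokens of 2048-d features per image, and a single NaN/inf or extreme-magnitude entry in that tensor is enough to destabilize the fused attention. A minimal check, assuming the features were saved as a NumPy array (the path below is a guess; point it at your actual feature file):

```python
import numpy as np

# Path is an assumption -- use wherever your extracted CLIP features live.
feats = np.load("vision_features/clip.npy")

print("shape  :", feats.shape)                 # run log reports (11208, 49, 2048)
print("any NaN:", bool(np.isnan(feats).any()))
print("any inf:", bool(np.isinf(feats).any()))
print("abs max:", float(np.abs(feats).max()))  # large magnitudes overflow fp16 easily
```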
number of train problems: 12726
number of val problems: 4241
number of test problems: 4241

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
[14:58:49] [Model]: Loading allenai/unifiedqa-t5-base... main.py:66
           [Data]: Reading data... main.py:67

experiments/rationale_allenai-unifiedqa-t5-base_clip_QCM-E_lr5e-05_bs4_op512_ep50
Some weights of T5ForMultimodalGeneration were not initialized from the model checkpoint at allenai/unifiedqa-t5-base and are newly initialized: ['encoder.gate_dense.bias', 'encoder.gate_dense.weight', 'encoder.image_dense.bias', 'encoder.image_dense.weight', 'encoder.mha_layer.in_proj_bias', 'encoder.mha_layer.in_proj_weight', 'encoder.mha_layer.out_proj.bias', 'encoder.mha_layer.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
model parameters: 228019968
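
Note the warning above: encoder.gate_dense, encoder.image_dense, and encoder.mha_layer are freshly initialized, so the image-fusion path starts from random weights while the rest of T5 is pretrained. If the explosion originates there, a per-parameter hook will name the culprit on the first bad step. A minimal sketch, assuming `model` is the T5ForMultimodalGeneration instance constructed in main.py:

```python
import torch

def watch_gradients(model: torch.nn.Module) -> None:
    """Print every parameter whose gradient goes non-finite."""
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue

        def report(grad: torch.Tensor, name: str = name) -> torch.Tensor:
            if not torch.isfinite(grad).all():
                print(f"non-finite grad in {name}: "
                      f"abs-max {grad.abs().max().item():.3e}")
            return grad

        param.register_hook(report)
```

If the first non-finite gradients show up in the fusion layers rather than the pretrained T5 blocks, the image branch (feature scale, the gating, or overflow inside the cross-attention) is the place to look.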
{'loss': 29.2632, 'grad_norm': inf, 'learning_rate': 4.984286612193589e-05, 'epoch': 0.16}
{'loss': 29.2109, 'grad_norm': inf, 'learning_rate': 4.968573224387178e-05, 'epoch': 0.31}
1%|▉ | 1106/159100 [26:08<60:19:10, 1.37s/it]
{'loss': 29.2953, 'grad_norm': inf, 'learning_rate': 4.952859836580767e-05, 'epoch': 0.47}
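
The pattern here, loss stuck around 29 with grad_norm: inf from the very first logged step, looks less like slow divergence and more like an overflow in the forward/backward pass (fp16 saturates near 6.5e4, and once the gradient norm is non-finite, norm-based clipping cannot rescue the step). Two low-risk settings to try are full fp32 and explicit gradient clipping; with the HuggingFace Trainer both are ordinary training arguments. The values below are illustrative, not tuned:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative settings mirroring the run log; max_grad_norm and fp16
# are standard Seq2SeqTrainingArguments fields, not project-specific flags.
training_args = Seq2SeqTrainingArguments(
    output_dir="experiments/rationale_allenai-unifiedqa-t5-base_clip_QCM-E_lr5e-05_bs4_op512_ep50",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=50,
    predict_with_generate=True,  # matches "use_generate": true above
    fp16=False,                  # stay in fp32 until the inf gradients are gone
    max_grad_norm=1.0,           # clip once gradients are finite again
)
```

If the gradients stay inf even in fp32, normalizing the CLIP features before encoder.image_dense (e.g. a LayerNorm or simple unit-scaling) would be my next experiment.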
