Description
When training on the ScienceQA dataset with CLIP image features, a gradient explosion occurs (grad_norm is reported as inf from the very first logging steps). My run log is below.
====Input Arguments====
{
"data_root": "data",
"output_dir": "experiments",
"model": "allenai/unifiedqa-t5-base",
"options": [
"A",
"B",
"C",
"D",
"E"
],
"epoch": 50,
"lr": 5e-05,
"bs": 4,
"input_len": 512,
"output_len": 512,
"eval_bs": 4,
"eval_acc": null,
"train_split": "train",
"val_split": "val",
"test_split": "test",
"use_generate": true,
"final_eval": false,
"user_msg": "rationale",
"img_type": "clip",
"eval_le": null,
"test_le": null,
"evaluate_dir": null,
"caption_file": "data/instruct_captions.json",
"use_caption": true,
"prompt_format": "QCM-E",
"seed": 42
}
img_features size: (11208, 49, 2048)
number of train problems: 12726
number of val problems: 4241
number of test problems: 4241
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
[14:58:49] [Model]: Loading allenai/unifiedqa-t5-base... main.py:66
[Data]: Reading data... main.py:67
experiments/rationale_allenai-unifiedqa-t5-base_clip_QCM-E_lr5e-05_bs4_op512_ep50
Some weights of T5ForMultimodalGeneration were not initialized from the model checkpoint at allenai/unifiedqa-t5-base and are newly initialized: ['encoder.gate_dense.bias', 'encoder.gate_dense.weight', 'encoder.image_dense.bias', 'encoder.image_dense.weight', 'encoder.mha_layer.in_proj_bias', 'encoder.mha_layer.in_proj_weight', 'encoder.mha_layer.out_proj.bias', 'encoder.mha_layer.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
model parameters: 228019968
{'loss': 29.2632, 'grad_norm': inf, 'learning_rate': 4.984286612193589e-05, 'epoch': 0.16}
{'loss': 29.2109, 'grad_norm': inf, 'learning_rate': 4.968573224387178e-05, 'epoch': 0.31}
1%|▉ | 1106/159100 [26:08<60:19:10, 1.37s/it]{'loss': 29.2953, 'grad_norm': inf, 'learning_rate': 4.952859836580767e-05, 'epoch': 0.47}
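
Before digging into the model, it may be worth ruling out the features themselves: a single NaN/Inf patch embedding in the pre-extracted CLIP features is enough to make the fused encoder states, the loss, and therefore grad_norm non-finite. Below is a minimal sanity check, assuming the features live at `vision_features/clip.npy` (the path is an assumption, not taken from the log; the expected shape `(11208, 49, 2048)` is).

```python
import numpy as np

# Load the pre-extracted CLIP features and check they are finite.
feats = np.load("vision_features/clip.npy")  # expected shape: (11208, 49, 2048)
print("shape:", feats.shape, "dtype:", feats.dtype)
print("all finite:", np.isfinite(feats).all())
print("min / max / abs-max:", feats.min(), feats.max(), np.abs(feats).max())

# Indices of samples whose 49x2048 feature map contains a NaN or Inf.
bad = np.where(~np.isfinite(feats).all(axis=(1, 2)))[0]
print("samples with non-finite features:", bad.size, bad[:20])
```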
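If the features are clean, the explosion most likely starts in the modules listed as newly initialized above (`gate_dense`, `image_dense`, `mha_layer`), since they are trained from random init. Here is a small sketch for localizing it, assuming `main.py` exposes the model object before `trainer.train()` is called; `log_grad_norms` is a hypothetical helper, not part of this repo.

```python
import torch

def log_grad_norms(model, keys=("gate_dense", "image_dense", "mha_layer")):
    """Print the gradient norm of the fusion parameters on every backward pass."""
    for name, param in model.named_parameters():
        if param.requires_grad and any(k in name for k in keys):
            # Tensor.register_hook fires with this parameter's gradient during backward.
            param.register_hook(
                lambda grad, name=name: print(
                    f"{name}: grad_norm={grad.norm().item():.3e}, "
                    f"finite={torch.isfinite(grad).all().item()}"
                )
            )

# usage sketch (inside main.py, after the model is created):
# log_grad_norms(model)
# trainer.train()
```

If those norms are already inf on the first steps, the usual suspects would be mixed-precision overflow (the initial loss of ~29 in the log is large) or a learning rate that is too aggressive for the randomly initialized cross-attention; running in fp32 or warming up those layers with a smaller learning rate would be worth trying.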