Description
Describe the Bug
Hi, I am running the example custom model from the README (sba), but the model type is not recognized. Any ideas why? Error log:
Traceback (most recent call last):
File "/network/scratch/n/nizar.islah/flame/fla-venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1218, in from_pretrained
config_class = CONFIG_MAPPING[config_dict["model_type"]]
File "/network/scratch/n/nizar.islah/flame/fla-venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 914, in __getitem__
raise KeyError(key)
KeyError: 'sba'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/cvmfs/ai.mila.quebec/apps/arch/distro/python/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/cvmfs/ai.mila.quebec/apps/arch/distro/python/3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/network/scratch/n/nizar.islah/flame/flame/utils/convert_dcp_to_hf.py", line 65, in <module>
save_pretrained(args.path, args.step, args.config, args.tokenizer)
File "/network/scratch/n/nizar.islah/flame/fla-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/network/scratch/n/nizar.islah/flame/flame/utils/convert_dcp_to_hf.py", line 28, in save_pretrained
config = AutoConfig.from_pretrained(config, trust_remote_code=True)
File "/network/scratch/n/nizar.islah/flame/fla-venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1220, in from_pretrained
raise ValueError(
ValueError: The checkpoint you are trying to load has model type `sba` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
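For context, the `KeyError: 'sba'` comes from a dictionary-style lookup of `model_type` inside `AutoConfig.from_pretrained`, which Transformers then re-raises as the `ValueError` above. A minimal sketch of that resolution path, with simplified, hypothetical names (the real `CONFIG_MAPPING` is a lazy mapping and the config classes are real classes, not strings), and of how registering the custom type first avoids the error:

```python
# Simplified sketch (assumption): Transformers keeps a mapping from
# model_type strings to config classes; "sba" is absent by default.
CONFIG_MAPPING = {"llama": "LlamaConfig", "gpt2": "GPT2Config"}

def from_pretrained(config_dict):
    """Mimic AutoConfig.from_pretrained's model_type resolution."""
    model_type = config_dict["model_type"]
    try:
        # This lookup is what raises KeyError: 'sba'.
        return CONFIG_MAPPING[model_type]
    except KeyError:
        # Transformers re-raises the KeyError as a ValueError.
        raise ValueError(
            f"Transformers does not recognize model type {model_type!r}"
        )

# Registering the custom type before loading avoids the error.
# In real Transformers this would be AutoConfig.register("sba", SBAConfig),
# typically triggered by importing the package that defines the model.
CONFIG_MAPPING["sba"] = "SBAConfig"
print(from_pretrained({"model_type": "sba"}))  # SBAConfig
```

This suggests the conversion script ran in an environment where the package defining the `sba` config was never imported, so its `AutoConfig.register` call never executed.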
Steps to Reproduce the Bug
NNODE=1 NGPU=1 LOG_RANK=0 bash train.sh \
  --job.config_file flame/models/fla.toml \
  --job.dump_folder exp/sba-340M-10B/batch32.seqlen2048.warmup1024.update1.steps20480.lr3e-4 \
  --model.config configs/sba_340m.json \
  --model.tokenizer_path fla-hub/transformer-1.3B-100B \
  --optimizer.name AdamW \
  --optimizer.eps 1e-15 \
  --optimizer.lr 3e-4 \
  --lr_scheduler.warmup_steps 1024 \
  --lr_scheduler.lr_min 0.1 \
  --lr_scheduler.decay_type cosine \
  --training.batch_size 32 \
  --training.seq_len 2048 \
  --training.gradient_accumulation_steps 1 \
  --training.steps 20480 \
  --training.max_norm 1.0 \
  --training.skip_nan_inf \
  --training.dataset HuggingFaceFW/fineweb-edu \
  --training.dataset_name default \
  --training.dataset_split train \
  --training.streaming \
  --training.num_workers 32 \
  --training.prefetch_factor 2 \
  --training.seed 42 \
  --training.compile \
  --training.tensor_parallel_degree 1 \
  --training.disable_loss_parallel \
  --checkpoint.interval 2048 \
  --checkpoint.load_step -1 \
  --metrics.log_freq 1
Expected Behavior
Training should begin normally.
Environment Information
pip install flame .