Method comparison: Add MiSS result #2740
Conversation
Adds MiSS results for the following modes:
- default
- mini
- bat

Results are pretty close to the corresponding experiments with Bone, which is what we expected.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
FYI @JL-er
Thanks, I noticed that the Bat mode doesn't seem very stable. The test results this time were even worse than with the default mode.
Hmm, right, the test accuracy decreased from 51.7% to 50.5%, even though the train loss is pretty much identical (0.5763 vs 0.5761). I'd say that using the default is the more attractive setting anyway, as it's much more memory efficient, but it could still be worth investigating why the results changed so much.
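For a sense of scale (a back-of-the-envelope sketch, not a claim from the runs themselves): even if the model were unchanged, pure sampling noise on a 1319-example test set (the size noted later in the thread) is non-trivial at ~50% accuracy.

```python
# Standard error of a measured accuracy under a binomial model:
# se = sqrt(p * (1 - p) / n). With n = 1319 and p ~ 0.5 this is
# roughly 1.4 percentage points per run, before any training variance.
import math

n = 1319  # GSM8K test set size
p = 0.50  # accuracy in the region observed here
se = math.sqrt(p * (1 - p) / n)
print(f"std. error of accuracy: {se:.4f}")  # ~0.0138
```

So a 51.7% vs 50.5% gap is within one standard error of measurement noise alone; the later multi-run comparison is the more meaningful signal.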
Could you run the Bat experiment once?
I re-ran the Bone example on the same machine, now getting an even lower accuracy (49.5%). I'll probably re-run it a few more times to see how high the variance is in general.

Result for Bone-Bat:

{
"run_info": {
"created_at": "2025-08-15T13:38:21+00:00",
"total_time": 2758.8984308090003,
"experiment_name": "bone/llama-3.2-3B-bat",
"peft_branch": "main",
"train_config": {
"model_id": "meta-llama/Llama-3.2-3B",
"dtype": "bfloat16",
"max_seq_length": 768,
"batch_size": 4,
"batch_size_eval": 50,
"max_steps": 5000,
"eval_steps": 250,
"compile": false,
"query_template": "Question: {query} Think step by step.\nAnswer:",
"seed": 0,
"grad_norm_clip": 1.0,
"optimizer_type": "AdamW",
"optimizer_kwargs": {
"lr": 0.0001,
"weight_decay": 0.1
},
"lr_scheduler": "cosine",
"use_amp": false,
"autocast_adapter_dtype": true,
"generation_kwargs": {
"max_length": 800,
"max_new_tokens": 300
},
"attn_implementation": null
},
"peft_config": {
"task_type": null,
"peft_type": "BONE",
"auto_mapping": null,
"base_model_name_or_path": "meta-llama/Llama-3.2-3B",
"revision": null,
"inference_mode": false,
"r": 64,
"target_modules": [
"q_proj",
"v_proj"
],
"exclude_modules": null,
"init_weights": "bat",
"layers_to_transform": null,
"layers_pattern": null,
"bias": "none",
"modules_to_save": null
},
"error_msg": ""
},
"train_info": {
"accelerator_memory_reserved_avg": 14713894417,
"accelerator_memory_max": 25251807232,
"accelerator_memory_reserved_99th": 20472733368,
"train_time": 2467.8785469740014,
"file_size": 29367552,
"num_trainable_params": 7340032,
"num_total_params": 3220089856,
"status": "success",
"metrics": [
{
"step": 250,
"valid accuracy": 0.32,
"train loss": 0.8741402707099915,
"train samples": 1000,
"train time": 44.84663501100022,
"eval time": 16.530845782000142,
"tokens / sec": 4720.956208822991,
"mem allocated avg": 6898546569.216,
"mem reserved avg": 14772112195.584,
"elapsed time": 125.1565625950002
},
{
"step": 500,
"valid accuracy": 0.42,
"train loss": 0.6949697629213333,
"train samples": 2000,
"train time": 44.66738984100175,
"eval time": 12.175719579000088,
"tokens / sec": 4656.529086216588,
"mem allocated avg": 6890138988.544,
"mem reserved avg": 14663949484.032,
"elapsed time": 240.0960283049999
},
{
"step": 750,
"valid accuracy": 0.38,
"train loss": 0.667268633723259,
"train samples": 3000,
"train time": 45.62526284499927,
"eval time": 8.235976585000117,
"tokens / sec": 4699.172928129208,
"mem allocated avg": 6901011800.064,
"mem reserved avg": 14819080011.776,
"elapsed time": 352.3821910219999
},
{
"step": 1000,
"valid accuracy": 0.44,
"train loss": 0.6479405733346939,
"train samples": 4000,
"train time": 44.807461878997856,
"eval time": 9.97781685100017,
"tokens / sec": 4649.582709295373,
"mem allocated avg": 6892128219.136,
"mem reserved avg": 14679913005.056,
"elapsed time": 465.37876664800024
},
{
"step": 1250,
"valid accuracy": 0.34,
"train loss": 0.643578136086464,
"train samples": 5000,
"train time": 45.07155244600017,
"eval time": 8.857318488000146,
"tokens / sec": 4626.8208810834185,
"mem allocated avg": 6892222337.024,
"mem reserved avg": 14675131498.496,
"elapsed time": 577.4979845480002
},
{
"step": 1500,
"valid accuracy": 0.48,
"train loss": 0.6369394363164902,
"train samples": 6000,
"train time": 45.09846532499796,
"eval time": 16.40352508900014,
"tokens / sec": 4641.643534685168,
"mem allocated avg": 6893671811.072,
"mem reserved avg": 14706127405.056,
"elapsed time": 697.4003612199999
},
{
"step": 1750,
"valid accuracy": 0.46,
"train loss": 0.6277884117364884,
"train samples": 7000,
"train time": 45.44054208400212,
"eval time": 16.52979276899987,
"tokens / sec": 4607.2293682804875,
"mem allocated avg": 6895174580.224,
"mem reserved avg": 14716906766.336,
"elapsed time": 817.9448886139999
},
{
"step": 2000,
"valid accuracy": 0.38,
"train loss": 0.6284448710680008,
"train samples": 8000,
"train time": 44.66441460200076,
"eval time": 16.455440011000064,
"tokens / sec": 4650.144905082808,
"mem allocated avg": 6891904557.056,
"mem reserved avg": 14653740548.096,
"elapsed time": 937.2175861730002
},
{
"step": 2250,
"valid accuracy": 0.38,
"train loss": 0.6159043073654175,
"train samples": 9000,
"train time": 46.132000129995504,
"eval time": 10.414071448999948,
"tokens / sec": 4659.412108607851,
"mem allocated avg": 6903182145.536,
"mem reserved avg": 14849203503.104,
"elapsed time": 1052.6482256069999
},
{
"step": 2500,
"valid accuracy": 0.46,
"train loss": 0.6123742452859878,
"train samples": 10000,
"train time": 44.48052799800462,
"eval time": 9.077246988999832,
"tokens / sec": 4630.498091417431,
"mem allocated avg": 6888177211.392,
"mem reserved avg": 14597721423.872,
"elapsed time": 1164.3923993470003
},
{
"step": 2750,
"valid accuracy": 0.46,
"train loss": 0.6012357432842255,
"train samples": 11000,
"train time": 45.48063802700244,
"eval time": 16.48930690899988,
"tokens / sec": 4658.7077312812435,
"mem allocated avg": 6898979477.504,
"mem reserved avg": 14782279188.48,
"elapsed time": 1285.076055947
},
{
"step": 3000,
"valid accuracy": 0.46,
"train loss": 0.590102802991867,
"train samples": 12000,
"train time": 44.94007473199872,
"eval time": 16.46088983999971,
"tokens / sec": 4644.651822338361,
"mem allocated avg": 6894117320.704,
"mem reserved avg": 14690650423.296,
"elapsed time": 1404.875696617
},
{
"step": 3250,
"valid accuracy": 0.5,
"train loss": 0.5990242173671723,
"train samples": 13000,
"train time": 45.31071554800019,
"eval time": 16.52300792999995,
"tokens / sec": 4654.55019743797,
"mem allocated avg": 6895772430.336,
"mem reserved avg": 14729800056.832,
"elapsed time": 1525.008872587
},
{
"step": 3500,
"valid accuracy": 0.5,
"train loss": 0.5803046126365662,
"train samples": 14000,
"train time": 44.94620334099545,
"eval time": 14.057770697999786,
"tokens / sec": 4666.690051853321,
"mem allocated avg": 6893924593.664,
"mem reserved avg": 14704818782.208,
"elapsed time": 1642.582958221
},
{
"step": 3750,
"valid accuracy": 0.5,
"train loss": 0.5769718471765518,
"train samples": 15000,
"train time": 46.15345219799838,
"eval time": 16.54434743999991,
"tokens / sec": 4695.271744144811,
"mem allocated avg": 6905346478.08,
"mem reserved avg": 14888957116.416,
"elapsed time": 1764.270111406
},
{
"step": 4000,
"valid accuracy": 0.5,
"train loss": 0.5857474536895751,
"train samples": 16000,
"train time": 44.38817892599582,
"eval time": 16.396790597999825,
"tokens / sec": 4604.22132524814,
"mem allocated avg": 6886660577.28,
"mem reserved avg": 14582965862.4,
"elapsed time": 1883.048193569
},
{
"step": 4250,
"valid accuracy": 0.52,
"train loss": 0.5724298695325851,
"train samples": 17000,
"train time": 46.189890748002654,
"eval time": 16.550546742999813,
"tokens / sec": 4576.520891839106,
"mem allocated avg": 6897394636.8,
"mem reserved avg": 14742080978.944,
"elapsed time": 2004.408629256
},
{
"step": 4500,
"valid accuracy": 0.5,
"train loss": 0.5789464256763458,
"train samples": 18000,
"train time": 45.51842483500286,
"eval time": 16.440088976999505,
"tokens / sec": 4565.579779030307,
"mem allocated avg": 6892786214.912,
"mem reserved avg": 14656919830.528,
"elapsed time": 2124.6471290850004
},
{
"step": 4750,
"valid accuracy": 0.5,
"train loss": 0.567945005774498,
"train samples": 19000,
"train time": 44.95984372499879,
"eval time": 16.491939075999653,
"tokens / sec": 4669.47797425881,
"mem allocated avg": 6893964591.104,
"mem reserved avg": 14709189246.976,
"elapsed time": 2244.6083886899996
},
{
"step": 5000,
"valid accuracy": 0.48,
"train loss": 0.5767219476699829,
"train samples": 20000,
"train time": 45.47602556899801,
"eval time": 16.543005558000004,
"tokens / sec": 4579.995665715981,
"mem allocated avg": 6891249879.04,
"mem reserved avg": 14656341016.576,
"elapsed time": 2364.9427617600004
},
{
"step": 5000,
"test accuracy": 0.49507202426080366,
"train loss": 0.5767219476699829,
"train samples": 20000,
"train total tokens": 4198051
}
]
},
"meta_info": {
"model_info": {
"sha": "13afe5124825b4f3751f836b40dafda64c1ed062",
"created_at": "2024-09-18T15:23:48+00:00"
},
"dataset_info": {
"metamath": {
"sha": "aa4f34d3d2d3231299b5b03d9b3e5a20da45aa18",
"created_at": "2023-09-21T17:22:46+00:00"
},
"gsm8k": {
"sha": "e53f048856ff4f594e959d75785d2c2d37b678ee",
"created_at": "2022-04-12T10:22:10+00:00"
}
},
"package_info": {
"transformers-version": "4.52.4",
"transformers-commit-hash": null,
"peft-version": "0.17.1.dev0",
"peft-commit-hash": "04d41cbcd061bf1ab1185e111054bae012cb1894",
"datasets-version": "3.6.0",
"datasets-commit-hash": null,
"bitsandbytes-version": "0.46.0",
"bitsandbytes-commit-hash": null,
"torch-version": "2.7.1+cu126",
"torch-commit-hash": null
},
"system_info": {
"system": "Linux",
"release": "6.14.0-1010-aws",
"version": "#10~24.04.1-Ubuntu SMP Fri Jul 18 20:44:30 UTC 2025",
"machine": "x86_64",
"processor": "x86_64",
"accelerator": "NVIDIA L40S"
},
"pytorch_info": "PyTorch built with:\n - GCC 11.2\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 12.6\n - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n - CuDNN 90.7.1 (built against CUDA 12.8)\n - Built with CuDNN 90.5.1\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=e2d141dbde55c2a4370fac5165b0561b6af4798b, CUDA_VERSION=12.6, CUDNN_VERSION=9.5.1, CXX_COMPILER=/opt/rh/gcc-toolset-11/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.7.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n"
}
}
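To make the long metrics list above easier to scan, here is a small sketch that summarizes the validation-accuracy trajectory (values copied from the `metrics` list; note that with `batch_size_eval` at 50 and accuracies all in steps of 0.02, each validation point appears to cover just 50 examples, so they are very noisy):

```python
# (step, valid accuracy) pairs copied from the Bone-Bat run above.
valid_acc = [
    (250, 0.32), (500, 0.42), (750, 0.38), (1000, 0.44), (1250, 0.34),
    (1500, 0.48), (1750, 0.46), (2000, 0.38), (2250, 0.38), (2500, 0.46),
    (2750, 0.46), (3000, 0.46), (3250, 0.50), (3500, 0.50), (3750, 0.50),
    (4000, 0.50), (4250, 0.52), (4500, 0.50), (4750, 0.50), (5000, 0.48),
]

best_step, best_acc = max(valid_acc, key=lambda t: t[1])
final_acc = valid_acc[-1][1]
print(f"best: {best_acc:.2f} at step {best_step}, final: {final_acc:.2f}")
# prints: best: 0.52 at step 4250, final: 0.48
```

The trajectory plateaus around 0.50 from step 3250 on, so the final test accuracy of 49.5% is a drop from the validation plateau rather than a continuation of it.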
I will also check it.
I ran a few Bone-bat experiments, variance of test accuracy is quite high:
I'll try a few other PEFT methods like LoRA next.
Yes, we should indeed verify whether other PEFT methods also produce such large variances.
I ran a few more experiments with LoRA rank 32, the rest being the same, and collected the test accuracies. Just from eyeballing, and from running a Levene/Brown–Forsythe test, it looks like {MiSS,Bone}-Bat has a higher variance than LoRA:

```python
from scipy import stats

# note: test set size is 1319
acc_lora = [0.48218347232752085, 0.47536012130401817, 0.4829416224412434,
            0.4836997725549659, 0.47763457164518575, 0.4783927217589083,
            0.4829416224412434, 0.4715693707354056]
acc_bat = [0.5200909780136467, 0.49052312357846856, 0.49583017437452614,
           0.5056861258529188, 0.5170583775587566, 0.5049279757391963]

stat, p = stats.levene(acc_lora, acc_bat, center='mean')
print(f"Levene W={stat:.3f}, p={p:.4f}")
# prints: Levene W=4.087, p=0.0661

stat, p = stats.levene(acc_lora, acc_bat, center='median')
print(f"Brown–Forsythe W={stat:.3f}, p={p:.4f}")
# prints: Brown–Forsythe W=3.960, p=0.0699
```

Of course, this is not definitive proof, but it strongly suggests that Bat has higher variance.
@JL-er Any ideas how to proceed? Is it possible that Bat is especially sensitive to small changes? The training loss is pretty consistent between the runs, so the differences there are not big, but the test accuracy still differs a lot (as generation can exacerbate small differences).
I haven’t had time to run tests recently, but I suspect that the instability in evaluation may be caused by the interaction between bat and the original weights. |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
@JL-er Did you have the time to investigate this further?
Bat should be fine, but it is too sensitive to the learning rate.
Okay, so we can just merge this PR, right? Maybe in a separate PR, it could be documented that Bat init can be unstable. |
Yes, we can merge the PR. We will propose new methods later. |