
Conversation

@BenjaminBossan
Member

  • default
  • mini
  • bat

Results are pretty close to the corresponding experiments with Bone, which is what we expected.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan
Member Author

FYI @JL-er

@Joluck
Contributor

Joluck commented Aug 15, 2025

FYI @JL-er

Thanks, I noticed that the bat mode doesn't seem very stable. This time the test results were even worse than with the default mode.

@BenjaminBossan
Member Author

Hmm, right, the test accuracy decreased from 51.7% to 50.5%, even though the train loss is pretty much identical (0.5763 vs 0.5761). I'd say that using the default is the more attractive setting anyway, as it's much more memory efficient, but it could still be worth investigating why the results changed so much.

@Joluck
Contributor

Joluck commented Aug 15, 2025

Hmm, right, the test accuracy decreased from 51.7% to 50.5%, even though the train loss is pretty much identical (0.5763 vs 0.5761). I'd say that using the default is the more attractive setting anyway, as it's much more memory efficient, but it could still be worth investigating why the results changed so much.

Could you re-run the bat experiment once?

@BenjaminBossan
Member Author

I re-ran the Bone example on the same machine, now getting an even lower accuracy (49.5%). I'll probably re-run it a few more times to see how high the variance is in general.

Result for Bone Bat
{
  "run_info": {
    "created_at": "2025-08-15T13:38:21+00:00",
    "total_time": 2758.8984308090003,
    "experiment_name": "bone/llama-3.2-3B-bat",
    "peft_branch": "main",
    "train_config": {
      "model_id": "meta-llama/Llama-3.2-3B",
      "dtype": "bfloat16",
      "max_seq_length": 768,
      "batch_size": 4,
      "batch_size_eval": 50,
      "max_steps": 5000,
      "eval_steps": 250,
      "compile": false,
      "query_template": "Question: {query} Think step by step.\nAnswer:",
      "seed": 0,
      "grad_norm_clip": 1.0,
      "optimizer_type": "AdamW",
      "optimizer_kwargs": {
        "lr": 0.0001,
        "weight_decay": 0.1
      },
      "lr_scheduler": "cosine",
      "use_amp": false,
      "autocast_adapter_dtype": true,
      "generation_kwargs": {
        "max_length": 800,
        "max_new_tokens": 300
      },
      "attn_implementation": null
    },
    "peft_config": {
      "task_type": null,
      "peft_type": "BONE",
      "auto_mapping": null,
      "base_model_name_or_path": "meta-llama/Llama-3.2-3B",
      "revision": null,
      "inference_mode": false,
      "r": 64,
      "target_modules": [
        "q_proj",
        "v_proj"
      ],
      "exclude_modules": null,
      "init_weights": "bat",
      "layers_to_transform": null,
      "layers_pattern": null,
      "bias": "none",
      "modules_to_save": null
    },
    "error_msg": ""
  },
  "train_info": {
    "accelerator_memory_reserved_avg": 14713894417,
    "accelerator_memory_max": 25251807232,
    "accelerator_memory_reserved_99th": 20472733368,
    "train_time": 2467.8785469740014,
    "file_size": 29367552,
    "num_trainable_params": 7340032,
    "num_total_params": 3220089856,
    "status": "success",
    "metrics": [
      {
        "step": 250,
        "valid accuracy": 0.32,
        "train loss": 0.8741402707099915,
        "train samples": 1000,
        "train time": 44.84663501100022,
        "eval time": 16.530845782000142,
        "tokens / sec": 4720.956208822991,
        "mem allocated avg": 6898546569.216,
        "mem reserved avg": 14772112195.584,
        "elapsed time": 125.1565625950002
      },
      {
        "step": 500,
        "valid accuracy": 0.42,
        "train loss": 0.6949697629213333,
        "train samples": 2000,
        "train time": 44.66738984100175,
        "eval time": 12.175719579000088,
        "tokens / sec": 4656.529086216588,
        "mem allocated avg": 6890138988.544,
        "mem reserved avg": 14663949484.032,
        "elapsed time": 240.0960283049999
      },
      {
        "step": 750,
        "valid accuracy": 0.38,
        "train loss": 0.667268633723259,
        "train samples": 3000,
        "train time": 45.62526284499927,
        "eval time": 8.235976585000117,
        "tokens / sec": 4699.172928129208,
        "mem allocated avg": 6901011800.064,
        "mem reserved avg": 14819080011.776,
        "elapsed time": 352.3821910219999
      },
      {
        "step": 1000,
        "valid accuracy": 0.44,
        "train loss": 0.6479405733346939,
        "train samples": 4000,
        "train time": 44.807461878997856,
        "eval time": 9.97781685100017,
        "tokens / sec": 4649.582709295373,
        "mem allocated avg": 6892128219.136,
        "mem reserved avg": 14679913005.056,
        "elapsed time": 465.37876664800024
      },
      {
        "step": 1250,
        "valid accuracy": 0.34,
        "train loss": 0.643578136086464,
        "train samples": 5000,
        "train time": 45.07155244600017,
        "eval time": 8.857318488000146,
        "tokens / sec": 4626.8208810834185,
        "mem allocated avg": 6892222337.024,
        "mem reserved avg": 14675131498.496,
        "elapsed time": 577.4979845480002
      },
      {
        "step": 1500,
        "valid accuracy": 0.48,
        "train loss": 0.6369394363164902,
        "train samples": 6000,
        "train time": 45.09846532499796,
        "eval time": 16.40352508900014,
        "tokens / sec": 4641.643534685168,
        "mem allocated avg": 6893671811.072,
        "mem reserved avg": 14706127405.056,
        "elapsed time": 697.4003612199999
      },
      {
        "step": 1750,
        "valid accuracy": 0.46,
        "train loss": 0.6277884117364884,
        "train samples": 7000,
        "train time": 45.44054208400212,
        "eval time": 16.52979276899987,
        "tokens / sec": 4607.2293682804875,
        "mem allocated avg": 6895174580.224,
        "mem reserved avg": 14716906766.336,
        "elapsed time": 817.9448886139999
      },
      {
        "step": 2000,
        "valid accuracy": 0.38,
        "train loss": 0.6284448710680008,
        "train samples": 8000,
        "train time": 44.66441460200076,
        "eval time": 16.455440011000064,
        "tokens / sec": 4650.144905082808,
        "mem allocated avg": 6891904557.056,
        "mem reserved avg": 14653740548.096,
        "elapsed time": 937.2175861730002
      },
      {
        "step": 2250,
        "valid accuracy": 0.38,
        "train loss": 0.6159043073654175,
        "train samples": 9000,
        "train time": 46.132000129995504,
        "eval time": 10.414071448999948,
        "tokens / sec": 4659.412108607851,
        "mem allocated avg": 6903182145.536,
        "mem reserved avg": 14849203503.104,
        "elapsed time": 1052.6482256069999
      },
      {
        "step": 2500,
        "valid accuracy": 0.46,
        "train loss": 0.6123742452859878,
        "train samples": 10000,
        "train time": 44.48052799800462,
        "eval time": 9.077246988999832,
        "tokens / sec": 4630.498091417431,
        "mem allocated avg": 6888177211.392,
        "mem reserved avg": 14597721423.872,
        "elapsed time": 1164.3923993470003
      },
      {
        "step": 2750,
        "valid accuracy": 0.46,
        "train loss": 0.6012357432842255,
        "train samples": 11000,
        "train time": 45.48063802700244,
        "eval time": 16.48930690899988,
        "tokens / sec": 4658.7077312812435,
        "mem allocated avg": 6898979477.504,
        "mem reserved avg": 14782279188.48,
        "elapsed time": 1285.076055947
      },
      {
        "step": 3000,
        "valid accuracy": 0.46,
        "train loss": 0.590102802991867,
        "train samples": 12000,
        "train time": 44.94007473199872,
        "eval time": 16.46088983999971,
        "tokens / sec": 4644.651822338361,
        "mem allocated avg": 6894117320.704,
        "mem reserved avg": 14690650423.296,
        "elapsed time": 1404.875696617
      },
      {
        "step": 3250,
        "valid accuracy": 0.5,
        "train loss": 0.5990242173671723,
        "train samples": 13000,
        "train time": 45.31071554800019,
        "eval time": 16.52300792999995,
        "tokens / sec": 4654.55019743797,
        "mem allocated avg": 6895772430.336,
        "mem reserved avg": 14729800056.832,
        "elapsed time": 1525.008872587
      },
      {
        "step": 3500,
        "valid accuracy": 0.5,
        "train loss": 0.5803046126365662,
        "train samples": 14000,
        "train time": 44.94620334099545,
        "eval time": 14.057770697999786,
        "tokens / sec": 4666.690051853321,
        "mem allocated avg": 6893924593.664,
        "mem reserved avg": 14704818782.208,
        "elapsed time": 1642.582958221
      },
      {
        "step": 3750,
        "valid accuracy": 0.5,
        "train loss": 0.5769718471765518,
        "train samples": 15000,
        "train time": 46.15345219799838,
        "eval time": 16.54434743999991,
        "tokens / sec": 4695.271744144811,
        "mem allocated avg": 6905346478.08,
        "mem reserved avg": 14888957116.416,
        "elapsed time": 1764.270111406
      },
      {
        "step": 4000,
        "valid accuracy": 0.5,
        "train loss": 0.5857474536895751,
        "train samples": 16000,
        "train time": 44.38817892599582,
        "eval time": 16.396790597999825,
        "tokens / sec": 4604.22132524814,
        "mem allocated avg": 6886660577.28,
        "mem reserved avg": 14582965862.4,
        "elapsed time": 1883.048193569
      },
      {
        "step": 4250,
        "valid accuracy": 0.52,
        "train loss": 0.5724298695325851,
        "train samples": 17000,
        "train time": 46.189890748002654,
        "eval time": 16.550546742999813,
        "tokens / sec": 4576.520891839106,
        "mem allocated avg": 6897394636.8,
        "mem reserved avg": 14742080978.944,
        "elapsed time": 2004.408629256
      },
      {
        "step": 4500,
        "valid accuracy": 0.5,
        "train loss": 0.5789464256763458,
        "train samples": 18000,
        "train time": 45.51842483500286,
        "eval time": 16.440088976999505,
        "tokens / sec": 4565.579779030307,
        "mem allocated avg": 6892786214.912,
        "mem reserved avg": 14656919830.528,
        "elapsed time": 2124.6471290850004
      },
      {
        "step": 4750,
        "valid accuracy": 0.5,
        "train loss": 0.567945005774498,
        "train samples": 19000,
        "train time": 44.95984372499879,
        "eval time": 16.491939075999653,
        "tokens / sec": 4669.47797425881,
        "mem allocated avg": 6893964591.104,
        "mem reserved avg": 14709189246.976,
        "elapsed time": 2244.6083886899996
      },
      {
        "step": 5000,
        "valid accuracy": 0.48,
        "train loss": 0.5767219476699829,
        "train samples": 20000,
        "train time": 45.47602556899801,
        "eval time": 16.543005558000004,
        "tokens / sec": 4579.995665715981,
        "mem allocated avg": 6891249879.04,
        "mem reserved avg": 14656341016.576,
        "elapsed time": 2364.9427617600004
      },
      {
        "step": 5000,
        "test accuracy": 0.49507202426080366,
        "train loss": 0.5767219476699829,
        "train samples": 20000,
        "train total tokens": 4198051
      }
    ]
  },
  "meta_info": {
    "model_info": {
      "sha": "13afe5124825b4f3751f836b40dafda64c1ed062",
      "created_at": "2024-09-18T15:23:48+00:00"
    },
    "dataset_info": {
      "metamath": {
        "sha": "aa4f34d3d2d3231299b5b03d9b3e5a20da45aa18",
        "created_at": "2023-09-21T17:22:46+00:00"
      },
      "gsm8k": {
        "sha": "e53f048856ff4f594e959d75785d2c2d37b678ee",
        "created_at": "2022-04-12T10:22:10+00:00"
      }
    },
    "package_info": {
      "transformers-version": "4.52.4",
      "transformers-commit-hash": null,
      "peft-version": "0.17.1.dev0",
      "peft-commit-hash": "04d41cbcd061bf1ab1185e111054bae012cb1894",
      "datasets-version": "3.6.0",
      "datasets-commit-hash": null,
      "bitsandbytes-version": "0.46.0",
      "bitsandbytes-commit-hash": null,
      "torch-version": "2.7.1+cu126",
      "torch-commit-hash": null
    },
    "system_info": {
      "system": "Linux",
      "release": "6.14.0-1010-aws",
      "version": "#10~24.04.1-Ubuntu SMP Fri Jul 18 20:44:30 UTC 2025",
      "machine": "x86_64",
      "processor": "x86_64",
      "accelerator": "NVIDIA L40S"
    },
    "pytorch_info": "PyTorch built with:\n  - GCC 11.2\n  - C++ Version: 201703\n  - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications\n  - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)\n  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n  - LAPACK is enabled (usually provided by MKL)\n  - NNPACK is enabled\n  - CPU capability usage: AVX2\n  - CUDA Runtime 12.6\n  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n  - CuDNN 90.7.1  (built against CUDA 12.8)\n    - Built with CuDNN 90.5.1\n  - Magma 2.6.1\n  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=e2d141dbde55c2a4370fac5165b0561b6af4798b, CUDA_VERSION=12.6, CUDNN_VERSION=9.5.1, CXX_COMPILER=/opt/rh/gcc-toolset-11/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.7.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 
USE_ROCM_KERNEL_ASSERT=OFF, \n"
  }
}
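For context, the adapter size implied by the run above can be checked with quick back-of-the-envelope arithmetic (values copied verbatim from the `train_info` block; this is a sanity check, not part of the run output):

```python
# Recompute the trainable-parameter fraction from the counts reported
# in "train_info" above: num_trainable_params / num_total_params.
num_trainable_params = 7_340_032
num_total_params = 3_220_089_856

fraction = num_trainable_params / num_total_params
print(f"trainable fraction: {fraction:.4%}")  # roughly 0.23% of all parameters
```

So the Bone adapter with r=64 on q_proj/v_proj trains well under 1% of the 3B base model's parameters.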

@Joluck
Contributor

Joluck commented Aug 18, 2025

I re-ran the Bone example on the same machine, now getting an even lower accuracy (49.5%). I'll probably re-run it a few more times to see how high the variance is in general.


I will also check it.

@BenjaminBossan
Member Author

I ran a few Bone-bat experiments, variance of test accuracy is quite high:

  • 52.0%
  • 49.1%
  • 49.6%
  • 50.6%

I'll try a few other PEFT methods like LoRA next.

@Joluck
Contributor

Joluck commented Aug 20, 2025

I ran a few Bone-bat experiments, variance of test accuracy is quite high:

  • 52.0%
  • 49.1%
  • 49.6%
  • 50.6%

I'll try a few other PEFT methods like LoRA next.

Yes, we should indeed verify whether other PEFT methods also produce such large variances.

@BenjaminBossan
Member Author

I ran a few more experiments with LoRA rank 32, the rest being the same, and collected the test accuracies. Just from eyeballing, and from running a Levene/Brown–Forsythe test, it looks like {MiSS,Bone}-Bat has a higher variance than LoRA:

import scipy as sp

# note: test set size is 1319
acc_lora = [0.48218347232752085, 0.47536012130401817, 0.4829416224412434, 0.4836997725549659, 0.47763457164518575, 0.4783927217589083, 0.4829416224412434, 0.4715693707354056]
acc_bat = [0.5200909780136467, 0.49052312357846856, 0.49583017437452614, 0.5056861258529188, 0.5170583775587566, 0.5049279757391963]

stat, p = sp.stats.levene(acc_lora, acc_bat, center='mean')
print(f"Levene W={stat:.3f}, p={p:.4f}")
# prints: Levene W=4.087, p=0.0661

stat, p = sp.stats.levene(acc_lora, acc_bat, center='median')
print(f"Brown–Forsythe W={stat:.3f}, p={p:.4f}")
# prints: Brown–Forsythe W=3.960, p=0.0699

Of course, this is not definitive proof, but it strongly suggests that Bat has higher variance.
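A cruder way to see the same effect, using only the standard library, is to compare the sample standard deviations of the two groups directly (accuracies copied from the snippet above; this is just a dispersion comparison, not a significance test):

```python
from statistics import stdev

# Test accuracies copied from the Levene-test snippet above.
acc_lora = [0.48218347232752085, 0.47536012130401817, 0.4829416224412434,
            0.4836997725549659, 0.47763457164518575, 0.4783927217589083,
            0.4829416224412434, 0.4715693707354056]
acc_bat = [0.5200909780136467, 0.49052312357846856, 0.49583017437452614,
           0.5056861258529188, 0.5170583775587566, 0.5049279757391963]

# Sample standard deviation of test accuracy per method
print(f"LoRA std: {stdev(acc_lora):.4f}")
print(f"Bat  std: {stdev(acc_bat):.4f}")
```

On these numbers, the Bat runs spread roughly 2-3x wider than the LoRA runs, which matches the Levene/Brown–Forsythe result.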

@BenjaminBossan
Member Author

@JL-er Any ideas how to proceed? Is it possible that Bat is especially sensitive to small changes? The training loss is pretty consistent between the runs, so the differences there are not big, but the test accuracy still differs a lot (as generation can exacerbate small differences).

@Joluck
Contributor

Joluck commented Aug 28, 2025

@JL-er Any ideas how to proceed? Is it possible that Bat is especially sensitive to small changes? The training loss is pretty consistent between the runs, so the differences there are not big, but the test accuracy still differs a lot (as generation can exacerbate small differences).

I haven’t had time to run tests recently, but I suspect that the instability in evaluation may be caused by the interaction between bat and the original weights.

cyyever pushed a commit to cyyever/peft that referenced this pull request Sep 4, 2025
* Nit fix about max_lenth argument.

* copy to docstring

* typo

* consistency

---------

Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@BenjaminBossan
Member Author

@JL-er Did you have the time to investigate this further?

@Joluck
Contributor

Joluck commented Sep 23, 2025

@JL-er Did you have the time to investigate this further?

I haven’t had time to run tests recently, but I suspect that the instability in evaluation may be caused by the interaction between bat and the original weights.

Bat itself should be fine, but it is too sensitive to the learning rate.

@BenjaminBossan
Member Author

Okay, so we can just merge this PR, right? Maybe in a separate PR, it could be documented that Bat init can be unstable.

@Joluck
Contributor

Joluck commented Sep 23, 2025

Okay, so we can just merge this PR, right? Maybe in a separate PR, it could be documented that Bat init can be unstable.

Yes, we can merge the PR. We will propose new methods later.

@BenjaminBossan BenjaminBossan merged commit 530d7bb into huggingface:main Sep 25, 2025
25 of 40 checks passed
@BenjaminBossan BenjaminBossan deleted the method-comparison-results-for-miss branch September 25, 2025 15:58