SHiRA Adapters #2584

kkb-code · 2025-06-11T18:07:30Z

We would like to add code for Sparse High Rank Adapters (SHiRA) which was published at NeurIPS 2024 (PAPER LINK).

This is an alternate type of adapter and has been found to have significant advantages over the low rank adapters. Specifically, SHiRA achieves better accuracy than LoRA for a variety of vision and language tasks. It also offers simpler and higher quality multi-adapter fusion by significantly reducing concept loss, a common problem faced by low rank adapters. Please see the paper for more details.

kkb-code · 2025-06-11T18:29:21Z

Issue: #2585

BenjaminBossan

Thanks for this PR to integrate SHiRA into PEFT. It's good to see a paper that not only deals with academic improvements but also considers practical applications.

I went through the code and I think this generally already looks quite fine. Still, I found a couple of issues, please check.

Regarding testing: I saw that you added test_shira.py. This is fine for SHiRA-specific tests, but for the general PEFT integration, we need to update the existing PEFT tests to include SHiRA. As a first step, let's add one or a few examples using SHiRA to the custom model tests here:

peft/tests/test_custom_models.py

Line 68 in 759bb70

("Vanilla MLP 1 LoRA", "MLP", LoraConfig, {"target_modules": "lin0"}),

After that, please run pytest tests/test_custom_models.py -k "shira" to see if these tests pass.

Ideally, before we merge, we should also:

add docs
add an example
add a benchmark

Those steps are not strictly necessary, as they can be added later, but often it's a good idea to include them, as they can sometimes reveal new areas for improvement. But it's fine to focus on the proper implementation for now.

Finally, please always run style before pushing your changes to ensure that the linter is happy.

src/peft/tuners/shira/__init__.py

src/peft/tuners/shira/config.py

src/peft/tuners/shira/layer.py

src/peft/tuners/shira/mask_functions.py

src/peft/tuners/shira/layer.py

tests/test_shira.py

X1AOX1A · 2025-06-13T15:17:29Z

It seems like only the random mask is implemented, what about the SHiRA-WM, SHiRA-Grad and SHiRA-SNIP?

… BaseTunerLayer

kkb-code · 2025-06-18T19:12:22Z

Thanks for this PR to integrate SHiRA into PEFT. It's good to see a paper that not only deals with academic improvements but also considers practical applications.

I went through the code and I think this generally already looks quite fine. Still, I found a couple of issues, please check.

Regarding testing: I saw that you added test_shira.py. This is fine for SHiRA-specific tests, but for the general PEFT integration, we need to update the existing PEFT tests to include SHiRA. As a first step, let's add one or a few examples using SHiRA to the custom model tests here:

peft/tests/test_custom_models.py

Line 68 in 759bb70

("Vanilla MLP 1 LoRA", "MLP", LoraConfig, {"target_modules": "lin0"}),

After that, please run pytest tests/test_custom_models.py -k "shira" to see if these tests pass.

Ideally, before we merge, we should also:

add docs

add an example

add a benchmark

Those steps are not strictly necessary, as they can be added later, but often it's a good idea to include them, as they can sometimes reveal new areas for improvement. But it's fine to focus on the proper implementation for now.

Finally, please always run style before pushing your changes to ensure that the linter is happy.

Hi @BenjaminBossan, we have now included SHiRA in the test_custom_models.py and have conducted and passed all the pytests. Also, we have responded to and/or incorporated your requested changes in our new commits. Hopefully, this addresses your concerns.

We have fixed the Lint issues except for UP045 which existed in many other modules throughout the PEFT repo. Please let us know if you would like to fix those for our modules.

BenjaminBossan

Thanks a lot for the updates, this is shaping up quite well.

I still have a couple of comments, but those should not be too hard to address.

Moreover, let's work on the following to finalize the PR:

Please check your ruff version. It doesn't look like make style worked as expected. The CI uses ruff-0.9.10.
Let's add an entry to the docs, check how the other PEFT methods do it.
Let's add an example to the examples/ folder, what exactly you want is up to you. You can take an existing one and modify it for SHiRA.
Let's extend the test coverage by adding entries to test_decoder_models.py, test_encoder_decoder_models.py, test_feature_extraction_models.py and test_seq_classifier.py. This should be straightforward, as you only need to add a config to the list of test cases.

src/peft/tuners/shira/layer.py

tests/test_shira.py

BenjaminBossan · 2025-06-19T10:50:04Z

Ah, forgot to reply to this:

We believe that the future users may want to experiment with their own custom mask functions without requiring PEFT library changes. This is why we used a function reference in this field to allow for maximum flexibility (and we suggest that keeping this functionality would be beneficial to the community).

IMO, it makes little sense to have a config parameter (mask_fn) that isn't persisted. It means that the user needs to run custom code after loading the model to apply the correct mask function. If they do that anyway, we might as well remove the argument. I think something like this would work:

@dataclass
class ShiraConfig(PeftConfig):
    ...
    mask_type: Litera["random", <more-options>] = field(default='random', metadata={"help": "..."})

    def __post_init__(self):
        if self.mask_type == "random":
            self.mask_fn = random_mask
        elif ...
        else:
            warnings.warn(f"Argument {mask_type=} not recognized, please supply your own masking function by calling `config.mask_fn = my_mask_fn`.")
            self.mask_fn = None

Then, if users want to provide a custom mask function, they can still do:

shira_config = ShiraConfig(...)
shira_config.mask_function = my_custom_mask_function

The same would be needed after loading the model, but that's necessary anyway, even with the current implementation. It is important to document this well though.

Moreover, I think it would be really great if all the mask functions from the paper could be added.

…arate call to provide mask_fn

…use it throws an error (addmm_sparse_cuda not implemented for BFloat16). Minor clean up.

… typo fix in shira_finetuning.py

kkb-code · 2025-07-08T17:14:56Z

I investigated a bit more how we can avoid the spurious warning about mask_type and I believe my solution should help with that. Please check my suggestion.

Added.

Please also merge with/rebase on the latest main, as there was an issue with the latest transformers release. Then, CI should pass and we should be good to merge.

We have now merged with the latest main. We also verified make style and everything passed. This should be good to go from our side. Thanks a lot.

BenjaminBossan · 2025-07-09T10:08:15Z

Hmm, there is a strange error occurring on the Windows CI. It seems that each SHiRA test that involves persistence is affected. At first, I thought it might just be flakiness but I re-ran the CI today and it happened again. The error occurs when we have code like this in the tests:

with tempfile.TemporaryDirectory() as tmp_dirname:
    # do stuff

When the context manager is left, the temp directory is cleaned up, but here the cleanup fails with:

    def _rmtree_unsafe(path, onerror):
        try:
>           with os.scandir(path) as scandir_it:
E           NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmptrnulxqo\\adapter_model.safetensors'

We have had some occasional issues in the past with Windows not being able to clean up temp directories, to wit:

peft/tests/testing_common.py

Lines 397 to 415 in 35000fd

    
           # note: not using the context manager here because it fails on Windows CI for some reason 
        
           tmp_dirname = tempfile.mkdtemp() 
        
           try: 
        
               model.save_pretrained(tmp_dirname) 
        
               model = self.transformers_class.from_pretrained(model_id).to(self.torch_device) 
        
               model = PeftModel.from_pretrained( 
        
                   model, tmp_dirname, torch_device=self.torch_device, low_cpu_mem_usage=True 
        
               ) 
        
               assert {p.device.type for p in model.parameters()} == {self.torch_device} 
        
               model.load_adapter(tmp_dirname, adapter_name="other", low_cpu_mem_usage=True) 
        
               assert {p.device.type for p in model.parameters()} == {self.torch_device} 
        
           finally: 
        
               try: 
        
                   shutil.rmtree(tmp_dirname) 
        
               except PermissionError: 
        
                   # windows error 
        
                   pass

However, we never had the case that so many tests were failing and that they were all tied to a single PEFT method. This makes me think that there is something going on with SHiRA that aggravates the problem, perhaps related to the shira_indices? Unfortunately, I don't have a Windows machine to test, do you happen to have one?

kkb-code · 2025-07-09T14:05:49Z

Hmm, there is a strange error occurring on the Windows CI. It seems that each SHiRA test that involves persistence is affected. At first, I thought it might just be flakiness but I re-ran the CI today and it happened again. The error occurs when we have code like this in the tests:
with tempfile.TemporaryDirectory() as tmp_dirname:
    # do stuff
When the context manager is left, the temp directory is cleaned up, but here the cleanup fails with:
    def _rmtree_unsafe(path, onerror):
        try:
>           with os.scandir(path) as scandir_it:
E           NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmptrnulxqo\\adapter_model.safetensors'
We have had some occasional issues in the past with Windows not being able to clean up temp directories, to wit:

peft/tests/testing_common.py

Lines 397 to 415 in 35000fd

# note: not using the context manager here because it fails on Windows CI for some reason

tmp_dirname = tempfile.mkdtemp()

try:

model.save_pretrained(tmp_dirname)

model = self.transformers_class.from_pretrained(model_id).to(self.torch_device)

model = PeftModel.from_pretrained(

model, tmp_dirname, torch_device=self.torch_device, low_cpu_mem_usage=True

)

assert {p.device.type for p in model.parameters()} == {self.torch_device}

model.load_adapter(tmp_dirname, adapter_name="other", low_cpu_mem_usage=True)

assert {p.device.type for p in model.parameters()} == {self.torch_device}

finally:

try:

shutil.rmtree(tmp_dirname)

except PermissionError:

# windows error

pass

However, we never had the case that so many tests were failing and that they were all tied to a single PEFT method. This makes me think that there is something going on with SHiRA that aggravates the problem, perhaps related to the shira_indices? Unfortunately, I don't have a Windows machine to test, do you happen to have one?

This is strange. I do not have a windows machine that would be able to test things out. We usually work with Linux based systems.

Also, about shira_indices, the only place we include new code to save/load them always works at the level of state-dicts (e.g., to_return here and reading from peft_model_state_dict here). So, I am not sure how/why the directory names could get affected or become invalid for SHiRA since we did not change any paths/file names, etc.. It is strange though that only SHiRA tests are affected for these tests, however.

I am also seeing some Access Denied Errors in Windows? Maybe that is a culprit? See here. Also, adapter_model.safetensors is a file, not a directory?

BenjaminBossan · 2025-07-10T10:21:07Z

This is indeed a very strange situation and not something I've come across yet. Today, I re-ran the CI and still the same errors occur. Too bad that we don't have Windows machines to test on, so we need to resort to the GH CI instead.

Perhaps we can try to eliminate potential sources of error one by one. The first thing I'd test is to comment out all the additions to save_and_load.py. Of course, this will lead some tests to fail, but I'm curious if that's enough to fix the Windows permission error. Could you please try that? In addition, to avoid overexerting the CI, let's only test on a single Python version:

peft/.github/workflows/tests.yml

Line 45 in 35000fd

python-version: ["3.9", "3.10", "3.11", "3.12"]

-       python-version: ["3.9", "3.10", "3.11", "3.12"]
+       python-version: ["3.12"]

Of course, we will revert these changes before merging.

kkb-code · 2025-07-10T18:05:11Z

I have pushed a new commit. ~~I have also limited the tests just to windows to make sure it runs fast.~~ All shira related stuff from save_and_load.py was removed. UPDATE: I added back all OS (linux, mac, windows) since I thought that maybe removing the other two OS stopped automatic tests from running (not sure if there are automatic tests on the GH CI or if you run them manually....but just wanted to try this).

kkb-code · 2025-07-10T20:15:56Z

Interestingly, when I remove all SHiRA related lines from save_and_load.py, windows does not throw the PermissionError. Not sure what is going on...but I added another commit reverting back the save_and_load.py to see if it still fails. The windows and ubuntu tests on commit 34af1fd failed only 1 test which is expected (the custom mask save and load test would fail if we do not save and load shira-indices for the custom mask).

UPDATE: Commit ee9d1d5 fails with the same PermissionError as before. Not sure how adding something to a state-dict can cause that. We aren't really sure how to fix this. Nonetheless, I have added another commit f4ff99d to try and debug this. Here, I have kept the SHiRA weight saving elif block but removed everything else to that saves or loads shira indices.

Any help would be appreciated. Thanks.

… weights.

BenjaminBossan · 2025-07-11T08:50:42Z

Any help would be appreciated. Thanks.

I'm as stumped as you are, this really doesn't make a lot of sense. I wondered if you could test converting the ints to floats before saving and then back to ints when loading. This is not a final solution, just for testing if it could somehow be related to the dtype.

…ng and torch.int when loading

kkb-code · 2025-07-11T13:29:18Z

Just pushed a commit for SHiRA indices dtype changed to torch.float32 while saving and then torch.int while loading. Let us see if this helps.

Can we think of any other solutions that might help us merge this? For example, can we skip certain tests on windows OS only? This clearly seems like a Permission issue that happens only on the windows. If there was anything wrong with our code, we would see these issues happening on other OS too?

BenjaminBossan · 2025-07-11T14:17:53Z

Just pushed a commit for SHiRA indices dtype changed to torch.float32 while saving and then torch.int while loading. Let us see if this helps.

This seems to help indeed, there is only one Windows error related to 429 from the Hub, so it's good.

can we skip certain tests on windows OS only? This clearly seems like a Permission issue that happens only on the windows. If there was anything wrong with our code, we would see these issues happening on other OS too?

While generally I agree, I think there is a possibility that something more fundamental is broken here, so before we don't fully understand what is going on, merging is risky. I can't imagine that saving integers in the state_dict on Windows is generally broken, but maybe something more subtle is going on. Unfortunately, I currently don't have time to investigate and don't have a Windows machine, but perhaps you have an idea based on the latest finding.

kkb-code · 2025-07-11T14:51:51Z

This seems to help indeed, there is only one Windows error related to 429 from the Hub, so it's good.

While generally I agree, I think there is a possibility that something more fundamental is broken here, so before we don't fully understand what is going on, merging is risky. I can't imagine that saving integers in the state_dict on Windows is generally broken, but maybe something more subtle is going on. Unfortunately, I currently don't have time to investigate and don't have a Windows machine, but perhaps you have an idea based on the latest finding.

Okay, this is good, at least. Actually, it seems that there is something going on with Integers and Safetensors that is specific only to the Windows OS. Someone reported another issue on Safetensors github: please see here. Potentially looks like some kind of an overflow problem in Windows.

The reported array length is exactly 2 * (test_tensor.numel() - 2**32), leading me to suspect this is an integer overflow issue. The issue only seems to manifest on Windows, with the code above performing as expected on Linux.

Seems like the above issue was closed as it was not planned, so I don't think any fix was provided.

I am okay with casting indices to float32 while saving and then casting them back to int when loading them as a solution to this (because the file size should also stay the same). Seems like it does not affect any Linux tests.

BenjaminBossan · 2025-07-11T15:00:11Z

I am okay with casting indices to float32 while saving and then casting them back to int when loading them as a solution to this. Seems like it does not affect any Linux tests.

Okay, so how about only doing that when the platform is Windows? Then we can be sure that Linux and MacOS are unaffected. For reasonably small values, this should work fine, but at a certain size, the conversion should fail. I wonder if this point is known in advance.

kkb-code · 2025-07-11T15:12:52Z

I agree, precision issues can occur for very large integer values. So let me just limit this to windows systems. I will use something like platform.system() == 'Windows'. Similar platform.system() statements have been used in other places inside peft so this should work. Will push shortly.

… when saving. Convert to int when loading.

kkb-code · 2025-07-11T15:57:09Z

I just pushed cc47b6a. It brings back all tests (all python versions) and casts the shira_indices to float32 before saving only on windows. Hence, Mac-OS and Linux must be completely unaffected now. Thanks. Let us see if this works.

BTW, I am not sure what to do about Hub errors. I thought those were not happening before.

BenjaminBossan · 2025-07-11T16:07:45Z

Thanks @kkb-code let's go with this approach. Could you please ensure the 120 char line limit is taken care of?

BTW, I am not sure what to do about Hub errors. I thought those were not happening before.

Just ignore those, they are just rate limits.

kkb-code · 2025-07-11T16:22:49Z

Yep, thanks @BenjaminBossan. Just pushed another commit 2014c43. This should take care of the 120 char limit. I also fixed a couple minor typos in the warning texts.

BenjaminBossan

Thanks for taking care of the Windows issue and this contribution in general. The failing MacOS test is just flakiness and can be ignored. The PR is good to be merged.

kkb-code · 2025-07-14T18:05:32Z

Thank you so much @BenjaminBossan for the guidance throughout this process. We are happy to see that our SHiRA adapter was merged into PEFT. Please let us know if we should close Issue #2585 or if you would close it at a later time.

Also, I am wondering if there is any ETA on the next PEFT version release that would include the SHiRA update (e.g., v0.16.1)? Can you please let us know? That would allow users to directly use pip install ... and be able to use SHiRA.

Thanks again for the help. It was great working with you on this.

BenjaminBossan · 2025-07-15T09:00:30Z

Also, I am wondering if there is any ETA on the next PEFT version release that would include the SHiRA update (e.g., v0.16.1)? Can you please let us know? That would allow users to directly use pip install ... and be able to use SHiRA.

Even though we had a release just recently, we might have another one soon, but we have external dependencies for that, so I can't give a date. FYI, for new features such as SHiRA, we always do minor releases (i.e. it would be 0.17.0), patch releases are for hot fixes only. In the meantime, users can install directly from GH, e.g. pip install git+https://github.com/huggingface/peft (+ optional hash to fix the commit).

Thanks again for the help. It was great working with you on this.

Thanks, same from my side.

Implements: Sparse High Rank Adapters Paper: https://arxiv.org/abs/2406.13175

hteague-qti and others added 3 commits June 10, 2025 17:58

shira_implementation

66d5b14

Remove an unnecessary file

67c6c7f

Fix the help text in SHiRA config

ca7d7d7

kkb-code mentioned this pull request Jun 11, 2025

Sparse High Rank Adapters (SHiRA) Support #2585

Closed

kkb-code marked this pull request as draft June 12, 2025 22:50

kkb-code marked this pull request as ready for review June 12, 2025 22:52

Add unit tests and minor bug fix

7cbc3ec

BenjaminBossan requested changes Jun 13, 2025

View reviewed changes

kkb-code added 8 commits June 16, 2025 22:49

Add SHiRA support into test_custom_models.py

90d4f54

Add or correct copyright notices to all new SHiRA files

9345d87

Remove __getattr__ function from SHiRA __init__.py file

86276fa

Remove merged property from ShiraLayer class because it is already in…

327f1ad

… BaseTunerLayer

Clean up some unused variables in ShiraLayer

a448963

Remove kwargs line in SHiRA mask_functions.py

d849bca

Default random seed set to None for SHiRA random mask

ed7b2e0

Lint fixes

97a8a16

BenjaminBossan requested changes Jun 19, 2025

View reviewed changes

kkb-code added 9 commits June 26, 2025 00:17

Fix ShiraConfig and mask_fn implementation to have Literals and a sep…

d93f8b3

…arate call to provide mask_fn

Inherit ShiraLayer from nn.Module and a minor typo fix in config

cc13e33

Clean up dead code (shira_embedding and to_dict and other comments)

c363452

Minor bugfix in ShiraLayer

0ab4155

Fix docstring in SHiRA mask_functions.py

6a46c1d

Add remaining SHiRA tests

b272653

Attempt the new forward implementation for SHiRA. Left commented beca…

64a1d54

…use it throws an error (addmm_sparse_cuda not implemented for BFloat16). Minor clean up.

Lint fixes for SHiRA

9ad780f

Add SHiRA example and fix some docstrings in config.

0a85520

kkb-code added 2 commits July 8, 2025 16:50

Fix the warning issue in shira config for from_pretrained and a minor…

a3b2351

… typo fix in shira_finetuning.py

Merge branch 'main' into shira_adapters

41df090

Windows test -- no shira in save_and_load. Revert later.

56ef038

kkb-code added 2 commits July 10, 2025 19:17

Bring back all os for tests

34af1fd

Bring back shira part inside save_and_load.py

ee9d1d5

Remove just the shira indices saving and loading part. Save the shira…

f4ff99d

… weights.

More debugging. Change SHiRA indices dtype to torch.float32 when savi…

c94333a

…ng and torch.int when loading

Bring back all tests and cast shira_indices to float only for windows…

cc47b6a

… when saving. Convert to int when loading.

Fix 120 char limit in shira save_and_load.py

c93b36f

Minor typos fixed.

2014c43

BenjaminBossan approved these changes Jul 14, 2025

View reviewed changes

BenjaminBossan merged commit a4f9334 into huggingface:main Jul 14, 2025
32 of 40 checks passed

BenjaminBossan pushed a commit to BenjaminBossan/peft that referenced this pull request Jul 28, 2025

FEAT Add SHiRA Adapters (huggingface#2584)

32a1fb2

Implements: Sparse High Rank Adapters Paper: https://arxiv.org/abs/2406.13175

SHiRA Adapters #2584

SHiRA Adapters #2584

Uh oh!

Conversation

kkb-code commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kkb-code commented Jun 11, 2025

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

X1AOX1A commented Jun 13, 2025

Uh oh!

kkb-code commented Jun 18, 2025

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

BenjaminBossan commented Jun 19, 2025

Uh oh!

kkb-code commented Jul 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan commented Jul 9, 2025

Uh oh!

kkb-code commented Jul 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan commented Jul 10, 2025

Uh oh!

kkb-code commented Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kkb-code commented Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan commented Jul 11, 2025

Uh oh!

kkb-code commented Jul 11, 2025

Uh oh!

BenjaminBossan commented Jul 11, 2025

Uh oh!

kkb-code commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan commented Jul 11, 2025

Uh oh!

kkb-code commented Jul 11, 2025

Uh oh!

kkb-code commented Jul 11, 2025

Uh oh!

BenjaminBossan commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kkb-code commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kkb-code commented Jun 11, 2025 •

edited

Loading

kkb-code commented Jul 8, 2025 •

edited

Loading

kkb-code commented Jul 9, 2025 •

edited

Loading

kkb-code commented Jul 10, 2025 •

edited

Loading

kkb-code commented Jul 10, 2025 •

edited

Loading

kkb-code commented Jul 11, 2025 •

edited

Loading

BenjaminBossan commented Jul 11, 2025 •

edited

Loading

kkb-code commented Jul 11, 2025 •

edited

Loading