
Conversation

@BenjaminBossan (Member) commented Aug 5, 2025

There are a few issues with target_parameters that are fixed in this PR.

Existing parametrizations

When using target_parameters with LoRA, after the forward call finishes, the LoRA parametrization is removed. However, this also used to remove all other parametrizations on the same parameter, which is bad. With this PR, only the LoRA parametrization is removed.
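
For context, here is a minimal, self-contained sketch of why the naive cleanup is destructive (this is not PEFT's actual implementation; Scale and AddDelta are made-up stand-ins): torch.nn.utils.parametrize.remove_parametrizations removes every parametrization registered on a tensor, so keeping the pre-existing ones requires re-registering them after the removal:

import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Scale(nn.Module):  # stand-in for a pre-existing parametrization
    def forward(self, w):
        return 2 * w

class AddDelta(nn.Module):  # stand-in for the temporary LoRA parametrization
    def forward(self, w):
        return w + 0.1

lin = nn.Linear(4, 4)
parametrize.register_parametrization(lin, "weight", Scale())     # already present
parametrize.register_parametrization(lin, "weight", AddDelta())  # added for the forward pass

# Naive cleanup would also drop Scale(), which is the bug described above:
# parametrize.remove_parametrizations(lin, "weight", leave_parametrized=False)

# Selective cleanup: remember every parametrization except the LoRA-like one,
# remove all of them, then re-register the survivors.
survivors = [p for p in lin.parametrizations["weight"] if not isinstance(p, AddDelta)]
parametrize.remove_parametrizations(lin, "weight", leave_parametrized=False)
for p in survivors:
    parametrize.register_parametrization(lin, "weight", p)

print([type(p).__name__ for p in lin.parametrizations["weight"]])  # ['Scale']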

Module repr

This PR also extends the __repr__ of lora.ParamWrapper to contain the parameter name, which makes it more useful.
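
As a rough sketch of the idea (ParamWrapperLike is a made-up stand-in, not the real lora.ParamWrapper): splice the name of the targeted parameter into the default nn.Module repr right after the class name, so that two wrappers around the same module but different parameters can be told apart:

import torch.nn as nn

class ParamWrapperLike(nn.Module):
    def __init__(self, base_layer: nn.Module, parameter_name: str):
        super().__init__()
        self.base_layer = base_layer
        self.parameter_name = parameter_name

    def __repr__(self) -> str:
        rep = super().__repr__()  # e.g. "ParamWrapperLike(\n  (base_layer): ...\n)"
        idx = rep.find("(") + 1
        # insert the parameter name right after the opening parenthesis
        return rep[:idx] + self.parameter_name + rep[idx:]

print(ParamWrapperLike(nn.Linear(8, 8), "gate_up_proj"))
# ParamWrapperLike(gate_up_proj
#   (base_layer): Linear(in_features=8, out_features=8, bias=True)
# )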

Extend testing

Added a tiny gpt-oss model to the target_parameters test suite.

Multiple LoRA adapters with target_parameters

There is an issue when adding a second LoRA adapter with target_parameters, where this second adapter would not actually be applied correctly. The corresponding unit test was too lax to notice the bug. This is not easy to fix, so for now we forbid adding a second adapter with target_parameters. This is very strict, but it's better than having silent errors.
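
As a short, self-contained sketch of the new restriction (assumed behavior after this PR; the exact exception type and message are not shown here, and the model and config simply mirror the reproduction script further down in this thread):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("trl-internal-testing/tiny-Llama4ForCausalLM")
config = LoraConfig(
    target_modules=[],
    target_parameters=["feed_forward.experts.gate_up_proj"],
)
model = get_peft_model(model, config)   # first adapter with target_parameters: allowed
try:
    model.add_adapter("other", config)  # second adapter with target_parameters: now rejected
except Exception as exc:                # deliberately broad; see the PR diff for the exact error
    print(f"second adapter was rejected: {exc}")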

Although it was possible to fix that specific issue, the solution resulted in ever more deeply nested adapters (i.e. with multiple .base_layer levels). This in turn results in those infixes being part of the state_dict. But then we cannot load the individual adapters correctly, unless the model is restored in the exact same order as it was previously created. This is not normally a requirement in PEFT (e.g. I can create a model with two adapters and later decide to load only one of them).

In the long run, we need to think about solutions that would allow this. It may require some form of normalization of the layers to prevent ever deeper nesting. Also, what is ugly right now is that, since the LoRA lives on a module but actually targets one of possibly multiple parameters, the LoRA weights don't reference said parameter by name. That means that, purely from the state_dict, it is unclear which parameter a LoRA weight belongs to. Ideally, this should be encoded in the LoRA weight key.
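
To make that last point concrete, here is a purely hypothetical key layout (not something this PR implements); the first key is copied verbatim from the state_dict listing further down:

# current format: the targeted parameter (down_proj vs. gate_up_proj) cannot be
# recovered from the key name
current_key = "base_model.model.model.layers.0.feed_forward.experts.lora_A.weight"

# hypothetical format that additionally encodes the targeted parameter name
proposed_key = "base_model.model.model.layers.0.feed_forward.experts.gate_up_proj.lora_A.weight"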

@BenjaminBossan (Member, Author) commented:

@matthewdouglas The first part of the fixes in this PR should hopefully address the issue we discussed internally with bnb. Check out the tests that show this: https://github.com/huggingface/peft/pull/2710/files#diff-c8dcf5ce96401fd88057132c84ee4a25da3550574250c9a1857b8aa1672ea540R406

@BenjaminBossan (Member, Author) commented:

To better illustrate the existing issues with multiple adapters, check out this script:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model
from safetensors.torch import load_file

model_id = "trl-internal-testing/tiny-Llama4ForCausalLM"
path = "/tmp/peft/target-params"

model = AutoModelForCausalLM.from_pretrained(model_id)
config = LoraConfig(
    target_modules=[],
    target_parameters=[
        "feed_forward.experts.down_proj",
        "feed_forward.experts.gate_up_proj",
    ],
)
model = get_peft_model(model, config)
model.add_adapter("other", config)
print(model)
model.save_pretrained(path)

sd = load_file(path + "/adapter_model.safetensors")
print("default sd")
for k in sd:
    print(k)

print("other sd")
sd_other = load_file(path + "/other/adapter_model.safetensors")
for k in sd_other:
    print(k)

del model

# this works
print("loading in same order")
model = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(model, path)
out = model.load_adapter(path + "/other", adapter_name="other")
print(out)

del model

print("loading in reverse order, should theoretically not matter!")
model = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(model, path + "/other", adapter_name="other")
out = model.load_adapter(path, adapter_name="default")
print(out)

Running this gives us:

PeftModel(
  (base_model): LoraModel(
    (model): Llama4ForCausalLM(
      (model): Llama4TextModel(
        (embed_tokens): Embedding(201135, 8)
        (layers): ModuleList(
          (0-1): 2 x Llama4TextDecoderLayer(
            (self_attn): Llama4TextAttention(
              (q_proj): Linear(in_features=8, out_features=512, bias=False)
              (k_proj): Linear(in_features=8, out_features=256, bias=False)
              (v_proj): Linear(in_features=8, out_features=256, bias=False)
              (o_proj): Linear(in_features=512, out_features=8, bias=False)
              (qk_norm): Llama4TextL2Norm(eps=1e-05)
            )
            (feed_forward): Llama4TextMoe(
              (experts): lora.ParamWrapper(
                (base_layer): lora.ParamWrapper(
                  (base_layer): Llama4TextExperts(
                    (act_fn): SiLU()
                  )
                  (lora_dropout): ModuleDict(
                    (default): Identity()
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=8, out_features=128, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (default): Linear(in_features=128, out_features=64, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (lora_dropout): ModuleDict(
                  (default): Identity()
                  (other): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=32, out_features=128, bias=False)
                  (other): Linear(in_features=32, out_features=128, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=128, out_features=8, bias=False)
                  (other): Linear(in_features=128, out_features=8, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (router): Llama4Router(in_features=8, out_features=16, bias=False)
              (shared_expert): Llama4TextMLP(
                (gate_proj): Linear(in_features=8, out_features=32, bias=False)
                (up_proj): Linear(in_features=8, out_features=32, bias=False)
                (down_proj): Linear(in_features=32, out_features=8, bias=False)
                (activation_fn): SiLU()
              )
            )
            (input_layernorm): Llama4TextRMSNorm((8,), eps=1e-05)
            (post_attention_layernorm): Llama4TextRMSNorm((8,), eps=1e-05)
          )
        )
        (norm): Llama4TextRMSNorm((8,), eps=1e-05)
        (rotary_emb): Llama4TextRotaryEmbedding()
      )
      (lm_head): Linear(in_features=8, out_features=201135, bias=False)
    )
  )
)

default sd
base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_A.weight
base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_B.weight
base_model.model.model.layers.0.feed_forward.experts.lora_A.weight
base_model.model.model.layers.0.feed_forward.experts.lora_B.weight
base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_A.weight
base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_B.weight
base_model.model.model.layers.1.feed_forward.experts.lora_A.weight
base_model.model.model.layers.1.feed_forward.experts.lora_B.weight

other sd
base_model.model.model.layers.0.feed_forward.experts.lora_A.weight
base_model.model.model.layers.0.feed_forward.experts.lora_B.weight
base_model.model.model.layers.1.feed_forward.experts.lora_A.weight
base_model.model.model.layers.1.feed_forward.experts.lora_B.weight

loading in same order
<All keys matched successfully>

loading in reverse order, should theoretically not matter!
UserWarning: Found missing adapter keys while loading the checkpoint: ['base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_A.other.weight', 'base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_B.other.weight', 'base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_A.other.weight', 'base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_B.other.weight'].
  warnings.warn(warn_message)
_IncompatibleKeys(missing_keys=[], unexpected_keys=['base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_A.default.weight', 'base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_B.default.weight', 'base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_A.default.weight', 'base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_B.default.weight'])

We create a LoRA model with two adapters that use target_parameters. The issues we can see are:

  1. The inner lora.ParamWrapper does not contain any LoRA weights for other.
  2. When we load them back in the exact same order, everything works. When we load them in reverse order, which should not make a difference, we get errors.
  3. If we were to strip the .base_layer part of the state_dict keys, the key names would be duplicated, as the parameter name is not part of the key (see the short snippet below).
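
A quick, purely illustrative way to see point 3 (both keys are copied verbatim from the default state_dict above):

keys = [
    "base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_A.weight",
    "base_model.model.model.layers.0.feed_forward.experts.lora_A.weight",
]
# stripping the .base_layer infix makes both keys collapse to the same name,
# because neither key mentions down_proj or gate_up_proj
print([k.replace(".base_layer.", ".") for k in keys])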

@BenjaminBossan (Member, Author) commented Aug 6, 2025

I'm investigating the failing CI. Update: done; a new test was not implemented correctly.

@githubnemo (Collaborator) left a comment

Two small comments, otherwise LGTM. Thanks for handling the weirdness target params entail.

rep = super().__repr__()
idx = rep.find("(") + 1
# insert the name of the parameter
# insert the name of the parameter to allow the repr to be disambiguous when multiple parameters on the same mdule

Suggested change
# insert the name of the parameter to allow the repr to be disambiguous when multiple parameters on the same mdule
# insert the name of the parameter to allow the repr to be disambiguous when multiple parameters on the same module

self.targeted_parameter_names.append(key)
else:
# Standard case: the parameter is not already parametrized. Note, however, that the model could already
# be nested with lora.ParamWrapper.

Worth adding how this comes to be (the module being already nested with ParamWrapper). I don't think that's obvious.

@BenjaminBossan merged commit a2c6612 into huggingface:main on Aug 12, 2025
11 of 14 checks passed
@BenjaminBossan deleted the fix-target-parameters-multiple-issues branch on August 12, 2025 at 11:59
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Aug 14, 2025
BenjaminBossan added a commit that referenced this pull request Aug 21, 2025
* FIX Multiple issues with target_parameters (#2710)
* Bump version to 0.17.1