
Conversation

@BenjaminBossan (Member) commented Aug 5, 2025

There are a few issues with target_parameters that are fixed in this PR.

Existing parametrizations

When using target_parameters with LoRA, after the forward call finishes, the LoRA parametrization is removed. However, this also used to remove all other parametrizations on the same parameter, which is bad. With this PR, only the LoRA parametrization is removed.
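
For context, here is a minimal, self-contained sketch of why the naive cleanup is destructive (this is not PEFT's actual implementation; Scale and AddDelta are made-up stand-ins): torch.nn.utils.parametrize.remove_parametrizations removes every parametrization registered on a tensor, so keeping the pre-existing ones requires re-registering them after the removal:

import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Scale(nn.Module):  # stand-in for a pre-existing parametrization
    def forward(self, w):
        return 2 * w

class AddDelta(nn.Module):  # stand-in for the temporary LoRA parametrization
    def forward(self, w):
        return w + 0.1

lin = nn.Linear(4, 4)
parametrize.register_parametrization(lin, "weight", Scale())     # already present
parametrize.register_parametrization(lin, "weight", AddDelta())  # added for the forward pass

# Naive cleanup would also drop Scale(), which is the bug described above:
# parametrize.remove_parametrizations(lin, "weight", leave_parametrized=False)

# Selective cleanup: remember every parametrization except the LoRA-like one,
# remove all of them, then re-register the survivors.
survivors = [p for p in lin.parametrizations["weight"] if not isinstance(p, AddDelta)]
parametrize.remove_parametrizations(lin, "weight", leave_parametrized=False)
for p in survivors:
    parametrize.register_parametrization(lin, "weight", p)

print([type(p).__name__ for p in lin.parametrizations["weight"]])  # ['Scale']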

Module repr

This PR also extends the __repr__ of lora.ParamWrapper to contain the parameter name, which makes it more useful.
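
As a rough sketch of the idea (ParamWrapperLike is a made-up stand-in, not the real lora.ParamWrapper): splice the name of the targeted parameter into the default nn.Module repr right after the class name, so that two wrappers around the same module but different parameters can be told apart:

import torch.nn as nn

class ParamWrapperLike(nn.Module):
    def __init__(self, base_layer: nn.Module, parameter_name: str):
        super().__init__()
        self.base_layer = base_layer
        self.parameter_name = parameter_name

    def __repr__(self) -> str:
        rep = super().__repr__()  # e.g. "ParamWrapperLike(\n  (base_layer): ...\n)"
        idx = rep.find("(") + 1
        # insert the parameter name right after the opening parenthesis
        return rep[:idx] + self.parameter_name + rep[idx:]

print(ParamWrapperLike(nn.Linear(8, 8), "gate_up_proj"))
# ParamWrapperLike(gate_up_proj
#   (base_layer): Linear(in_features=8, out_features=8, bias=True)
# )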

Extend testing

Added a tiny gpt-oss model to the target_parameters test suite.

Multiple LoRA adapters with target_parameters

There is an issue when adding a second LoRA adapter with target_parameters, where this second adapter would not actually be applied correctly. The corresponding unit test was too lax to notice the bug. This is not easy to fix, so for now we forbid adding a second adapter with target_parameters. This is very strict, but it's better than having silent errors.
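
As a short, self-contained sketch of the new restriction (assumed behavior after this PR; the exact exception type and message are not shown here, and the model and config simply mirror the reproduction script further down in this thread):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("trl-internal-testing/tiny-Llama4ForCausalLM")
config = LoraConfig(
    target_modules=[],
    target_parameters=["feed_forward.experts.gate_up_proj"],
)
model = get_peft_model(model, config)   # first adapter with target_parameters: allowed
try:
    model.add_adapter("other", config)  # second adapter with target_parameters: now rejected
except Exception as exc:                # deliberately broad; see the PR diff for the exact error
    print(f"second adapter was rejected: {exc}")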

Although it was possible to fix that specific issue, the solution resulted in ever more deeply nested adapters (i.e. with multiple .base_layer levels). This in turn results in those infixes being part of the state_dict. But then we cannot load the individual adapters correctly, unless the model is restored in the exact same order as it was previously created. This is not normally a requirement in PEFT (e.g. I can create a model with two adapters and later decide to load only one of them).

In the long run, we need to think about solutions that would allow this. It may require some form of normalization of the layers to prevent ever deeper nesting. Also, what is ugly right now is that, since the LoRA lives on a module but actually targets one of possibly multiple parameters, the LoRA weights don't reference said parameter by name. That means that, purely from the state_dict, it is unclear which parameter a LoRA weight belongs to. Ideally, this should be encoded in the LoRA weight key.
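
To make that last point concrete, here is a purely hypothetical key layout (not something this PR implements); the first key is copied verbatim from the state_dict listing further down:

# current format: the targeted parameter (down_proj vs. gate_up_proj) cannot be
# recovered from the key name
current_key = "base_model.model.model.layers.0.feed_forward.experts.lora_A.weight"

# hypothetical format that additionally encodes the targeted parameter name
proposed_key = "base_model.model.model.layers.0.feed_forward.experts.gate_up_proj.lora_A.weight"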

@BenjaminBossan (Member, Author) commented:

@matthewdouglas The first part of the fixes in this PR should hopefully address the issue we discussed internally with bnb. Check out the tests that show this: https://github.com/huggingface/peft/pull/2710/files#diff-c8dcf5ce96401fd88057132c84ee4a25da3550574250c9a1857b8aa1672ea540R406

@BenjaminBossan (Member, Author) commented:

To better illustrate the existing issues with multiple adapters, check out this script:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model
from safetensors.torch import load_file

model_id = "trl-internal-testing/tiny-Llama4ForCausalLM"
path = "/tmp/peft/target-params"

model = AutoModelForCausalLM.from_pretrained(model_id)
config = LoraConfig(
    target_modules=[],
    target_parameters=[
        "feed_forward.experts.down_proj",
        "feed_forward.experts.gate_up_proj",
    ],
)
model = get_peft_model(model, config)
model.add_adapter("other", config)
print(model)
model.save_pretrained(path)

sd = load_file(path + "/adapter_model.safetensors")
print("default sd")
for k in sd:
    print(k)

print("other sd")
sd_other = load_file(path + "/other/adapter_model.safetensors")
for k in sd_other:
    print(k)

del model

# this works
print("loading in same order")
model = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(model, path)
out = model.load_adapter(path + "/other", adapter_name="other")
print(out)

del model

print("loading in reverse order, should theoretically not matter!")
model = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(model, path + "/other", adapter_name="other")
out = model.load_adapter(path, adapter_name="default")
print(out)

Running this gives us:

PeftModel(
  (base_model): LoraModel(
    (model): Llama4ForCausalLM(
      (model): Llama4TextModel(
        (embed_tokens): Embedding(201135, 8)
        (layers): ModuleList(
          (0-1): 2 x Llama4TextDecoderLayer(
            (self_attn): Llama4TextAttention(
              (q_proj): Linear(in_features=8, out_features=512, bias=False)
              (k_proj): Linear(in_features=8, out_features=256, bias=False)
              (v_proj): Linear(in_features=8, out_features=256, bias=False)
              (o_proj): Linear(in_features=512, out_features=8, bias=False)
              (qk_norm): Llama4TextL2Norm(eps=1e-05)
            )
            (feed_forward): Llama4TextMoe(
              (experts): lora.ParamWrapper(
                (base_layer): lora.ParamWrapper(
                  (base_layer): Llama4TextExperts(
                    (act_fn): SiLU()
                  )
                  (lora_dropout): ModuleDict(
                    (default): Identity()
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=8, out_features=128, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (default): Linear(in_features=128, out_features=64, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (lora_dropout): ModuleDict(
                  (default): Identity()
                  (other): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=32, out_features=128, bias=False)
                  (other): Linear(in_features=32, out_features=128, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=128, out_features=8, bias=False)
                  (other): Linear(in_features=128, out_features=8, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (router): Llama4Router(in_features=8, out_features=16, bias=False)
              (shared_expert): Llama4TextMLP(
                (gate_proj): Linear(in_features=8, out_features=32, bias=False)
                (up_proj): Linear(in_features=8, out_features=32, bias=False)
                (down_proj): Linear(in_features=32, out_features=8, bias=False)
                (activation_fn): SiLU()
              )
            )
            (input_layernorm): Llama4TextRMSNorm((8,), eps=1e-05)
            (post_attention_layernorm): Llama4TextRMSNorm((8,), eps=1e-05)
          )
        )
        (norm): Llama4TextRMSNorm((8,), eps=1e-05)
        (rotary_emb): Llama4TextRotaryEmbedding()
      )
      (lm_head): Linear(in_features=8, out_features=201135, bias=False)
    )
  )
)

default sd
base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_A.weight
base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_B.weight
base_model.model.model.layers.0.feed_forward.experts.lora_A.weight
base_model.model.model.layers.0.feed_forward.experts.lora_B.weight
base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_A.weight
base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_B.weight
base_model.model.model.layers.1.feed_forward.experts.lora_A.weight
base_model.model.model.layers.1.feed_forward.experts.lora_B.weight

other sd
base_model.model.model.layers.0.feed_forward.experts.lora_A.weight
base_model.model.model.layers.0.feed_forward.experts.lora_B.weight
base_model.model.model.layers.1.feed_forward.experts.lora_A.weight
base_model.model.model.layers.1.feed_forward.experts.lora_B.weight

loading in same order
<All keys matched successfully>

loading in reverse order, should theoretically not matter!
UserWarning: Found missing adapter keys while loading the checkpoint: ['base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_A.other.weight', 'base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_B.other.weight', 'base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_A.other.weight', 'base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_B.other.weight'].
  warnings.warn(warn_message)
_IncompatibleKeys(missing_keys=[], unexpected_keys=['base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_A.default.weight', 'base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_B.default.weight', 'base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_A.default.weight', 'base_model.model.model.layers.1.feed_forward.experts.base_layer.lora_B.default.weight'])

We create a LoRA model with two adapters that use target_parameters. The issues we can see are:

  1. The inner lora.ParamWrapper does not contain any LoRA weights for other.
  2. When we load them back in the exact same order, everything works. When we load them in reverse order, which should not make a difference, we get errors.
  3. If we were to strip the .base_layer part of the state_dict keys, the key names would be duplicated, as the parameter name is not part of the key (see the short snippet below).
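
A quick, purely illustrative way to see point 3 (both keys are copied verbatim from the default state_dict above):

keys = [
    "base_model.model.model.layers.0.feed_forward.experts.base_layer.lora_A.weight",
    "base_model.model.model.layers.0.feed_forward.experts.lora_A.weight",
]
# stripping the .base_layer infix makes both keys collapse to the same name,
# because neither key mentions down_proj or gate_up_proj
print([k.replace(".base_layer.", ".") for k in keys])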

@BenjaminBossan (Member, Author) commented Aug 6, 2025

I'm investigating the failing CI. Update: done; a new test was not implemented correctly.

@githubnemo (Collaborator) left a comment

Two small comments, otherwise LGTM. Thanks for handling the weirdness target params entail.

rep = super().__repr__()
idx = rep.find("(") + 1
# insert the name of the parameter
# insert the name of the parameter to allow the repr to be disambiguous when multiple parameters on the same mdule

Suggested change
# insert the name of the parameter to allow the repr to be disambiguous when multiple parameters on the same mdule
# insert the name of the parameter to allow the repr to be disambiguous when multiple parameters on the same module

self.targeted_parameter_names.append(key)
else:
# Standard case: the parameter is not already parametrized. Note, however, that the model could already
# be nested with lora.ParamWrapper.

Worth adding how this comes to be (the module being already nested with ParamWrapper). I don't think that's obvious.

@BenjaminBossan merged commit a2c6612 into huggingface:main on Aug 12, 2025
11 of 14 checks passed
@BenjaminBossan deleted the fix-target-parameters-multiple-issues branch on August 12, 2025 at 11:59
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Aug 14, 2025
BenjaminBossan added a commit that referenced this pull request Aug 21, 2025
* FIX Multiple issues with target_parameters (#2710)
* Bump version to 0.17.1