Implement ensure_weight_tying for trainable_token_indices (#2864) #2870
Conversation
BenjaminBossan left a comment
Thanks a lot for handling the update of weight tying for trainable tokens. What's there already looks quite good, but I wonder if we can simplify the implementation; please check my suggestions.
Regarding the tests, I wanted to map the tests you wrote onto the table from #2864, this is what I ended up with:
| weights tied | ensure_weight_tying | LoraConfig trainable_token_indices | result | test |
|---|---|---|---|---|
| False | False | [1, 2, 3] | trainable tokens on embeddings only | |
| False | True | [1, 2, 3] | warn & trainable tokens on embeddings only | test_ensure_weight_tying_warns_when_model_not_tied_list_format |
| True | False | [1, 2, 3] | tied trainable tokens | |
| True | True | [1, 2, 3] | tied trainable tokens | test_ensure_weight_tying_with_single_layer |
| False | False | {"lm_head": [1,2], "embed_tokens": [1,2]} | treat as separate | |
| False | True | {"lm_head": [1,2], "embed_tokens": [1,2]} | warn & treat as separate | |
| True | False | {"lm_head": [1,2], "embed_tokens": [1,2]} | tied trainable tokens | test_weight_tying_bc_same_indices_applied |
| True | True | {"lm_head": [1,2], "embed_tokens": [1,2]} | tied trainable tokens | test_ensure_weight_tying_applied_with_same_indices |
| False | False | {"lm_head": [1,2], "embed_tokens": [3,4]} | treat as separate | |
| False | True | {"lm_head": [1,2], "embed_tokens": [3,4]} | warn & treat as separate | |
| True | False | {"lm_head": [1,2], "embed_tokens": [3,4]} | *treat as separate | test_weight_tying_bc_different_indices_treated_separately |
| True | True | {"lm_head": [1,2], "embed_tokens": [3,4]} | *error | test_ensure_weight_tying_errors_with_different_indices |
Does this look right to you? I think it means there are still a few gaps in the tests, could you please provide the missing ones? Some tests could be combined via pytest.mark.parametrize if the expected outcomes are the same.
tests/test_trainable_tokens.py (outdated)

```python
]
assert warnings_found

def test_ensure_weight_tying_warns_when_model_not_tied_dict_format(self, model_weight_untied, recwarn):
```
This test can be merged with test_ensure_weight_tying_warns_when_model_not_tied_list_format by parametrizing the trainable_token_indices argument.
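For illustration, a rough sketch of what the merged, parametrized test could look like (the fixture name and the expected warning text come from the surrounding diff; the LoRA target module is an assumption):

```python
import pytest
from peft import LoraConfig, get_peft_model


@pytest.mark.parametrize(
    "trainable_token_indices",
    [[1, 2, 3], {"embed_tokens": [1, 2, 3]}],  # cover both the list and the dict format
)
def test_ensure_weight_tying_warns_when_model_not_tied(model_weight_untied, trainable_token_indices, recwarn):
    # model_weight_untied is assumed to be a fixture providing a model without tied embeddings
    config = LoraConfig(
        target_modules=["q_proj"],  # assumed target module name
        trainable_token_indices=trainable_token_indices,
        ensure_weight_tying=True,
    )
    get_peft_model(model_weight_untied, config)
    expected = "ensure_weight_tying=True but the model does not have tied weights"
    assert any(expected in str(w.message) for w in recwarn)
```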
resolved in 232c6e7
tests/test_trainable_tokens.py (outdated)

```python
warnings_list = [w.message.args[0] for w in recwarn]
warnings_found = [
    msg for msg in warnings_list if "ensure_weight_tying=True but the model does not have tied weights" in msg
]
assert warnings_found
```
I think it's a bit more elegant to do:

```python
expected = ...
assert any(expected in msg for msg in warnings_list)
```
resolved in 232c6e7
tests/test_trainable_tokens.py (outdated)

```python
    ensure_weight_tying=True,
)

with pytest.raises(ValueError) as e:
```
Let's use:

```python
msg = "Cannot ensure weight tying when different token indices are specified"
with pytest.raises(ValueError, match=msg):
```
resolved in 232c6e7
src/peft/utils/other.py (outdated)

```python
ensure_weight_tying = getattr(peft_config, "ensure_weight_tying", False)

# Check if we're dealing with dict format that specifies both embed_tokens and lm_head
is_dict_format = isinstance(peft_config.trainable_token_indices, dict)
```
I don't think we need is_dict_format. The check below, len(target_layers) > 1, is already enough, is it not?
Yes, I re-reviewed this and simplified the logic significantly. Reference 232c6e7 for the implementation.
src/peft/utils/other.py (outdated)

```python
if "embed" in key_lower and not ("lm" in key_lower or "head" in key_lower):
    embed_key = key
elif "lm_head" in key_lower or ("head" in key_lower and "lm" not in key_lower):
    lm_head_key = key
```
I wonder if we overcomplicate things here. If there are multiple target_layers, can we not just compare them to the tied weights? Is it important to identify here which one is for the embedding and which one is for the LM head?
Below, you're using the names for the error message, which is a nice touch, but if we can refrain from guessing here, it would be worth it to make the error message more generic IMO.
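For example, something along these lines could avoid the name guessing entirely (a sketch only; model, target_layers, and the endswith handling are taken from the surrounding diff context):

```python
# Derive the tied module names from the model itself instead of matching on "embed"/"lm_head"
tied_module_names = {key.rpartition(".")[0] for key in model._tied_weights_keys}
# Target layers that participate in weight tying, regardless of what they are called
tied_layer_keys = [
    name for name in target_layers
    if any(tied_name.endswith(name) for tied_name in tied_module_names)
]
```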
I re-looked at this and removed the string-matching logic (checking for "embed", "lm_head", etc.); the code now directly compares the target layers against model._tied_weights_keys and the actual embedding layer. The error message is now generic, showing all conflicting tied layers instead of assuming specific names.
src/peft/utils/other.py (outdated)

```python
    indices_mismatch = True
else:
    # Same indices - if weights are tied and we're applying tying, skip lm_head (it'll be tied later)
    if weights_tied and not (not ensure_weight_tying and False):  # Will apply tying
```
This check makes no sense to me: why the `and False`?
resolved in 232c6e7
About the test coverage: the table looks correct. I've filled all 6 gaps in the test coverage.

@BenjaminBossan Thank you for the detailed review. I have made all the changes and would appreciate it if you could have a look again. I'll make any changes necessary.
BenjaminBossan left a comment
Thanks for iterating on the PR and extending the tests. I still have a few comments, please check.
As a general remark, the logic for handling weight tying in trainable tokens is inherently quite complex. Therefore, I focused on checking if the implementation is clear and simple while keeping the functionality intact. When I found code that I thought could be improved in this regard, I added a comment. But I would also kindly ask you to double check if you can find anything that can be simplified and apply it, even if I haven't commented on it. This will help with the long term health of the PEFT code base 🙏
src/peft/utils/other.py (outdated)

```python
weights_tied = (
    model_config.get("tie_word_embeddings", False)
    # some models may be misconfigured to have weight tying enabled but don't define tied weights keys
    and model._tied_weights_keys is not None
```
This could theoretically raise an AttributeError if used with a non-HF transformers model, right? It's not so likely in practice, since a non-HF transformers model is unlikely to have a model config with tie_word_embeddings, but let's still use getattr here to be safe. I would also assign this to a variable, as it's used 3 times in total.
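A minimal sketch of the suggested change (the variable name is illustrative):

```python
# getattr avoids an AttributeError on non-HF models that lack _tied_weights_keys
tied_weights_keys = getattr(model, "_tied_weights_keys", None)
weights_tied = (
    model_config.get("tie_word_embeddings", False)
    # some models may be misconfigured to have weight tying enabled but not define tied weight keys
    and tied_weights_keys is not None
)
```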
src/peft/utils/other.py (outdated)

```python
# Check if any of the target layers correspond to tied weights in the model
# Instead of guessing layer names, compare against actual tied weight keys
# Extract module names from tied weights keys (remove the weight attribute name)
tied_module_names = {".".join(key.split(".")[:-1]) for key in model._tied_weights_keys}
```
I'd say this is simpler:

```diff
-tied_module_names = {".".join(key.split(".")[:-1]) for key in model._tied_weights_keys}
+tied_module_names = {key.rpartition(".")[0] for key in model._tied_weights_keys}
```

I saw that the existing code does the same thing as you did here, but let's still try to improve :) (feel free to adjust the existing code below too).
src/peft/utils/other.py (outdated)

```python
        break

# Find which target layers are in the tied weights (including the embedding source)
for target_layer in target_layers:
```
I'd rename target_layer to target_layer_name to make it clear that it's the name, not the module itself.
src/peft/utils/other.py (outdated)

```python
has_both_layers = True
# Check if all tied layers have the same indices
first_indices = target_layers[tied_layer_keys[0]]
indices_match = all(target_layers[key] == first_indices for key in tied_layer_keys[1:])
```
I don't think we need both indices_match and indices_mismatch, it's a bit redundant. I think it's easiest to eliminate the former.
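A sketch of what that could look like, keeping only indices_mismatch (names taken from the diff above):

```python
first_indices = target_layers[tied_layer_keys[0]]
# True as soon as any tied target layer has indices that differ from the first one
indices_mismatch = any(target_layers[key] != first_indices for key in tied_layer_keys[1:])
```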
src/peft/utils/other.py (outdated)

```python
for name, module in model.named_modules():
    if module is embedding_module:
        # Get just the last part of the name for matching with target_layers
        embedding_name = name.split(".")[-1]
```
Although the logic in this loop is fine, it can be a bit confusing: what would it mean if the embedding_module is not found? This should never happen, right? So I'm wondering if we can do something like:

```python
embedding_name = next(n.split(".")[-1] for n, m in model.named_modules() if m is embedding_module)
```

This would raise an error if embedding_module is not found instead of leaving embedding_name = None. What's your opinion?
src/peft/utils/other.py (outdated)

```python
if weights_tied and ensure_weight_tying and has_both_layers and indices_mismatch:
    # Build more generic error message showing the conflicting layers
    tied_layers_info = ", ".join([f"{key}: {target_layers[key]}" for key in tied_layer_keys])
    raise ValueError(
```
Can we not raise this error immediately after indices_mismatch was determined? The earlier we can raise, the better. It should also make the check simpler, as we only need to check for if indices_mismatch.
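For instance, something like the following could raise right where the mismatch is detected (a sketch; the exact condition depends on how the surrounding checks end up being simplified):

```python
if indices_mismatch and ensure_weight_tying:
    tied_layers_info = ", ".join(f"{key}: {target_layers[key]}" for key in tied_layer_keys)
    raise ValueError(
        f"Cannot ensure weight tying when different token indices are specified: {tied_layers_info}"
    )
```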
src/peft/utils/other.py (outdated)

```python
# Since indices match here, indices_mismatch=False, so this simplifies to: we apply tying
# Skip all tied modules except the embedding (first one in tied_layer_keys)
# The embedding is typically first, but to be safe, skip modules in _tied_weights_keys
for key in tied_layer_keys:
```
I'm wondering if we cannot simply take the intersection between the two:

```python
layers_to_skip = set(tied_layer_keys) & tied_module_names
```

This approach would fail if we have a substring match but not a full string match, which is what you cover with tied_module.endswith(key). However, I don't see what would need to happen for a substring-only match, and AFAICT, the tests also never reach that point. Could you please explain?
src/peft/utils/other.py (outdated)

```python
    and isinstance(model.get_input_embeddings(), TrainableTokensWrapper)
):
    # the embedding layer is modified and we want weight tying.
    and not (not ensure_weight_tying and has_both_layers and indices_mismatch)
```
This conditional is a bit hard to read IMO, let's try to simplify. So for `and not (not ensure_weight_tying ...`, let's move it out of the parenthesis, i.e. it becomes `and ensure_weight_tying`. As for `indices_mismatch`, this can only ever be True if `has_both_layers` is also True, right? So we don't really need to check both.
src/peft/utils/other.py (outdated)

```python

if len(target_layers) > 1 and weights_tied and model._tied_weights_keys:
    # Check if any of the target layers correspond to tied weights in the model
    # Instead of guessing layer names, compare against actual tied weight keys
```
This comment can be removed IMO.
tests/test_trainable_tokens.py (outdated)

```python
assert lm_head_adapter.token_indices["default"] == [1, 2]

def test_weight_tying_bc_same_indices_applied(self, model_weight_tied):
    """Backwards compatibility: same indices should have weight tying even when ensure_weight_tying=False"""
```
This is not really for BC, is it? I think this is just the general expected behavior. The BC part is only for cases where the behavior might not be what the user expects, but we cannot change it now because it would be backwards incompatible.
Implement ensure_weight_tying for trainable_token_indices

Summary

This PR implements consistent weight tying behavior for `trainable_token_indices` as specified in issue #2864. It extends the `ensure_weight_tying` parameter (introduced in PR #2803) to work with `trainable_token_indices`, providing users explicit control over weight tying between embeddings and LM head.

Fixes #2864 (trainable_token_indices portion)

Problem Statement

Background

PEFT models sometimes need to handle tied weights between embedding layers and LM head layers (when `tie_word_embeddings=True`). The `ensure_weight_tying` parameter was introduced in PR #2803 to give users explicit control over this behavior for `modules_to_save`. However, the same control was missing for `trainable_token_indices`.

The Issue

Issue #2864 identified that the weight tying behavior for `trainable_token_indices` was not consistent across different scenarios. Specifically, there were four cases that needed to be implemented (see "Four Cases Implemented" below).

Solution Approach

Implementation Strategy:
Changes Made

1. Updated Configuration Documentation

File: `src/peft/tuners/lora/config.py`

Updated the `ensure_weight_tying` parameter docstring to clarify that it now applies to both `modules_to_save` and `trainable_token_indices`, making the documentation consistent with the implementation.

2. Implemented Weight Tying Logic

File: `src/peft/utils/other.py`

Added comprehensive logic within the existing `trainable_token_indices` handling block.

Key Components:
- `ensure_weight_tying=False`

Four Cases Implemented:
Case 1 - Warning for Untied Models: `weights_tied=False` + `ensure_weight_tying=True`

Case 2 - Error for Contradictory Configuration: `weights_tied=True` + `ensure_weight_tying=True` + different indices

Case 3 - Backwards Compatibility: `weights_tied=True` + `ensure_weight_tying=False` + different indices

Case 4 - Apply Tying
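For context, a minimal usage sketch of the behavior described above (model choice, target module, and indices are illustrative; gpt2 is used because it ties lm_head to the input embedding):

```python
from transformers import AutoModelForCausalLM

from peft import LoraConfig, get_peft_model

# gpt2 is configured with tie_word_embeddings=True, so lm_head shares weights with wte
model = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    target_modules=["c_attn"],          # illustrative LoRA target for gpt2
    trainable_token_indices=[1, 2, 3],  # list format: applied to the input embedding
    ensure_weight_tying=True,           # opt in to explicit handling of the tied lm_head
)
peft_model = get_peft_model(model, config)
```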
3. Comprehensive Test Suite

File: `tests/test_trainable_tokens.py`

Added 7 new test methods covering all scenarios.

Test Coverage:
- `test_ensure_weight_tying_warns_when_model_not_tied_list_format`: Verifies warning for list format
- `test_ensure_weight_tying_warns_when_model_not_tied_dict_format`: Verifies warning for dict format
- `test_weight_tying_bc_different_indices_treated_separately`: Verifies backwards compatibility
- `test_ensure_weight_tying_errors_with_different_indices`: Verifies error for contradictory config
- `test_ensure_weight_tying_applied_with_same_indices`: Verifies tying with same indices
- `test_weight_tying_bc_same_indices_applied`: Verifies BC for same indices
- `test_ensure_weight_tying_with_single_layer`: Verifies list format tying

Testing Results
New Tests

All 7 new tests pass successfully:
- `test_ensure_weight_tying_warns_when_model_not_tied_list_format`
- `test_ensure_weight_tying_warns_when_model_not_tied_dict_format`
- `test_weight_tying_bc_different_indices_treated_separately`
- `test_ensure_weight_tying_errors_with_different_indices`
- `test_ensure_weight_tying_applied_with_same_indices`
- `test_weight_tying_bc_same_indices_applied`
- `test_ensure_weight_tying_with_single_layer`

Backwards Compatibility
This implementation maintains full backwards compatibility:

✅ Default Behavior Unchanged: `ensure_weight_tying` defaults to `False`, preserving existing behavior
✅ No Breaking Changes: Existing code continues to work without modification
✅ Opt-in Enhancement: Users must explicitly set `ensure_weight_tying=True` to use new features
✅ BC Mode Preserved: When `ensure_weight_tying=False`, existing automatic tying still works for compatible configurations

Screenshots
Checklist
cc: @BenjaminBossan