DoRA embed_scale Support #2838 #2839
Merged
Summary
This PR adds embed_scale support to DoRA (Weight-Decomposed LoRA), ensuring DoRA correctly handles models with scaled embeddings (e.g., Gemma3TextScaledWordEmbedding). This is a companion PR to the LoRA (#2825) and X-LoRA (#2831) embed_scale fixes, following the suggestion to extend the same support to DoRA's embedding variant.
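For context, a "scaled embedding" multiplies the looked-up vectors by a constant factor in the output space. A minimal sketch of such a layer, modeled loosely on Gemma3TextScaledWordEmbedding (not its exact implementation):

```python
import torch
import torch.nn as nn


class ScaledWordEmbedding(nn.Embedding):
    """Embedding that scales its output, in the spirit of Gemma3TextScaledWordEmbedding."""

    def __init__(self, num_embeddings: int, embedding_dim: int, embed_scale: float = 1.0):
        super().__init__(num_embeddings, embedding_dim)
        # Gemma3 uses sqrt(hidden_size) here; keeping it on the module lets adapters read it back
        self.register_buffer("embed_scale", torch.tensor(embed_scale))

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # the scaling happens in the output space, not in the weights
        return super().forward(input_ids) * self.embed_scale
```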
Changes
Code
- Updated `DoraEmbeddingVariant.forward()` to apply `embed_scale` to the DoRA contribution (see the sketch below)
- `embed_scale` is retrieved via `module._get_embed_scale()` (the same method used by LoRA)
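A hedged sketch of the ordering these changes implement. Names, the function signature, tensor shapes, and the norm axis are illustrative assumptions; PEFT's actual `DoraEmbeddingVariant.forward()` looks different and obtains the scale via `module._get_embed_scale()`:

```python
import torch
import torch.nn.functional as F


def dora_embedding_forward(base_weight, magnitude, lora_A, lora_B, scaling, embed_scale, input_ids):
    # Low-rank weight update; assumed shapes: lora_A (num_embeddings, r), lora_B (r, embedding_dim)
    delta_w = (lora_A @ lora_B) * scaling

    # 1) Weight decomposition: the norm is a property of the weights alone
    weight_norm = (base_weight + delta_w).norm(p=2, dim=0, keepdim=True)
    mag_norm_scale = magnitude / weight_norm  # magnitude assumed shaped (1, embedding_dim)

    # 2) Embedding lookup through the decomposed weights
    out = F.embedding(input_ids, mag_norm_scale * (base_weight + delta_w))

    # 3) embed_scale is an output-space transform, applied only after the decomposition
    return out * embed_scale
```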
Tests

- Added `@pytest.mark.parametrize("use_dora", [False, True])` to `test_lora_embed_scale_is_applied` (a sketch of the resulting test shape follows this list)
- Extended the test's `LoraConfig` to include the `use_dora` parameter
- Added `atol=1e-5, rtol=1e-5` to handle small numerical differences on the MPS backend
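A rough sketch of the parametrized test shape, not the actual test from PEFT's suite; `build_model_with_scaled_embedding()` is a hypothetical helper standing in for the real model setup, and the targeted module name is only an example:

```python
import pytest
import torch
from peft import LoraConfig, get_peft_model


@pytest.mark.parametrize("use_dora", [False, True])
def test_lora_embed_scale_is_applied(use_dora):
    # build_model_with_scaled_embedding() is a hypothetical stand-in for the
    # real fixture; the actual test has more setup.
    model = build_model_with_scaled_embedding()
    input_ids = torch.tensor([[1, 2, 3]])
    out_base = model.get_input_embeddings()(input_ids)

    config = LoraConfig(target_modules=["embed_tokens"], use_dora=use_dora)
    peft_model = get_peft_model(model, config)
    out_peft = peft_model.get_input_embeddings()(input_ids)

    # Freshly initialized adapters are a no-op, so the adapter's embed_scale
    # path must reproduce the base layer's scaled output.
    torch.testing.assert_close(out_peft, out_base, atol=1e-5, rtol=1e-5)
```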
Key Design Decision

Why apply embed_scale AFTER the weight norm calculation?
DoRA decomposes weight updates as:
```
W_new = m * (W_base + ΔW) / ||W_base + ΔW||
```

The weight norm calculation (`||W_base + ΔW||`) is a geometric property of the weight matrix itself and should remain independent of output scaling. The `embed_scale` is an output-space transformation applied by specific embedding layers (such as Gemma3's sqrt(hidden_size) scaling), so it is applied after DoRA's weight decomposition completes.

This preserves DoRA's weight geometry semantics while ensuring output consistency with the base layer.
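Written out with the scaling included (same notation as above, with E_W(x) denoting the embedding lookup for token ids x), the adapted output is:

```math
W_\mathrm{new} = m \cdot \frac{W_\mathrm{base} + \Delta W}{\lVert W_\mathrm{base} + \Delta W \rVert},
\qquad
y = \mathrm{embed\_scale} \cdot E_{W_\mathrm{new}}(x)
```

The norm in the first expression never sees `embed_scale`; the scaling enters only through the second.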
Test Results
- `test_lora_embed_scale_is_applied[False]` (vanilla LoRA) - 1.67s
- `test_lora_embed_scale_is_applied[True]` (DoRA) - 0.06s
- `test_lora_embed_scale_is_applied_mixed_batch` - 0.04s
- `make style` passed

Fixes: #2838
cc: @BenjaminBossan