
Conversation

@sambhavnoobcoder
Contributor

Summary

This PR adds embed_scale support to DoRA (Weight-Decomposed Low-Rank Adaptation), ensuring DoRA correctly handles models with scaled embeddings (e.g., Gemma3TextScaledWordEmbedding). This is a companion PR to the LoRA (#2825) and X-LoRA (#2831) embed_scale fixes, following the suggestion to extend the same support to DoRA's embedding variant.

Changes

Code

  • src/peft/tuners/lora/variants.py: Modified DoraEmbeddingVariant.forward() to apply embed_scale to the DoRA contribution (see the sketch after this list)
    • Retrieves embed_scale via module._get_embed_scale() (the same method used by LoRA)
    • Applies the scaling AFTER the weight-norm calculation to preserve DoRA's weight-geometry semantics
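
For orientation, here is a minimal, self-contained sketch of that ordering. It is not the PEFT source: the function, tensor shapes, and names (dora_scaled_embedding, magnitude, etc.) are illustrative, and the real DoraEmbeddingVariant.forward() operates on the module's existing LoRA/DoRA parameters. The point it demonstrates is that the weight norm is computed from the raw weights and embed_scale only multiplies the final output.

```python
# Illustrative sketch only (assumed names/shapes), not the PEFT implementation.
import torch
import torch.nn.functional as F

def dora_scaled_embedding(x, weight, lora_A, lora_B, magnitude, scaling, embed_scale):
    # x: (batch, seq) token ids; weight: (V, D) frozen embedding weights
    # lora_A: (r, V), lora_B: (D, r), magnitude: (V,) per-row DoRA magnitude
    delta_w = (lora_B @ lora_A).T * scaling                    # (V, D) low-rank update
    # Weight norm of (W_base + ΔW): computed BEFORE any output scaling.
    weight_norm = (weight + delta_w).norm(p=2, dim=1)          # (V,)
    mag_norm_scale = (magnitude / weight_norm).unsqueeze(-1)   # (V, 1)
    # DoRA-decomposed embedding lookup.
    dora_weight = mag_norm_scale * (weight + delta_w)          # (V, D)
    out = F.embedding(x, dora_weight)
    # embed_scale (e.g. sqrt(hidden_size) in Gemma3) is an output-space
    # transform, so it is multiplied in last.
    return out * embed_scale

# Toy usage: DoRA init sets the magnitude to the per-row norm of the base weights.
V, D, r = 10, 4, 2
weight = torch.randn(V, D)
out = dora_scaled_embedding(
    x=torch.tensor([[1, 3, 5]]),
    weight=weight,
    lora_A=torch.randn(r, V),
    lora_B=torch.randn(D, r),
    magnitude=weight.norm(p=2, dim=1),
    scaling=1.0,
    embed_scale=D ** 0.5,
)
print(out.shape)  # torch.Size([1, 3, 4])
```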

Tests

  • tests/test_decoder_models.py: Parametrized the existing LoRA embed_scale test to cover both vanilla LoRA and DoRA (see the sketch after this list)
    • Added @pytest.mark.parametrize("use_dora", [False, True]) to test_lora_embed_scale_is_applied
    • Modified the LoraConfig to include the use_dora parameter
    • Added explicit tolerances atol=1e-5, rtol=1e-5 to handle small numerical differences on the MPS backend
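
A sketch of the parametrization pattern (not the exact test body; the model loading and helper code in tests/test_decoder_models.py are omitted, the "embed_tokens" target is assumed, and the tolerance check is shown only as a comment):

```python
import pytest
from peft import LoraConfig

@pytest.mark.parametrize("use_dora", [False, True])
def test_lora_embed_scale_is_applied(use_dora):
    # One test now covers vanilla LoRA (use_dora=False) and DoRA (use_dora=True).
    config = LoraConfig(target_modules=["embed_tokens"], use_dora=use_dora)
    assert config.use_dora is use_dora
    # In the real test, the PEFT model's embedding output is compared against the
    # scaled base-layer output with explicit tolerances for the MPS backend:
    #   assert torch.allclose(peft_out, expected_out, atol=1e-5, rtol=1e-5)
```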

Key Design Decision

Why apply embed_scale AFTER weight norm calculation?

DoRA decomposes weight updates as: W_new = m * (W_base + ΔW) / ||W_base + ΔW||

The weight norm calculation (||W_base + ΔW||) is a geometric property of the weight matrix itself and should remain independent of output scaling. The embed_scale is an output-space transformation applied by specific embedding layers (like Gemma3's sqrt(hidden_size) scaling), so it's applied after DoRA's weight decomposition completes.

This preserves DoRA's weight geometry semantics while ensuring output consistency with the base layer.
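
In notation (illustrative, following the formula above), with m the magnitude vector, ΔW = BA the low-rank update, and s = embed_scale (e.g. √hidden_size in Gemma3), the embedding output for token id x becomes:

```math
y(x) = s \cdot \left[ \frac{m}{\lVert W_\text{base} + \Delta W \rVert} \odot \left( W_\text{base} + \Delta W \right) \right]_x
```

where [·]_x denotes the row lookup for token x. The scale s multiplies the result only; it never enters the norm in the denominator.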

Test Results

  • test_lora_embed_scale_is_applied[False] (vanilla LoRA) - 1.67s
  • test_lora_embed_scale_is_applied[True] (DoRA) - 0.06s
  • test_lora_embed_scale_is_applied_mixed_batch - 0.04s
  • make style passed

Fixes: #2838

cc: @BenjaminBossan

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment

Thanks for also amending DoRA to take the embedding scale into account. The PR LGTM. That should wrap up all PEFT methods that allow targeting the embedding layer.

@BenjaminBossan BenjaminBossan merged commit 086f187 into huggingface:main Oct 15, 2025
13 checks passed
