The great deduplication #2771
Conversation
For now, no deprecation
- set_adapter not touched yet
- went from 10739 to 8494 duplicated lines (quick check)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Make it deal with quantized weights by default, so that special treatment in certain PEFT methods can be removed.
While working on this, I also deduplicated almost identical TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING constants by copying them from LoRA and only overriding a few values that differ. Moreover, some PEFT methods didn't have their own TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING but used the one from LoRA instead. They now each have their own constant, which is a copy from the one from LoRA.
It's only used there, no need to have it on BaseTuner.
It's confusing otherwise
It is too specific to really be used as a standalone function.
Not that generally useful
Also improve some existing docstrings, e.g. for inject_adapter_in_model.
sayakpaul
left a comment
All tests passing gives enough confidence that nothing was broken in this PR. I would also probably run the integration/slow tests just to confirm no disruption.
But apart from that, this is really cool!
| "qwen2": ["q_proj", "v_proj"], | ||
| "qwen3": ["q_proj", "v_proj"], | ||
| } | ||
|
|
How can this be deleted? A small comment in-line here would be helpful.
As these mappings were all identical or almost identical to the one for LoRA, I just made them copies of the LoRA mappings, e.g. https://github.com/huggingface/peft/pull/2771/files#diff-b054446713795a79a89f2285d66068573d69f0b701ee31419bb223d1601e7844R108 for C3A.
I mentioned that in the PR description:
While working on this PR, I also deduplicated almost identical TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING constants by copying them from LoRA and only overriding a few values that differ. Moreover, some PEFT methods didn't have their own TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING but used the one from LoRA instead. They now each have their own constant, which is a copy from the one from LoRA.
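For illustration, here is a minimal sketch of that deduplication pattern. The qwen2/qwen3 entries come from the diff context above; the constant names follow the naming used in the PR description (not necessarily the exact names in the code), and the override value is made up:

```python
# Illustrative sketch only; names follow the PR description and the override
# value below is hypothetical, not taken from the actual C3A mapping.
TRANSFORMERS_MODULES_TO_LORA_TARGET_MODULES_MAPPING = {
    # ... many more architectures ...
    "qwen2": ["q_proj", "v_proj"],
    "qwen3": ["q_proj", "v_proj"],
}

# Instead of re-listing every architecture, a method like C3A starts from a
# copy of the LoRA mapping and overrides only the entries that differ:
TRANSFORMERS_MODULES_TO_C3A_TARGET_MODULES_MAPPING = {
    **TRANSFORMERS_MODULES_TO_LORA_TARGET_MODULES_MAPPING,
    "gpt2": ["c_attn"],  # hypothetical override for illustration
}
```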
BenjaminBossan
left a comment
Thanks for your review, Sayak.
All tests passing gives enough confidence that nothing was broken in this PR. I would also probably run the integration/slow tests just to confirm no disruption.
Good idea about running slow tests; unfortunately, they're not set up to run on forks :-/ I ran a subset of those locally and they pass.
But I think it's not a big deal: once it's merged, they will run during the night, and we can fix anything that breaks then. Generally, the stuff that is changed here is best covered by the normal tests anyway.
Btw. one unfortunate side effect of this PR is that the test coverage declines slightly (by 1%) because there are fewer duplicated lines being tested :D
| "qwen2": ["q_proj", "v_proj"], | ||
| "qwen3": ["q_proj", "v_proj"], | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As these mappings were all identical or almost identical to the one for LoRA, I just made them copies of the LoRA mappings, e.g. https://github.com/huggingface/peft/pull/2771/files#diff-b054446713795a79a89f2285d66068573d69f0b701ee31419bb223d1601e7844R108 for C3A.
I mentioned that in the PR description:
While working on this PR, I also deduplicated almost identical TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING constants by copying them from LoRA and only overriding a few values that differ. Moreover, some PEFT methods didn't have their own TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING but used the one from LoRA instead. They now each have their own constant, which is a copy from the one from LoRA.
Exactly why regular software engineering coverage often cannot be applied one-on-one in these libraries. I am totally fine with that.
githubnemo
left a comment
Minor comments / questions. Looks good otherwise :)
```python
def _unloading_checks(self, adapter_names: Optional[list[str]]):
    adapters_to_consider = adapter_names or self.active_adapters
    is_modules_to_save_available = any(
        self.peft_config[adapter].modules_to_save for adapter in adapters_to_consider
    )
    if is_modules_to_save_available and len(adapters_to_consider) > 1:
        raise ValueError("Cannot unload multiple adapters that specify `modules_to_save`.")
```
is this a remnant and can be removed? doesn't seem to be used.
Hmm, good question. So I agree that this check doesn't make too much sense, but the general idea is right: If we have multiple modules_to_save, then merge_and_unload doesn't really work. Right now, we just let it pass:
```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from copy import deepcopy

model_id = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_id)
config = LoraConfig(modules_to_save=["lm_head"])
model = get_peft_model(model, config)
model.add_adapter("adapter2", config)
model.base_model.model.lm_head.modules_to_save.default.weight.data.fill_(111)
model.base_model.model.lm_head.modules_to_save.adapter2.weight.data.fill_(222)
unloaded = deepcopy(model).merge_and_unload(adapter_names=["default", "adapter2"])
print(unloaded.lm_head.weight)  # is [[111, 111, ...]]
```

I'd suggest leaving this situation as-is for now and handling it in a separate PR. WDYT?
Agreed, let's leave it as-is for now.
* IA³ merge checks for bnb
* Typos
BenjaminBossan
left a comment
Thanks for the detailed review @githubnemo, I resolved the flagged issues or commented on them. Please check again.
During work on #2771, the usage of set_auxiliary_adapters became obsolete, except in one place, which was missed. This has now been cleaned up and the obsolete method is removed.
There is a lot of code duplication in PEFT, especially when it comes to the `model.py` files of the different PEFT methods. The reason for this is that for new PEFT methods, contributors will often copy an existing one and adapt it where necessary (which is as it should be). For lack of abstractions, this resulted in dozens of methods being copied 1:1.

At the start, when there were still few PEFT methods, we couldn't know yet which functions could stay the same across different PEFT methods and which ones would need adjusting. Therefore, it wasn't a bad choice to avoid prematurely abstracting those away. Now that we have a much better picture, it is time to take this step. If a PEFT method requires special treatment, it can still override these methods, so we don't lose flexibility. The refactored methods are:
- `merge_and_unload`
- `unload`
- `delete_adapter`
- `set_adapter`
- `enable_adapter_layers`
- `disable_adapter_layers`
- `_replace_module`
- `_unload_and_optionally_merge`
- `_mark_only_adapters_as_trainable`
- `_check_new_adapter_config`
- `_check_target_module_exists`
- `_prepare_adapter_config`
- `__getattr__`
- `get_peft_config_as_dict` (fully deleted)

These methods were dealt with in separate commits for easier review.
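For illustration, here is a toy sketch of the pattern, not PEFT's actual code: the shared implementation lives once on the base tuner class, and an individual PEFT method only overrides what genuinely differs.

```python
# Toy sketch of the deduplication pattern; this is not PEFT's actual
# implementation, only an illustration of "shared logic on the base class,
# overrides only where a method really differs".
class BaseTunerSketch:
    def delete_adapter(self, adapter_name: str) -> None:
        # Generic logic that previously was copied into every method's model.py.
        print(f"deleting adapter {adapter_name!r} via the shared code path")

    def _check_new_adapter_config(self, config: dict) -> None:
        # Generic sanity checks shared by all methods; subclasses may extend.
        print("running shared config checks")


class SomeMethodModel(BaseTunerSketch):
    # delete_adapter is simply inherited; nothing needs to be copied anymore.

    def _check_new_adapter_config(self, config: dict) -> None:
        # A method that needs special treatment overrides the hook and adds
        # only its own checks on top of the shared ones.
        super()._check_new_adapter_config(config)
        print("running method-specific config checks")
```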
This PR results in over 3000 fewer lines of code. We went from ~10700 lines counted as duplicated to ~8000. The 90th percentile of the duplication ratio for methods/functions went from 100% to 97%. This may seem like little, but it means that there are significantly fewer functions now that are 100% duplicates.
As a concrete example, the `boft/model.py` file now contains 70 lines of code where previously it contained 227 (counted using `cloc`). As this is a fairly standard model file for a PEFT method, it means future contributors can skip ~2/3 of the model code, allowing them to better focus on the actual new code.

The remaining, very similar functions are almost all in `layer.py`. So e.g. for LoRA, `Linear.merge` is 95% identical to `_ConvNd.merge`. But since there are differences, the functions cannot be easily deduplicated (and it's questionable whether we even want to deduplicate them at all).
I introduced a new module, `functional.py`, which contains functions (just reimported from elsewhere) that can be useful for libraries that want to integrate PEFT. I would suggest that we treat them as public API and thus guarantee backwards compatibility. When refactoring the methods for deduplication, I considered whether they can be used on any model (like transformers) or if they only make sense on a `PeftModel`. If it was the former, I made them standalone functions and added them to `functional.py`. If they depend on `PeftModel` instances, they just became normal methods. E.g., `disable_adapter_layers` has `for active_adapter in self.active_adapters`, so it presupposes `self.active_adapters`.
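As a rough sketch of the kind of integration use case this is aimed at: the example below uses `inject_adapter_in_model`, which already exists as a public PEFT function (and whose docstring this PR also improves); whether it is among the functions re-exported in `functional.py` is an assumption here, not taken from the diff.

```python
# Sketch: adding LoRA layers to a plain transformers model without wrapping it
# in a PeftModel, which is the kind of use case functional.py is aimed at.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, inject_adapter_in_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
lora_config = LoraConfig(target_modules=["q_proj", "v_proj"])
# The adapter layers are injected directly into the transformers model; the
# model is modified in place and also returned.
model = inject_adapter_in_model(lora_config, base_model)
```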
While working on this PR, I also deduplicated almost identical `TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING` constants by copying them from LoRA and only overriding a few values that differ. Moreover, some PEFT methods didn't have their own `TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING` but used the one from LoRA instead. They now each have their own constant, which is a copy of the one from LoRA.