Add Arrow + GenKnowSub to LoRA #2644
Conversation
|
Hey @TheTahaaa, thank you for the PR! I'm unsure about the use of a LoRA variant here. The concept of Arrow is, as I understand it, more similar to X-LoRA (i.e. a routing of multiple adapters) than a single adapter. Therefore I think it would make more sense to implement it as its own method (but still being able to ingest LoRA adapters, like X-LoRA does). Secondly, I think that while a direct integration of […]. Let me know what you think.
|
Thanks for the feedback @githubnemo!

Regarding the use of a LoraVariant for Arrow: while Arrow isn't a new adapter type, it is fundamentally an inference-time, train-free routing mechanism over already trained LoRAs within the same layer, as emphasised in the paper. That's why I implemented it as a variant: it stays tightly coupled to the existing LoRA modules, plugs cleanly into PEFT's adapter system, and requires no changes to loading logic or user workflows. Users can simply add a dummy "router" adapter, activate it, and Arrow handles token-level routing over the loaded experts in place. It's lightweight, modular, and leverages the LoraVariant interface for minimal disruption.

Regarding GenKnowSub: in our paper, we show that subtracting a general-domain LoRA from each task-specific LoRA before applying Arrow significantly improves modularity and routing quality. To obtain these general-domain LoRAs, we simply fine-tune the base model on a general corpus (e.g., Wikipedia) using LoRA with a causal language modeling objective. These general LoRAs, such as ones trained on different languages, are then loaded into the model alongside the task-specific LoRAs. Because GenKnowSub is a one-time, inference-only preprocessing step, we've implemented it as a toggle in the Arrow router config via use_gks=True. When enabled, the router adapter will subtract the average of the general LoRAs (e.g., language experts) from each task-specific LoRA prior to routing.

A concise example of how a user would load LoRA experts, apply GenKnowSub, and use Arrow routing is sketched at the end of this comment. This flow is designed to be non-invasive and intuitive. Once the dummy router adapter is added and activated, it internally handles everything through its custom forward() path, including prototype computation, the GenKnowSub subtraction, and token-level routing over the loaded experts.
No special hooks, overrides, or custom code are needed beyond activating the adapter — the standard .forward() or .generate() calls will work seamlessly. If helpful, I’d be happy to walk through how this is handled inside the forward() logic of the ArrowLoraLinearLayer. |
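For completeness, here is a condensed version of that flow (the full version, with all config fields, is quoted in the next reply; adapter names like "cluster0" and the `use_arrow`/`use_gks` flags are the ones proposed in this PR):

```python
from peft import LoraConfig

# Load the trained task-specific and general-domain LoRA experts
model.load_adapter(..., adapter_name="cluster0")   # task-specific expert(s)
model.load_adapter(..., adapter_name="le_en")      # general-domain (language) expert(s)

# Add and activate a dummy "router" adapter carrying the Arrow + GenKnowSub settings
router_cfg = LoraConfig(
    r=2,              # dummy rank; the router's own A/B matrices are never used
    use_arrow=True,   # token-level routing over the loaded task-specific LoRAs
    use_gks=True,     # subtract the average general-domain LoRA before routing
    target_modules=["qkv_proj", "o_proj"],
)
model.add_adapter(adapter_name="router", peft_config=router_cfg)
model.set_adapter("router")

# Standard calls work unchanged; routing happens inside the router's forward()
outputs = model.generate(**inputs)
```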
|
Thanks for taking the time to give a thorough answer.
Yes, I thought about this again and I'd still implement it as its own method for better separation of concerns and a clearer interface for the user. However, I can also see how this would involve a lot of code duplication when we could re-use all the infrastructure that is already there in the form of LoRA. So, I suggest a compromise: […]
For testing we should implement stand-alone tests similar to […]. Do you think that would make sense? |
|
Sounds good to me! I'll start working on the changes and add the necessary test file(s) soon. My understanding is that most of the work will involve refactoring existing code into a new class for the standalone Arrow method – is that correct? |
Not quite what I meant. My suggestion is to keep the variant solution that you already have, because it re-uses a lot of code from LoRA and making it its own method would duplicate a lot of code. To minimize the impact on future configs I suggested using a breakout config (point (1) in my post). Since the variant implementation will not have a […]. So instead of this:

```python
# Load task-specific LoRA adapters
model.load_adapter(..., adapter_name="cluster0")
...
model.load_adapter(..., adapter_name="cluster9")
# Load general-domain LoRA adapters (e.g., English, French, German)
model.load_adapter(..., adapter_name="le_en")
model.load_adapter(..., adapter_name="le_fr")
model.load_adapter(..., adapter_name="le_de")
# Now the model contains all the task-specific and general-domain LoRAs
# Define Arrow + GenKnowSub config as a dummy LoRA router
router_cfg = LoraConfig(
    r=2,                           # dummy rank, since A and B won't be used!
    use_arrow=True,                # this turns the LoRA into the ArrowLoraVariant
    arrow_expert_num=10,           # number of task-specific modules in each LoRA layer
    arrow_top_k=3,                 # number of task-specific LoRAs selected per token in each layer
    arrow_router_temperature=1.0,
    use_gks=True,                  # ← enable GenKnowSub!
    le_names=...,                  # names of the loaded general-domain LoRAs
    ts_names=...,                  # names of the loaded task-specific LoRAs
    target_modules=["qkv_proj", "o_proj"],
)
# Add a dummy router adapter to trigger Arrow + GenKnowSub logic
model.add_adapter(adapter_name="router", peft_config=router_cfg)
model.set_adapter("router")
# Done! Just call `model.generate()` or `model(...)` as usual.
```

there will be something like:

```python
from peft import create_arrow_model, ArrowConfig

peft_model = create_arrow_model(
    base_model=base_model,
    task_specific_adapters=[path0, path1, path2],
    general_knowledge_adapters=[path3, path4, path5],
    arrow_config=ArrowConfig(...),
)
```

This function would live in the […]
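Downstream usage would presumably stay the same as with any PeftModel, something like (a sketch; `tokenizer` is assumed to have been loaded alongside `base_model`):

```python
inputs = tokenizer("The capital of France is", return_tensors="pt").to(peft_model.device)
outputs = peft_model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```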
|
That would be much more straightforward to implement, working on it! @githubnemo |
|
✅ Implementation complete! @githubnemo I’ve added just two attributes to the […].

Currently working on the test files; will push the final version soon.

P.S. The final version with create_arrow_model and tests/test_arrow is committed and pushed. 🎯
githubnemo left a comment
Nice! This is already looking good, I think that this design works quite well.
Lots of smaller nit picks which need to be resolved first before doing a more thorough review.

One bigger point is the precomputation of prototypes / GKS values, which disallows adding / removing adapters. It also prevents us from switching to an expert adapter, doing fine-tuning, and switching back to the router (the prototypes would be outdated). While I think that supporting adding / removing adapters is a smaller change, allowing for fine-tuning while arrow layers are present is a bit more complicated. I expect GKS to be OK with this since we only subtract the weights once at the beginning and could do fine-tuning afterwards. I'm not sure if it is possible to solve this without removing prototype computation. Maybe we should make prototype computation optional. WDYT?
The precomputation of prototypes (and the averaging of general experts for GKS) is lazy: it happens in the forward() of the ArrowLoraLinearLayer while the "arrow_router" adapter is active.

This means fine-tuning with multiple experts is fully supported as long as "arrow_router" is not active during training. You can activate other adapters, add/remove experts, fine-tune as needed, and then switch back to "arrow_router"; at that moment, prototypes (and GKS values) will be computed based on the currently loaded experts (see the sketch below).

However, when adding or removing LoRA modules to/from the arrow_model, we need to ensure that the task-specific and general LoRA module lists are kept in sync. This can be handled via a utility function (similar to […]).
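For illustration, a compressed sketch of that workflow (adapter names and the training-loop helper are placeholders, not part of the actual API):

```python
# Fine-tune an individual expert while the router is inactive, then switch back.
model.set_adapter("cluster3")             # activate one task-specific expert
train_one_epoch(model, dataloader)        # placeholder for whatever training loop you use

model.load_adapter(new_expert_path, adapter_name="cluster10")  # optionally add/remove experts

model.set_adapter("arrow_router")         # re-activate the router adapter
output = model.generate(**inputs)         # prototypes (and GKS) are computed lazily on this
                                          # forward pass, reflecting the experts as loaded now
```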
I think there are also training scenarios that won't work. For example if I want to test the setup's performance end-to-end during training. Let's say I have a tool usage benchmark (e.g., a multi-turn conversation with intermittent tool calls) and I have several adapters, one for each tool. It would be hard to separate the individual steps of the benchmark for each tool so I have to test the whole arrow + adapter setup. This means that I have to train a single adapter, return to the arrow adapter, evaluate, return to the single adapter, etc. This would inherently be incompatible with caching the prototypes.
Yes, something like:

```python
def update_layer(self, adapter_name, ...):
    [...]
    if hasattr(self, "lora_arrow"):
        for adapter in self.lora_variant:
            if adapter in self.lora_arrow:
                self.lora_arrow[adapter].on_adapter_change()
```
Should I go ahead and implement `on_adapter_change()`?

The point is that if an adapter is loaded or removed from the model, the user should explicitly indicate which adapters are task-specific and which are general adapters used in GKS. One possible approach is to add an attribute in […].

That said, I think that whenever a user adds or removes a LoRA from the arrow_model, they should also indicate the final set of task-specific and general LoRAs so we can properly organise things. What do you think?
Yeah, let's go ahead and implement `on_adapter_change()`.
✅ Implemented!
|
Force-pushed from 32a3462 to d345645
|
Give me a ping when I can review this again. Regarding review procedure, it would be best to keep the comments open instead of marking them as resolved. Make sure to merge main into your branch regularly to keep the PR mergeable! :)
|
Hey @githubnemo, I’ve merged the latest main branch into my PR and re-ran all the tests; all 7 test cases passed successfully. ✅

There are still some style/formatting issues coming from the merged main branch (see screenshot below), so I haven’t run […].

I had also marked all previously open comments as resolved, just so I could address them systematically one by one. They are all marked as unresolved again now.

The PR is now ready for review!
Force-pushed from 49535ca to 3e4c805
|
Hey @githubnemo , I’ve updated the prototype computation logic and merged the latest changes from main. |
|
A minor but important update in […].
|
I think there was an issue with rebasing and formatting.
We've recently updated the ruff version, maybe yours is now outdated. Could you go over the PR once and try to make sure that the PR aligns with main and doesn't contain irrelevant changes (e.g., auto-formatting of unrelated things). That would help quite a lot, thanks! |
Force-pushed from baa03d8 to b5ace6a
|
Hey @githubnemo , Rebased on the latest main, removed unrelated changes, restored upstream entries (MiSS/SHiRA), and updated quality hooks. Ready for review ✅ |
githubnemo left a comment
Thanks for dealing with the rebase, everything looks fine now!
There are still some aspects that deserve testing like the hyper-param compatibility check and possibly GPU/bnb related code but in general it looks quite good already.
Since the implementation seems to be reaching maturity, I'd suggest that we focus on adding documentation and examples. You can document the new method in a new section in docs/source/developer_guides/lora.md, explaining what the method does and that it can be enhanced with GKS, and provide brief code examples, similar to the rest of the document. Providing a bit of insight into how to manually instantiate the arrow layer for advanced users is a plus. This should also document the naming scheme for loaded adapters (in the code as well), since that is now a major part of how Arrow ingests adapters.
In your original post of this PR you mentioned that you already have some examples for common benchmark datasets. Do you think you would be able to contribute an example from one of them? Normally I'd suggest adding the method to the MetaMathQA benchmark suite in PEFT but Arrow (+GKS) make a lot of assumptions that are not true for other methods so I'd rather skip that at this time. Nevertheless, I think having an example that showcases the benefit of Arrow and Arrow + GKS over baseline would be great.
src/peft/tuners/lora/bnb.py
Outdated
```python
def resolve_lora_variant(self, *, use_dora: bool, use_arrow: bool, **kwargs) -> Optional[LoraVariant]:
    if use_arrow:
        # arrow variant must be imported so it registers itself
        from .variants import ArrowLinearVariant

        return ArrowLinearVariant()
```
Have you tested forward as well? Maybe it makes sense to add a test to test_gpu_examples.py for testing the bnb integration properly. I think class TestEvaInitializationGPU is a good example on how to do this.
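For reference, a rough sketch of what such a test could look like; the model id, placeholder adapter paths, and the create_arrow_model/ArrowConfig keyword names follow this thread's earlier proposal and are assumptions, not the final API:

```python
import pytest
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

from peft import ArrowConfig, create_arrow_model  # names as proposed in this PR


@pytest.mark.single_gpu_tests
@pytest.mark.bitsandbytes
def test_arrow_gks_forward_4bit():
    model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumption: same model as in the example script
    bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
    base_model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)

    # Placeholder adapter checkpoints; a real test would use tiny, locally created LoRAs.
    model = create_arrow_model(
        base_model=base_model,
        task_specific_adapters=["cluster0_path", "cluster1_path"],
        general_knowledge_adapters=["le_en_path"],
        arrow_config=ArrowConfig(),  # assumed defaults; a real test would set top_k / use_gks explicitly
    )

    inputs = tok("The capital of France is", return_tensors="pt").to(base_model.device)
    with torch.no_grad():
        out = model(**inputs)

    # Forward through the 4-bit base + Arrow routing should produce finite logits.
    assert torch.isfinite(out.logits).all()
```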
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Regarding the bnb concern, I did test both Phi2 and Phi3-mini with 4-bit quantisation on a few datasets like Arc-easy, Arc-challenge, HellaSwag; they work absolutely fine!
Sure!
Yes, I can. I’ll run some experiments on PIQA, BoolQ, ARC-Easy, etc. to show how GKS and Arrow improve over the base model, using the task adapters trained on the Flan dataset. Should I add this example to […]?
|
I’ve added another test to cover the scenario of adding a new adapter, running a forward pass (to build prototypes and apply GKS), then activating the new adapter and calling forward again (to simulate training/inference with a newly loaded expert), and finally re-activating the "arrow_router" adapter (see the outline below).

In addition, I updated the documentation in docs/source/developer_guides/lora.md.
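Roughly, the scenario covered by the new test looks like this (a sketch; the adapter and path names are placeholders):

```python
# 1) Add one more expert to an already set-up Arrow model.
model.load_adapter(new_expert_path, adapter_name="cluster_new")

# 2) Forward with the router active: prototypes are built and GKS is applied
#    over all currently loaded experts, including the new one.
model.set_adapter("arrow_router")
_ = model(**inputs)

# 3) Activate the newly added expert alone, simulating training/inference with it.
model.set_adapter("cluster_new")
_ = model(**inputs)

# 4) Re-activate the router; stale cached state is rebuilt (via on_adapter_change)
#    on the next forward pass.
model.set_adapter("arrow_router")
_ = model(**inputs)
```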
githubnemo left a comment
Nicely done! Only some minor comments left.
I think the examples directory would be the right place.
I agree, something like examples/arrow_multitask or something akin to the task you have in mind.
Could you also check the CI? There seem to be a few errors due to some LoRA layers still missing the arrow_config parameter in their __init__ and update_layer methods.
|
All review suggestions have been addressed, and the new arrow-multitask script has been added to the examples directory. Waiting for your review ✅ |
githubnemo left a comment
Alright, with tests passing I was able to have a look at the coverage, and while there are a few tests missing, coverage looks good.
Some smaller comments / suggestions, the biggest being moving create_arrow_model to arrow.py since it has nothing to do with variants as such, but nothing too big.
The example looks very clean, thanks for that!
I think once the tests are implemented and CI passes we can merge this.
|
All review suggestions have been addressed in the code. I also verified formatting, and everything is clean now. Ready for your review! |
githubnemo left a comment
Nice, I'm letting the CI run to see if that passes but except for some nitpicks and a small bug in the GPU test it looks fine.
Regarding docs, let's add ArrowConfig to docs/source/package_reference/lora.md so that it is discoverable there as well.
Examples seem to run fine as well.
Could you merge with main and resolve the conflicts? If that diff looks good we can merge this PR :)
tests/test_gpu_examples.py
Outdated
```python
    device_map="auto",
    quantization_config=bnb_config,
)
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
```
Suggested change:

```diff
-tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
+with hub_online_once(model_id + 'tokenizer'):
+    tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
```
the tokenizer wasn't fetched on the first invocation of hub_online_once so at this point we're already offline and the tokenizer fetching will fail.
You can verify that it works by running:

```
pytest -m "single_gpu_tests and bitsandbytes" -k arrow tests/test_gpu_examples.py -x --pdb
```
|
It's fine now, thanks!
Suggested change:

```diff
 Usage examples:
-    python evaluate.py --strategy base --ds_name arc-challenge
-    python evaluate.py --strategy arrow --ds_name boolq
-    python evaluate.py --strategy gks --ds_name hswag
+    python arrow_phi3_mini.py --strategy base --ds_name arc-challenge
+    python arrow_phi3_mini.py --strategy arrow --ds_name boolq
+    python arrow_phi3_mini.py --strategy gks --ds_name hswag
```
|
Good catch!
|
All review suggestions have been applied. I also merged the branch with main and ran the test suite; things look good overall.

While running the tests I noticed some warnings […]. These warnings originate from the aLoRA config guard and are expected for non-CAUSAL_LM task types; they are not caused by the Arrow changes.

The final version with all necessary changes is pushed and ready for review. ☑️
|
Looks good to me :) I'll run the CI one last time and if it's green we can merge! |
githubnemo left a comment
Thanks a lot for seeing the implementation through!
This will be a fine addition to PEFT :)
Hi there! 👋
This PR adds support for Arrow, a modular routing mechanism for LoRA experts introduced here, as well as our refinement method GenKnowSub, proposed in our ACL 2025 Main Conference paper. GenKnowSub enhances Arrow by subtracting a general-domain LoRA from task-specific ones prior to routing, leading to improved generalisation and modularity.
The integration is implemented through a new ArrowLoraLinearLayer, registered as a LoraVariant (similar to DoRA).
🔧 Code Changes
Modified files under peft/tuners/lora/: […]
Added a new file: […]
✅ Testing & Quality
Confirmed that tests/test_lora_variants.py passes.
All formatting and style checks pass (make quality successful).
📚 Optional Follow-up
I’ve also developed a separate GitHub repository demonstrating how to use Arrow and GenKnowSub on benchmark datasets (e.g., BoolQ, PIQA, ARC-Challenge). I’d be happy to contribute this as an example under peft/examples/ if it’s of interest.
Thank you in advance for reviewing this PR!