Fixed 4bit compare UT on XPU #2843
Conversation
Hi @BenjaminBossan, please help review. Thanks!
Hi, thanks for the PR. I assume you are from the same team as @yao-matrix?
Do you mean the latest release from bitsandbytes makes the tests pass? Why is that version not used for CI then? Or do you mean the tests require bitsandbytes installed from source? In that case, I would change the condition such that:
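(The concrete condition suggested here was elided in this capture. A minimal sketch of what such a version-gated skip could look like, with a placeholder version threshold and hypothetical marker name:)

```python
# Hedged sketch only: the marker name and version threshold are placeholders,
# not the values from the elided suggestion.
import bitsandbytes as bnb
import pytest
from packaging import version

# Gate on the installed bitsandbytes version instead of skipping on XPU wholesale.
bnb_supports_xpu = version.parse(bnb.__version__) >= version.parse("0.45.0")  # placeholder

requires_bnb_with_xpu_support = pytest.mark.skipif(
    not bnb_supports_xpu,
    reason="installed bitsandbytes is too old for XPU support",
)
```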
Echoing Benjamin: our target is to cover all cases on XPU, so as a general principle, let's be very cautious about platform-specific skips. Please follow Benjamin's suggestion and figure out which bnb version is needed to pass. @YangKai0616
Sorry for my unclear expression. What I meant is that the current XPU can pass all of bitsandbytes' own tests. I also noticed the point you raised about the failing test case. Additionally, I tested the FP32/4bit matmul, Linear outputs, and quantization precision on both CUDA (A100) and XPU, and performed a cross-platform comparison. The results show that the 4-bit quantization precision on XPU and CUDA (A100) aligns perfectly, and the differences in the Linear, MLP, and matmul computations are reasonable; the increased differences in large-size matmuls demonstrate the effect of accumulated error. Considering that the ground truth for these tests was generated on CUDA, some deviation on XPU is expected. Thanks!
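(A rough sketch of the kind of quantize/dequantize round-trip comparison described above; the shapes, quant type, and reference artifact name are illustrative assumptions, not the actual test setup:)

```python
import torch
import bitsandbytes.functional as bnb_F

# Pick the local accelerator (XPU if present, otherwise CUDA).
device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cuda"

# Quantize a weight matrix to NF4, then dequantize it back to float.
w = torch.randn(1024, 1024, dtype=torch.float32, device=device)
packed, quant_state = bnb_F.quantize_4bit(w, quant_type="nf4")
w_roundtrip = bnb_F.dequantize_4bit(packed, quant_state)

# Cross-platform check: compare against the same round-trip saved on CUDA (A100).
# w_ref = torch.load("roundtrip_a100.pt")  # hypothetical reference artifact
# torch.testing.assert_close(w_roundtrip.cpu(), w_ref, rtol=0.0, atol=0.0)
```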
I see, thanks for explaining. Honestly, I'm considering removing these tests completely, as they are actually tests for bitsandbytes and not PEFT. We added them to PEFT because bnb didn't have its own CI for this at the time, but the picture has since changed. Therefore, I would like to avoid putting too much extra work into this, like providing separate artifacts per architecture. I started a discussion with the bnb maintainers about removing the bnb tests in PEFT and will reply here once we have reached a conclusion.
@YangKai0616 After discussing with bnb colleagues, we decided to remove `test_bnb_regression.py`.
Thank you for your prompt response! Yes, you can see that the 4-bit BNB model is also used in the test case `test_regression.py::TestOpt4bitBnb::test_lora_4bit`. The ground truth based on CUDA GPU is located at link. Should we maintain a similar ground truth for XPU?
@YangKai0616 Okay, got it. So here is what could be done: let's add a comment to let users know that if they run with XPU, the regression artifacts are loaded from a repo outside of Hugging Face's control, so they run this at their own risk.
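(A hedged sketch of how loading the XPU-specific artifacts might look; the repo id and local dir below are placeholders, not the actual Intel repo used in this PR:)

```python
# Sketch: fetch XPU-specific regression artifacts over the default ones.
from huggingface_hub import snapshot_download

# NOTE: when running with XPU, these artifacts come from a repo outside of
# Hugging Face's control; you run this at your own risk.
snapshot_download(
    repo_id="intel/peft-xpu-regression-artifacts",  # placeholder
    local_dir="regression_testing_artifacts",       # placeholder
)
```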
@BenjaminBossan Done. Please help review!
BenjaminBossan
left a comment
Thanks for adding the Intel repo with the necessary regression artifact. In principle, this looks good. I would like to see a small change, namely that users have to explicitly opt in to using XPU via an environment variable, e.g. PEFT_USE_XPU. This way, we can ensure that users don't use it by accident.
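(A minimal sketch of the requested opt-in gate, assuming the environment variable is named `PEFT_USE_XPU` as suggested; the exact mechanics in the merged PR may differ:)

```python
import os

import pytest
import torch

def xpu_available() -> bool:
    # Simplified stand-in for the infer_device() helper used in the test file.
    return hasattr(torch, "xpu") and torch.xpu.is_available()

# Require an explicit opt-in so the external XPU artifacts are never used by accident.
if xpu_available() and os.environ.get("PEFT_USE_XPU", "0") != "1":
    pytest.skip("Set PEFT_USE_XPU=1 to run the XPU regression tests", allow_module_level=True)
```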
tests/regression/test_regression.py (outdated diff)
```python
is_xpu = infer_device() == "xpu"
if is_xpu:
    lora_4bit_folder_path = os.path.join(REGRESSION_DIR, LORA_4BIT_FOLDER)
    if os.path.isdir(lora_4bit_folder_path):
```
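(The diff is truncated here. Judging from the author's explanation below, the branch presumably continues by deleting the stale folder before re-downloading the XPU files; a sketch, not the exact merged code:)

```python
import shutil

# The complete (CUDA) artifacts were downloaded first, so the stale 4-bit folder
# is deleted here to let the XPU-specific files replace it; force_download=True
# reportedly had no effect.
shutil.rmtree(lora_4bit_folder_path)
```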
Since the REGRESSION_DIR is a temp folder created on script start, this shouldn't be necessary, right?
The setup_tearndown process first downloads the complete artifact files, and then downloads the XPU files to overwrite them. Therefore, the `LORA_4BIT_FOLDER` already exists at this point. The reason for detecting, deleting, and then overwriting like this is that I tried `force_download=True`, but it seemed to have no effect.
Done!
BenjaminBossan
left a comment
Thanks for making the regression tests pass on XPU.
Similar to #2473. Our internal CI testing for XPU found that the current results for test cases like `test_bnb_regression.py::test_opt_350m_4bit_quant_storage` and `test_regression.py::TestOpt4bitBnb::test_lora_4bit`, similar to `test_bnb_regression.py::test_opt_350m_4bit`, vary due to the `bitsandbytes` version and hardware platform. XPU can currently pass all UT files with the latest `bitsandbytes` version. Therefore, can we temporarily disable these specific tests on XPU to prevent CI errors?
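(For illustration, what the originally proposed platform-specific skip could have looked like; a sketch assuming `torch.xpu` is available for device detection, not the change that was ultimately merged:)

```python
import pytest
import torch

on_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()

# Temporarily disable the 4-bit comparison test on XPU, since the ground truth
# was generated on CUDA and results vary by bnb version and platform.
@pytest.mark.skipif(on_xpu, reason="4-bit ground truth generated on CUDA; differs on XPU")
def test_opt_350m_4bit():
    ...
```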