-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Update minimum selenium version supported for crawler #2921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ZanSara I did notice that with these changes that the following install on Google Colab !pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git@issue_crawler_colab#egg=farm-haystack[colab,crawler]looks to hang perpetually in trying to resolve dependencies (this was the original issue brought up in discussion #2919). I'm still trying to pin down what is causing the problem. I haven't been able to reproduce this locally on my machine. And without the changes in this PR the install is successful but the selenium version becomes |
|
@sjrl and @ZanSara I'm sorry to get into this PR. What happens here is that when you get the colab option, you are sticking grpcio version. This makes selenium unable to get installed because it depends on newer version. This chat on Slack mentioned Colab issues with Selenium issues: https://haystack-community.slack.com/archives/C01J3TZM9HT/p1656228551083839 |
I think it must be a secondary dependency since I can't find it in their requirements file: https://github.com/SeleniumHQ/selenium/blob/trunk/py/requirements.txt @ZanSara Do you know if we could potentially bump the grpcio version in |
Right, this issue was caused by the Selenium version 3 as you identified hence the attempted fix with the minimum version requirement. But it looks like we might also need to bump the grpcio version, is that right @danielbichuetti ? |
|
Yes, the main issue is that is not a direct dependency issue. It's an indirect. When grpcio is set fixed, other dependencies that Selenium relies on, also get limited by the grpcio. When using Colab I forced the installation of 4.3.0. Apparently no behavior was affected, but need further investigation, I suppose. @sjrl |
|
Hey @sjrl and @danielbichuetti! The To avoid the runtime restart, we pinned |
|
@ZanSara @sjrl The dependency issue is this:
It's not the grpcio indeed, indeed the google-colab is. While investigating this, I found grpcio can be forced to 1.48. No Colab issues. datascience package can also be forced to 0.17.5 safely on Colab. On Colab repo we have this issue: googlecolab/colabtools#2604 where they discuss it. @craigcitro advises they are only warnings and not errors. Maybe pin requests to a newer version. What do you think? |
The current version of grpcio in google colab is 1.47.0. So it sounds like it should be okay to pin it to 1.47.0. What do you think @danielbichuetti? And thanks for investigating for any potential issues. |
|
@sjrl The differences from I just found one issue on Google API Python client repo where they discuss a similar case. The lack of updates on the colab =
grpcio==1.48.0
requests>=2.26.0 |
|
@danielbichuetti I'm unsure about forcing the And as @danielbichuetti points out the dependency issue is |
|
@sjrl Installing it manually Another solution that would be possible:
But, it is much complex then pinning requests. |
|
I'm not sure I get what's the issue of pinning |
The base problem now is that |
Indeed, |
|
@sjrl After further investigation, I found one Selenium version which install without dependencies issues: 4.1.3 So, this should resolve the issue This remembered me why I "love" so much complex dependencies chains. |
|
@danielbichuetti Thanks for looking into this. If version
And yes this also reminds why I find google-colab to be frustratingly out of date when it comes to staying up to date on packages. |
|
@sjrl 😃
It's not the only version, it's the most updated one. The dependency on the new urllib3 was introduced on 4.1.4. It was a bad explanation, but the code is the correct scenario. So >=4.0.0 for the new parameter and <=4.1.3 because of the change in deps. |
It looks like in version |
|
@sjrl Indeed, I did a huge mistake when pasting versions. I was collecting some version issues that were possible with haystack. On 4.1.4 it was introduced one incompatibility with 3.7, but not on urllib3. The urllib3 pin to 1.26 is not present in Selenium 4.0.0-alpha-7 |
|
@sjrl @ZanSara After that huge mistake, I segmented environments and started a direct test on Colab. Currently, I could pin and make it work with Colab. Despise being an outdated version of Selenium, it's far better than v3 and won't throw errors: The dependency pinned code is: Here is the working notebook. I really apologize for that version issue. |
|
Thanks to you for all the work!! 🙌 Let's apply these two pins for now, I think it's the best idea. @sjrl What do you think? It's a pin to an alpha version, but if it's the only one that works I think we can't do any better. This might be temporary anyway. I'll let you approve this PR 👍 |
|
Thanks for tagging me in! Just catching up, but to ask: could we make anyone's life easier by loosening the |
|
Hey @craigcitro! That's outstanding you joined us. 🎉 If you could do it, it would be great! Newer versions of urllib3 aren't compatible with 2.23.0. Many packages rely on more updated Indeed, this PR is mainly because of Despise being possible to update manually, |
|
Done! I'm making the fix on our side, though it'll take a few days to go live in Colab. In general, feel free to ask on https://github.com/googlecolab/colabtools/issues when you hit trouble! Always happy to help our users when we can. 😄 |
|
@craigcitro Thank you very much! I have no words for this 🥳 |
|
Just tried it out this morning and it does not look like google-colab has updated yet with the relaxed requirements for |
|
Ah, I should have clarified that I meant business days -- our releases only happen on days we're there in case something goes sideways. Change is live now. |
Related Issue(s): Issue #2920
Proposed changes:
Pre-flight checklist