[fix] engine - bing fix search, pagination, remove safesearch #2822
Awesome job, thank you for looking into this one! I can confirm that this fixes pagination and time range search 🎉 I'll look deeper into the changes soon when I have time for it. |
Not sure you know; We fetch the languages and regions .. searxng/searx/data/engine_traits.json Lines 184 to 279 in d013f51
From the API description: https://learn.microsoft.com/en-us/bing/search-apis/bing-web-search/reference/market-codes But note, bing has different market-codes for WEB, Video, Images .. News There we describe the URLs / for instance under the names About Switzerland: If you look there Bing WEB has only fr-CH and fr-DE I wrote this just to let you know .. no need to change anything in your PR .. in my review I will have a look at languages / regions .. seems bing has changed a lot of things since I implemented the engine. One question I have about your initial comment:
What '(Web)' icon do you mean .. I haven't seen anything in the content I didn't expect. |
|
@return42 yep I saw we fetch those. If you noticed I changed the default for the language to For Switzerland I'm sure you meant fr-CH and de-CH, but yes those work now, I could've sworn French wasn't working during testing... No worries, not the focus of this PR anyways. (Also on Bing you can set region to "Switzerland - German", there isn't even a "Switzerland - French" option in the UI... and then set the language to Italian, and the results are pretty good. But that might be too much work for what it's worth.) Here are screenshots of the Web icons, I tested them with other countries and IPs and they still appear. |
|
Phew .. bing itself is somewhat broken. Here are the results when I search, with region Australia and language English: .. that's because I tested different languages & regions before .. to get this result: I had to empty the browser cache and delete the cookies from bing. But instead of a link to https://www.bmw-dubai.com what I would expect for that query is a link to the branch office in Australia: M$ is broken by design. |
|
it turns out that the old market codes don't work anymore ... I had to rework the fetch_traits() and now read from https://www.bing.com/account/general . I didn't manage more than bing-WEB today, I will have a look at the other bing engines in the next days. Maybe I have to review fetch_traits() from the other engines also .. I remember that bing-News had special market codes .. not sure if this has been changed in the meantime .. I will have a look .. more coming soon 👍 |
|
@return42 what was your IP for the BMW Australia search? I cannot reproduce even with a Hong Kong IP: |
|
Also, I wasn't able to observe |
Yes, the market code ( In your patched version the page breaks didn't work for me, I had to add the argument Further in my tests in China, Japan and others I noticed that I never should send the TL;DR: all settings somehow have side effects on the other settings, it is a big jumble of options and nothing behaves predictably ... only sufficient testing can give you at least some hope that it somehow works .. today .. and tomorrow everything changes --> as it always is with M$ products, I experience this every day at work 😢 |
829ad6d to
a731f40
Compare
|
@jazzzooo I have invested another day -- bing is so crude -- ... I think now we have a state where the issues are fixed and everything works as far as possible. We'll notice over time that there are still quirks in some languages and regions (especially at the bing-news) ... we'll have to sort them out in subsequent PRs. If you could do another final test, just to make sure I haven't missed any major bugs, then we could merge this PR for now (I'd like to release the bug fixes, the fine tuning can be done afterwards). |
|
@BernieHuang2008 can you have a look about the traditional and simplified Chinese in the bing engines we patched in this PR .. especially bing-web and bing-news are of interest in the regions ( |
# In bing the market code 'zh-cn' exists, but there is no 'news' category in
# bing for this market. Alternatively we use the market code from Hong
# Kong. Even if this is not correct, it is better than having no hits at
# all, or sending false queries to bing that could raise the suspicion of a
# bot.

_fetch_traits(engine_traits, bing_traits_url, xpath_language_codes, xpath_market_codes)
# HINT: 'en-hk' is the region code it does not indicate the language en!!
engine_traits.regions['zh-CN'] = 'en-hk'
@BernieHuang2008 this will alias the bing-news for :zh and :zh-CN to :zh-HK .. not sure if it is a good or bad idea of mine / maybe you know a better alternative?
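A minimal sketch of the aliasing idea, assuming a plain dict stands in for the region traits. Only the `zh-CN` to `en-hk` mapping comes from the patch; the table name `NEWS_REGIONS`, the helper `news_market`, the other entries and the default are hypothetical and only illustrate the fallback:

```python
# Hypothetical region table: SearXNG locale tag -> bing market code.
# Only the 'zh-CN' -> 'en-hk' alias is taken from the patch above; the
# other entries and the default are assumptions for illustration.
NEWS_REGIONS = {
    "zh-HK": "en-hk",
    "zh-CN": "en-hk",  # alias: bing has no 'news' category for zh-cn
    "en-US": "en-us",
}


def news_market(locale_tag, default="en-us"):
    """Return the market code to send for a locale tag (hypothetical helper)."""
    return NEWS_REGIONS.get(locale_tag, default)


print(news_market("zh-CN"))
```

With such an alias a query for :zh-CN is sent with the Hong Kong market code instead of a code bing has no news category for.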
In SearXNG we have a Search Syntax / with this syntax you can for instance select a language/region .. as you can do it in the drop-down menu in the upper left:
When I write down a search term here, I use this syntax / for instance :zh-TW !bing bmw is a search term that will use the bing engine with language Chinese in region Taiwan to query the word "bmw".
Note: You can't use this syntax on bing.com as you have done above in your screenshot.
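To make the syntax concrete, here is a rough sketch of how a term like `:zh-TW !bing bmw` splits into its parts. This is not SearXNG's actual query parser; `parse_query` is a hypothetical helper that only handles the `:` (language/region) and `!` (engine) prefixes used in this thread:

```python
def parse_query(raw):
    """Split a search term into language, engines and the remaining query.

    Hypothetical illustration, not SearXNG's real parser.
    """
    language = None
    engines = []
    terms = []
    for token in raw.split():
        if token.startswith(":") and len(token) > 1:
            language = token[1:]          # e.g. ':zh-TW' -> 'zh-TW'
        elif token.startswith("!") and len(token) > 1:
            engines.append(token[1:])     # e.g. '!bing' -> 'bing'
        else:
            terms.append(token)
    return {"language": language, "engines": engines, "query": " ".join(terms)}


print(parse_query(":zh-TW !bing bmw"))
```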
Sure, but it may take a while because I need to understand what you're doing first |
|
I've found these bugs: You can see that there are two quotation marks ( Both in Traditional/Simplified Chinese, disappearing keywords are all over the page. Before this PR (in public instances), it worked fine. In Bing News, it also shows these results in their entirety. |
I think it's hard to understand (or explain) all the details of the bing engines .. most of what we have done here is a reverse engineering of the bing services from debugging what bing does when you use it in your WEB browser .. and bing is very complicated .. there is no clear API, nor is there for instance a clear list of the market codes bing uses. Bing has two settings about languages and regions:
This is more or less similar for all bing engines: There is one thing to notice: It would be enough for me, if you do a simple test as a normal user and give us feedback if the results are in the expected language (and script). And if they fit the region you have selected as user. For instance here is how I tested the regions:
as you may notice from my test above, I can't really test for these regions due to lack of language skills and ignorance of regionally preferred results ... I can't do that for many other regions either, but in the Chinese language area there is also the fact that in the regions partly simplified, partly traditional script is preferred. |
|
Oops, we posted at the same time .. :) |
|
... it doesn't matter ... the results satisfied me except the problem above. |
👍
How did you test it / can you give me your search term ... |
|
sure, i use |
OK, I will have a look 👍 .. but this brings me to another question / and I'm sorry for asking dumb questions: the search term seems simplified Chinese .. right? .. when I use this term, the language recognition (in SearXNG) switches to It is a little off topic here in the thread where we test the bing engines, but what comes into my mind:
It also happens in other language areas that a language is not recognized correctly. You then have to explicitly select the desired language or region.. but what is it like here in the Chinese language area.. here the language recognition may also set the wrong region . |
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This issue should now be fixed / I also added the: |
Not sure where you see a regression, when you go to bing.com and search in en-GB in their forms, you will get the results that are also shown in SearXNG in this branch ( |
Sorry / no, the left side of #2822 (comment) is what I would expect for regional search. With current master branch / From an IP located in DE, when I search for ..
.. links on top of the result list.
Do I understand you right, from this US IP you don't get:
.. links on top of the result list? |
yep I redid my testing again once with US IP and once with a German IP, and I restarted searx between every IP change. But it would be good if someone else can also test it. 820ab68: US IP:
German IP:
bc4c32b: It seems a bit more nuanced than I thought, whatever locale you search with first, will stick until you restart the instance... US IP:
German IP:
I don't observe any difference between US and German IP, so you should be able to reproduce this. I have no redis, I'm running with |
|
@return42 ping, this issue still exists and breaks region search on all public instances. I checked, and my version still works :) If you'd like, I can open an issue on this, but the code in this pr is relevant. |
Yes! .. I asked @unixfox and @dalf for a review. To sum up the long thread .. we compare two patches bc4c32b..820ab68; both commits are in the branch of this PR. The bing engine needed a review again and this PR was welcome to me, but the commit 820ab68 has in my opinion too much expansion (but maybe I am totally wrong / I looked too long at the bing engines to have a clear view anymore). I then did a review of this PR and had to include a few things back into the implementation (albeit in a slightly different form) -> bc4c32b. At the moment the focus in the discussion is on the languages/regions of the bing WEB engine ... but we must not lose sight of the other bing engines (news, videos, images). I have tested bc4c32b in all possible languages/regions (please also test However, @jazzzooo and I end up with very different ratings. This can have many reasons, which do not necessarily lie in the implementation; bing does not behave deterministically! In my opinion, the bing client is already broken (see #2822 (comment)) ... the market codes and languages are used in several parameters and cookies --> and M$ seems to have lost the overview itself. In my opinion, there is no "absolutely right" or "absolutely wrong" ... we will only come to a conclusion that suits us best from experience .. this also requires a thorough reverse engineering of the bing client. Here now a comparison of the results of bing-WEB for UK and US ... but I also point out again that we must not neglect languages like
Yeah, that's one big problem in testing (and running) my patch .. Maybe it's best we remove my patch from this PR but I would like to leave the decision to others ... @dalf and @unixfox have asked for more reviewers on PRs ..
Not sure what you mean by "I don't observe any difference between US and German IP" .. ? .. your examples show that you got only UK results on your US IP and only US results on your German IP |
@jazzzooo now I understand why we have such different experience :) .. read & lets continue in: |
What does this PR do?
Not fixed:
Why is this change important?
All my changes to the cookie logic were intentional. It is the "least" broken solution for now without redoing the region/language traits logic, I'll save that for another PR. The new User-Agent should be an improvement across the board, but it is still good to check that nothing broke. The previous User-Agent was not one that appeared in the wild, maybe now some other engines will work better too? Also, let's not use uuid1 in the future, uuid4 is faster and doesn't leak the mac address of the server.
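The uuid1 remark can be checked with Python's standard `uuid` module. A small sketch: a version-1 UUID carries a 48-bit node id (normally the MAC address of the host) in its last field, while a version-4 UUID is random. `FAKE_MAC` is a made-up value, passed explicitly so the example does not depend on the machine it runs on:

```python
import uuid

# Made-up MAC address (assumption for illustration), passed explicitly so
# the result does not depend on the host's network hardware.
FAKE_MAC = 0x001122334455

u1 = uuid.uuid1(node=FAKE_MAC)
u4 = uuid.uuid4()

# A version-1 UUID stores the node id in its last 48 bits, so anyone who
# sees the UUID can read it back.
print(u1, "-> node:", format(u1.node, "012x"))

# A version-4 UUID is built from random bits and carries no host identifier.
print(u4, "-> version:", u4.version)
```

Without the `node` argument, `uuid.uuid1()` takes the node id from the host, which is the leak the comment above refers to.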
How to test this PR locally?
!bing Coca-Cola (the most international term)
!bin Coca-Cola
!biv Coca-Cola
Related issues
Closes #2698
Could fix #2388 but I cannot reproduce, it might be related to #2641 , see my comment on that. We will know it's fixed if instances stop reporting it after this goes live tho.