

@dalf
Contributor

@dalf dalf commented Jun 21, 2021

What does this PR do?

see #159

@unixfox
Contributor

unixfox commented Jun 21, 2021

One minor change request: stop searching for number_of_results when use_mobile_ui is enabled, because the div with the ID result-stats doesn't exist in that response.

@dalf
Contributor Author

dalf commented Jun 21, 2021

# results --> number_of_results
if not use_mobile_ui:

?

@unixfox
Contributor

unixfox commented Jun 21, 2021

# results --> number_of_results
if not use_mobile_ui:

?

Oops never mind.
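
A minimal sketch of the guard discussed above, assuming the result count is only looked up when the mobile UI is off. The function name, the regular expression, and the sample HTML are illustrative assumptions, not the engine's actual code (which uses XPath on a parsed DOM):

```python
import re


def number_of_results(html_text, use_mobile_ui=False):
    """Return the result count as an int, or None when it is unavailable."""
    if use_mobile_ui:
        # Per the thread: the mobile UI response has no result-stats div,
        # so there is nothing to look for.
        return None
    # Hypothetical lookup: grab the text that directly follows the opening tag.
    match = re.search(r'<div[^>]*\bid="result-stats"[^>]*>([^<]*)', html_text)
    if match is None:
        return None
    digits = re.sub(r"\D", "", match.group(1))
    return int(digits) if digits else None


# Made-up sample markup, for illustration only.
SAMPLE_DESKTOP = '<div id="result-stats">About 1,230 results</div>'
```

With the mobile UI enabled the function returns None without inspecting the text at all.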

@return42
Member

For me this patch seems to work. I can't validate if it also works when google blocks me, because google does not block me :-)

Does anyone have an idea how I can force Google to block me?

In the following I will leave some comments from my reverse engineering ... (similar to #159 (comment))

The content type is plain text:

[screenshot]

The response text is idiosyncratic ... starting with )]}'

[screenshot]

Further on, one finds fragments of HTML, one JS <script> tag and one CSS <style> tag.

What surprises me: the response is not (valid) XML, but it seems that lxml's parser (lxml.html.fromstring(resp.text)) can handle such an idiosyncratic data structure (read "Parsing HTML fragments" and "Really broken pages"):

# convert the text to dom
dom = html.fromstring(resp.text)
# results --> answer
answer = eval_xpath(dom, '//div[contains(@class, "LGOjhe")]//text()')
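
To illustrate why a lenient HTML parser copes with this kind of mixed payload, here is a rough sketch that uses Python's standard-library html.parser as a stand-in for lxml (a swapped-in technique, not what the engine does). The class name, the helper, and the sample payload are made up for illustration; only the class marker LGOjhe comes from the snippet above:

```python
from html.parser import HTMLParser


class AnswerCollector(HTMLParser):
    """Collect text found inside <div> elements whose class contains a marker."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.depth = 0  # greater than 0 while inside a matching <div>
        self.answers = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        css_class = dict(attrs).get("class") or ""
        # Count nested divs too, so the matching closing tag is found.
        if self.depth or self.marker in css_class:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.answers.append(data.strip())


def extract_answers(text, marker="LGOjhe"):
    parser = AnswerCollector(marker)
    parser.feed(text)
    parser.close()
    return parser.answers


# Made-up payload: a non-HTML prefix followed by HTML fragments.
SAMPLE = (
    ")]}'\n"
    '<div class="x LGOjhe"><span>first answer</span></div>'
    "<style>.a{}</style><div>noise</div>"
)
```

The leading non-HTML text is simply treated as character data outside any matching element, so it does not disturb the extraction.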

I tested with !google foo and got a lot of answers, many more than from the traditional Google, and most of them are questionable (in the following screenshot I delimited the answers with //).

[screenshot]


I spot-checked paging and different languages (which means different domains: google.com, google.de, google.es ...).

Overall I would say that this solution seems to work and that the lxml parser can handle this mangled plain-text response.

I vote to give it a try in a production environment. Let's merge.

Disabled by default; it has to be enabled in settings.yml.

Related to #159
@dalf dalf force-pushed the google_mobile_ui branch from 95e634a to 7a5c364 June 21, 2021 12:53
@unixfox
Contributor

unixfox commented Jun 21, 2021

For me this patch seems to work. I can't validate if it also works when google blocks me, because google does not block me :-)

Does anyone have an idea how I can force Google to block me?

A simple stress tool like hey will get you blocked after just 5 minutes. Here is the command that I used:

hey -c 100 -n 20000 "https://www.google.com/search?q=test"

The response text is idiosyncratic ... starting with )]}'

[screenshot]

According to a friend, it seems like this is protobuf data serialized to JavaScript, more about that here: marin-m/pbtk#15 (comment)

What surprises me: the response is not (valid) XML, but it seems that lxml's parser (lxml.html.fromstring(resp.text)) can handle such an idiosyncratic data structure (read "Parsing HTML fragments" and "Really broken pages")

We could extract the correct HTML code, that's what I wanted to do at first, but lxml parsed the content without any issues, so I gave up the idea.
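
For reference, a minimal sketch of the first step such an extraction would need: dropping the leading )]}' marker before further processing. The helper name is hypothetical, and nothing here is part of the merged patch. How the HTML is embedded in the rest of the payload is not established in this thread, so the sketch deliberately stops after the prefix:

```python
# The marker the response starts with, as noted earlier in the thread.
RESPONSE_PREFIX = ")]}'"


def strip_response_prefix(text):
    """Return text without the leading marker and the line break after it."""
    if text.startswith(RESPONSE_PREFIX):
        return text[len(RESPONSE_PREFIX):].lstrip("\r\n")
    # Leave anything that does not start with the marker untouched.
    return text
```

Since lxml already parses the unmodified response, this step is optional; it only matters if the payload were to be decoded by something stricter.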

I tested with !google foo and got a lot of answers, many more than from the traditional Google, and most of them are questionable (in the following screenshot I delimited the answers with //).

It seems like I'm getting more Google answers, maybe it is not parsing the correct one... I don't know.

I vote to give it a try in a production environment. Let's merge.

By the way, it's already in test on https://searx.be!

But before merging it, do you think we should reduce the number of parameters passed in the async query parameter? I don't really understand the purpose of all the values in it, and some of them could perhaps be used by Google to track Searx. If some aren't actually needed, could we remove them?

@dalf dalf merged commit f4da4ba into master Jun 21, 2021
@dalf
Contributor Author

dalf commented Jun 21, 2021

Note: if settings.yml doesn't include use_mobile_ui: true, the engine doesn't use the mobile UI.

@unixfox
Contributor

unixfox commented Jun 21, 2021

Note: if settings.yml doesn't include use_mobile_ui: true, the engine doesn't use the mobile UI.

use_mobile_ui seems to be set to true in the default settings.yml though: https://github.com/searxng/searxng/blob/master/searx/settings.yml#L586

@return42
Member

But before merging it, do you think we should reduce the number of parameters given in the async query parameter?

Here is what I tried and what works for me:

    additional_parameters = {}
    if use_mobile_ui:
        additional_parameters = {
            'async': 'use_ac:true,_fmt:pc'
        }
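
A small sketch of how the reduced parameter set might end up in the request URL, assuming the extra parameters are simply merged into the query string. The function name and its signature are hypothetical, and the real engine sends more parameters than q:

```python
from urllib.parse import urlencode


def build_search_url(query, use_mobile_ui=False,
                     base_url="https://www.google.com/search"):
    """Build a search URL, adding the async parameter only for the mobile UI."""
    additional_parameters = {}
    if use_mobile_ui:
        # Reduced value from the comment above.
        additional_parameters = {"async": "use_ac:true,_fmt:pc"}
    # urlencode percent-encodes the ':' and ',' inside the async value.
    return base_url + "?" + urlencode({"q": query, **additional_parameters})
```

Calling build_search_url("foo", use_mobile_ui=True) yields a URL whose query string carries async=use_ac%3Atrue%2C_fmt%3Apc; without the flag only q is sent.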

@unixfox
Contributor

unixfox commented Jun 21, 2021

Should we also document this enhancement in the settings.yml?

@return42
Member

Should we also document this enhancement

Once we have consolidated the development, I can add some docstrings, which could then be shown here: https://searxng.github.io/searxng/src/index.html

@return42
Member

return42 commented Jun 21, 2021

Oops, I see this PR is already merged. I will open one more PR that reduces the parameters and adds the documentation.

return42 added a commit to return42/searxng that referenced this pull request Jun 21, 2021
Reverse engineering shows that not all of the parameters used by google's mobile
UI (aka "more results" button) are needed [1].

[1] searxng#160 (comment)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
kvch pushed a commit to kvch/searx that referenced this pull request Jul 3, 2021
Reverse engineering shows that not all of the parameters used by google's mobile
UI (aka "more results" button) are needed [1].

[1] searxng/searxng#160 (comment)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
kvch pushed a commit to searx/searx that referenced this pull request Jul 3, 2021
Reverse engineering shows that not all of the parameters used by google's mobile
UI (aka "more results" button) are needed [1].

[1] searxng/searxng#160 (comment)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42 return42 deleted the google_mobile_ui branch July 17, 2021 13:18