@dalf
Contributor

@dalf dalf commented May 17, 2025

What does this PR do?

Use httpx-curl-cffi to add a network parameter: impersonate.

See https://github.com/lexiforest/curl_cffi/blob/8b4ee6d4db0982a3d93191b698c56936dfdfdcf0/curl_cffi/requests/impersonate.py#L9-L77 for the possible values

Why is this change important?

See #3929

How to test this PR locally?

  • check enable_http2: true works
  • check enable_http2: false works
  • check verify: false works
  • check verify: "/path/to/ca.pem" works
  • check local_addresses works
  • check HTTP proxy
  • check HTTPS proxy
  • check SOCKS proxy
  • check the qwant engine

Author's checklist

Based on #4674

curl-cffi requires Python 3.10 or above and an ARM64 or AMD64 platform. Without curl-cffi, the engines using the impersonate parameter crash with an explanation.
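A minimal sketch of how such an availability check could look, assuming the requirements stated above. The names `impersonate_unavailable_reason`, `require_impersonate` and `SUPPORTED_MACHINES` are hypothetical and are not taken from this PR; the machine names are an assumption about what `platform.machine()` reports on ARM64 and AMD64 systems.

```python
import importlib.util
import platform
import sys

# Assumption: mirrors the requirements listed in the PR description,
# not an official curl-cffi support matrix.
SUPPORTED_MACHINES = {"x86_64", "amd64", "arm64", "aarch64"}


def impersonate_unavailable_reason(version_info=None, machine=None, installed=None):
    """Return None if impersonation looks usable, else a readable reason.

    All arguments default to probing the running interpreter; they can be
    passed explicitly to exercise the logic without the real environment.
    """
    version_info = sys.version_info if version_info is None else version_info
    machine = platform.machine() if machine is None else machine
    if installed is None:
        installed = importlib.util.find_spec("curl_cffi") is not None

    if tuple(version_info[:2]) < (3, 10):
        return "curl-cffi requires Python 3.10 or above"
    if machine.lower() not in SUPPORTED_MACHINES:
        return f"curl-cffi is not available for {machine}"
    if not installed:
        return "curl-cffi is not installed"
    return None


def require_impersonate(engine_name, **probe):
    """Fail loudly, with an explanation, when an engine asks for impersonation."""
    reason = impersonate_unavailable_reason(**probe)
    if reason is not None:
        raise RuntimeError(f"{engine_name}: 'impersonate' is set but {reason}")
```

Calling `require_impersonate("qwant")` at engine start-up would then either pass silently or stop with a message that names the missing requirement.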

Related issues

@dalf dalf force-pushed the httpx_impersonate branch 2 times, most recently from 9d5fd35 to 5736897 Compare May 17, 2025 17:22
@dalf dalf requested a review from Copilot May 17, 2025 17:23
@unixfox
Contributor

unixfox commented May 17, 2025

I would use httpx_curl_cffi globally:

  • Better for privacy: it blends in with the mass of Chrome users.
  • Many engines probably already use JA3 identification to detect bot usage; at least all the ones using Cloudflare do. Specifying "network: chrome" on all the affected engines would be very time-consuming, whereas this way you get it by default. This might solve the rate limit or CAPTCHA issues for some existing engines.
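For readers unfamiliar with JA3, here is a rough sketch of how the fingerprint is derived from a TLS ClientHello: five fields (TLS version, cipher suites, extensions, elliptic curves, point formats) are written as decimal values joined by "-", the fields are joined by ",", and the result is MD5-hashed. The function names and the example values are illustrative assumptions, not captured from any real browser or from this PR.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: five comma-separated fields, values joined by '-'."""
    fields = [[tls_version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(value) for value in field) for field in fields)


def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string (the usual JA3 fingerprint)."""
    text = ja3_string(tls_version, ciphers, extensions, curves, point_formats)
    # MD5 is used as an identifier here, not for security.
    return hashlib.md5(text.encode("ascii"), usedforsecurity=False).hexdigest()


# Illustrative values only: two clients offering the same cipher suites
# in a different order end up with different fingerprints.
client_a = ja3_hash(771, [4865, 4866, 4867], [0, 10, 11], [29, 23], [0])
client_b = ja3_hash(771, [4867, 4866, 4865], [0, 10, 11], [29, 23], [0])
```

Because the hash depends on the order and content of what the TLS library offers, a Python client with a browser-like User-Agent header can still be told apart from a real browser, which is the gap impersonation tries to close.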


Copilot AI left a comment


Pull Request Overview

This PR introduces an "impersonate" network parameter to support httpx-curl-cffi for improved client transport configuration, which is utilized in various parts of the network and engine modules.

  • Added "impersonate" option to network settings in settings.yml.
  • Propagated the "impersonate" parameter through the Network and client APIs, updating client creation and error handling.
  • Updated the Qwant engine to include additional HTTP headers for requests.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| searx/settings.yml | Adds network.impersonate configuration (set to "chrome") |
| searx/network/network.py | Updates Network initialization and client key construction to include impersonate |
| searx/network/client.py | Integrates impersonate in new_client and mount selection logic with proper error handling |
| searx/engines/qwant.py | Adds custom HTTP headers for the Qwant engine requests |
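To illustrate why the client key has to include impersonate, here is a small sketch of a client cache keyed on transport settings. `ClientKey`, `ClientCache` and the field names are hypothetical stand-ins, not the actual code in searx/network; the point is only that two requests differing in `impersonate` must not share one cached client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Union


@dataclass(frozen=True)
class ClientKey:
    """Hashable bundle of the settings that determine which client to reuse."""
    verify: Union[bool, str] = True
    enable_http2: bool = True
    proxy: Optional[str] = None
    local_address: Optional[str] = None
    impersonate: Optional[str] = None  # e.g. "chrome"; None means plain httpx


class ClientCache:
    """Create one client per distinct key and reuse it afterwards."""

    def __init__(self, factory: Callable[[ClientKey], object]):
        self._factory = factory
        self._clients: Dict[ClientKey, object] = {}

    def get(self, key: ClientKey) -> object:
        if key not in self._clients:
            self._clients[key] = self._factory(key)
        return self._clients[key]
```

If `impersonate` were left out of the key, an engine configured with `impersonate: chrome` could silently receive a client built for an engine without it, or the other way round.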

@dalf
Contributor Author

dalf commented May 17, 2025

> I would use httpx_curl_cffi globally

The package is not available on ARMv7 and Python 3.9.

[EDIT] it should compile on ARMv7 : lexiforest/curl_cffi#304

@unixfox
Contributor

unixfox commented May 17, 2025

> > I would use httpx_curl_cffi globally
>
> The package is not available on ARMv7 and Python 3.9.

Yes, but only if it's available. I'm talking about customization like this: https://github.com/searxng/searxng/pull/4801/files#diff-e95e3f454925bc2344d3cf538950b11a1255935b47a609c9589e6ac33d74802fR1713-R1714

@dalf
Contributor Author

dalf commented May 17, 2025

Notes:

  • If we replace the user agent with Chrome's, some other engines might need updates, since the default user agent is Firefox.
  • I've tried to use "impersonate: firefox"; it didn't work.

(I'm not saying we should or shouldn't make it global)

@dalf
Contributor Author

dalf commented May 18, 2025

Looking at the benchmark, httpx-async is one of the slowest:

Should we be worried that a single, commercially backed company is driving the project and could one day switch off the updates, requiring SearXNG to migrate to a pure httpx client?

@dsmith2001

dsmith2001 commented May 21, 2025

> Notes:
>
>   • If we replace the user agent with Chrome's, some other engines might need updates, since the default user agent is Firefox.
>   • I've tried to use "impersonate: firefox"; it didn't work.
>
> (I'm not saying we should or shouldn't make it global)

Just a heads up if you're using e.g. impersonate=chrome: curl_cffi sets the user agent and the impersonated browser's headers for you. You can override them but it's something to keep in mind.

You can use something like httpbin's "anything" endpoint to see what's being sent:
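A minimal sketch of that check, assuming curl_cffi is installed and httpbin.org is reachable. `fetch_echo` and `sent_headers` are hypothetical helper names; `requests.get(..., impersonate=...)` is curl_cffi's requests-style API, and the `/anything` endpoint echoes the request back as JSON, including a `headers` object.

```python
ECHO_URL = "https://httpbin.org/anything"


def fetch_echo(impersonate="chrome", headers=None):
    """Send a GET through curl_cffi and return httpbin's JSON echo of it."""
    # Imported lazily so sent_headers() stays usable without curl_cffi.
    from curl_cffi import requests

    response = requests.get(
        ECHO_URL, impersonate=impersonate, headers=headers, timeout=10
    )
    return response.json()


def sent_headers(echo):
    """Return the echoed request headers, keyed by lower-cased name."""
    return {name.lower(): value for name, value in echo.get("headers", {}).items()}
```

Comparing `sent_headers(fetch_echo())` with `sent_headers(fetch_echo(headers={"User-Agent": "my-agent/1.0"}))` shows which headers the impersonated profile adds on its own and what an explicit override changes.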

> Should we be worried that a single, commercially backed company is driving the project and could one day switch off the updates, requiring SearXNG to migrate to a pure httpx client?

The project is open source and can be forked. It's not some super obscure project either, so I am not too worried.
