这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@inetol
Copy link
Member

@inetol inetol commented Jun 12, 2025

Replaces x_for functionality with trusted_proxies. This allows defining which IP / ranges to trust extracting the client IP address from X-Forwarded-For and X-Real-IP headers.

We don't know if the proxy chain will give us the proper client address, so we rely on reading the headers of the proxy before SearXNG (if there is one, in that case it must be added to trusted_proxies) hoping it has done the proper checks. In case a proxy in the chain does not check the client address correctly, integrity is compromised and this should be fixed by whoever manages the proxy, not us.

Closes:

Related:

@inetol inetol force-pushed the trusted-proxies branch from c410e7d to 76c8a1a Compare June 12, 2025 13:18
@inetol inetol marked this pull request as ready for review June 12, 2025 13:21
@inetol inetol requested review from Bnyro and return42 June 12, 2025 13:21
Copy link
Member

@Bnyro Bnyro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it only makes sense to check these headers if we're using a proxy.

  • Otherwise, if somebody runs an instance without reverse proxy, attackers could just spoof faulty X-Real-IP headers. (I seriously hope that everybody uses a reverse proxy though)

Bnyro
Bnyro previously approved these changes Jun 12, 2025
inetol added a commit to inetol/searxng that referenced this pull request Jun 13, 2025
vojkovic
vojkovic previously approved these changes Jun 13, 2025
inetol added a commit to searxng/searxng-docker that referenced this pull request Jun 14, 2025
inetol added a commit to inetol/searxng that referenced this pull request Jul 10, 2025
@inetol inetol force-pushed the trusted-proxies branch from 8b21fc6 to 29b4aa3 Compare July 10, 2025 18:46
@inetol
Copy link
Member Author

inetol commented Jul 10, 2025

Rebase to latest master, no changes made.

BTW this PR might require a migration for anyone using x_for:

We have replaced x_for with trusted_proxies in limiter.toml. You can now configure which proxies to trust to get the client IP, which is necessary for botdetection and plugins like "self info".

If you want to know if everything is working after updating, with the "self info" plugin enabled, search "ip" in your instance, if your public IP appears then you are all set.

https://docs.searxng.org/admin/searx.limiter.html#limiter-toml
https://docs.searxng.org/src/searx.botdetection.html#method-ip-lists

@return42
Copy link
Member

I think this PR will not solve the main issue ..

Closes:

SearXNG uses the ProxyFix with it default x_for=1 .. its implemented here in this line:

app.wsgi_app = ReverseProxyPathFix(ProxyFix(app.wsgi_app))

Meaning the ideal solution would be to make the x_for parameter configurable in settings.yml for SearXNG, instead of limiter.toml, which would set the parameter for both the uWSGI app and for the limiter.

@inetol
Copy link
Member Author

inetol commented Jul 13, 2025

trusted_proxies is a full replacement of x_for and friends, not a thing that goes along with it.

I have added a commit cleaning the leftover.

Comment on lines 195 to 201
def get_cfg() -> Config:
global CFG # pylint: disable=global-statement

if CFG is None:
from searx import settings_loader # pylint: disable=import-outside-toplevel

cfg_file = (settings_loader.get_user_cfg_folder() or pathlib.Path("/etc/searxng")) / "limiter.toml"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the get_cfg here seems obvious, but the limiter implementation shouldn't be tied directly to the botdetection package.

Background:

A long time ago, there was an intention to move botdetection into a standalone Python package. Due to lack of time, this project was abandoned.

In the past, we therefore avoided binding botdetection directly to the limiter (as far as I remember, this worked, except for one minor detail).

That's for your information. I'll consider how this situation could possibly be improved in my review... but the simplest approach will probably be to leave the get_cfg in the limiter.

@return42 return42 force-pushed the trusted-proxies branch 2 times, most recently from 893a887 to 3ecb9c5 Compare July 20, 2025 15:45
@return42
Copy link
Member

I pushed a WIP commit on top, with the suggestions from mine:

  • Improve separation of limiter (cfg & Valkey) from botdetection.
  • The trusted_proxies config is needed to determine the real IP --> moved option botdetection.ip_limit.trusted_proxies to real_ip.trusted_proxies
  • Added obsolete real_ip.x_for to deprecated list.

The second commit [fix] botdetection types: SXNG_Request -> flask.Request is a little bit OT in context of this PR / However, it improves the separation of bot detection from the limiter.

@return42
Copy link
Member

trusted_proxies is a full replacement of x_for and friends, not a thing that goes along with it.

Sure?

I have added a commit cleaning the leftover.

The old Werkzeug's ProxyFix class sets environ["REMOTE_ADDR"] -->

https://github.com/pallets/werkzeug/blob/504a8c4fbda9b8b2fd09e817544ffd228f23458e/src/werkzeug/middleware/proxy_fix.py#L150-L152

By setting the environment, the flask.Request.request.remote_addr depends on the value of x_for .. as far I can see, this PR does not influence the value of flask.Request.request.remote_addr .. or I'm wrong?

@inetol inetol changed the title [mod] botdetection: trusted proxies [mod] limiter: trusted proxies Jul 25, 2025
@inetol inetol marked this pull request as draft July 25, 2025 10:06
@inetol inetol force-pushed the trusted-proxies branch from 052bf06 to 7d11f6f Compare July 25, 2025 11:06
@inetol inetol marked this pull request as ready for review July 25, 2025 11:09
@inetol
Copy link
Member Author

inetol commented Jul 25, 2025

By setting the environment, the flask.Request.request.remote_addr depends on the value of x_for .. as far I can see, this PR does not influence the value of flask.Request.request.remote_addr .. or I'm wrong?

Correct, but as I see that this will once again cause issues in the future, like #4448, I created a middleware to normalize the incoming REMOTE_ADDR. https://github.com/searxng/searxng/pull/4911/files#diff-75819cdbd4611ce01c94e44e6f99b8a0110b1cb91c9e1d20143f4d088926c189

I've added additional tests in case someone connects to WSGI server via socket, since the implementation was not handled correctly in those cases. https://github.com/searxng/searxng/pull/4911/files#diff-4ff2a4139508d0036ded86f1086e92adadfb74ba33bb9939038bcb9bc3fe1729

For the rest, I've cleaned up some verbose logging, squash everything and rebase to master. From what I have been able to test, I have not seen further issues.

@inetol inetol dismissed stale reviews from vojkovic and Bnyro July 25, 2025 11:10

Outdated

@return42 return42 force-pushed the trusted-proxies branch 2 times, most recently from 20b09c9 to 19e27d7 Compare July 30, 2025 16:24
@return42
Copy link
Member

return42 commented Jul 30, 2025

I rebased and pushed a WIP commit on top ..

In the WIP I implemented a middleware searx.botdetection.trusted_proxies.ProxyFix which sets the correct client IP in the WSGI environment (flask.Request.remote_addr) .. by this there is no longer a need for a get_real_ip function ..

I implemented a handful of tests / hope I have covered all variations .. Since we do not know whether the IPs in the environments are valid IPs (the WSGI servers usually only pass through strings), these are also checked by the middleware.

@inetol
Copy link
Member Author

inetol commented Jul 31, 2025

I like the idea of moving this to the middleware, but wouldn't that mean it would no longer be specific to "botdetection"? Maybe we should move this to settings.yml instead as being webserver specific.

I added a commit fixing a couple of things. The addr mapping normalizations for origin and X-Real-IP header are missing (I know I made this comment a while ago, but I've changed my mind about this aproach while reviewing again).
There is also a case where the request failed because x_forwarded_for was out of range (if all addr are trusted and whitelisted), if everything is trusted, should return the first address as expected in https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/X-Forwarded-For#syntax

@return42
Copy link
Member

return42 commented Aug 9, 2025

but wouldn't that mean it would no longer be specific to "botdetection"?

From my POV it is specific to botdetection: Just as Flask offers a ProxyFix class (based on x_for), the botdetection implementation offers botdetection.trusted_proxy.ProxyFix class, which is based on a configuration named:

[botdetection]
trusted_proxies = [
  '127.0.0.0/8',
  '::1'
]

Maybe we should move this to settings.yml instead as being webserver specific.

Sorry, not sure what you mean by "this" ..

I added a commit fixing a couple of things.

Thanks a lot .. much better than my first implementation 👍

if everything is trusted, should return the first address

Good catch .. I added a test for this use case (and some other minor changes) in my WIP commit on top of your branch.

@inetol
Copy link
Member Author

inetol commented Aug 9, 2025

Sorry, not sure what you mean by "this" ..

The trusted_proxies setting, it's not specific to limiter.toml as is used in the "self information" plugin too (and it's a webserver setting). We could move it into server section inside settings.yml.

@return42
Copy link
Member

return42 commented Aug 9, 2025

The "self information" is no longer based on the botdetection .. it now uses the flask.Request.remote_addr / see

The trusted_proxies setting, it's not specific to limiter.toml

The name of the config file for SearXNG's limiter is limiter.toml but in this file you find configuration to a tool named botdetection .. What matters is the namespace / not the filename .. here is the namespace:

[botdetection]
ipv4_prefix =
ipv6_prefix = 
trusted_proxies =
[botdetection.ip_limit]
[botdetection.ip_lists]

Imagine if one day we integrate a tool called foo, then its namespace in the limiter.toml would be as follows:

[foo]
opt = "this is option foo.opt"
[foo.bar]
opt = "this is option foo.bar.opt"

@inetol
Copy link
Member Author

inetol commented Aug 9, 2025

OK, looks good to me now.

Replaces `x_for` functionality with `trusted_proxies`. This allows defining
which IP / ranges to trust extracting the client IP address from X-Forwarded-For
and X-Real-IP headers.

We don't know if the proxy chain will give us the proper client
address (REMOTE_ADDR in the WSGI environment), so we rely on reading the headers
of the proxy before SearXNG (if there is one, in that case it must be added to
trusted_proxies) hoping it has done the proper checks. In case a proxy in the
chain does not check the client address correctly, integrity is compromised and
this should be fixed by whoever manages the proxy, not us.

Closes:

- searxng#4940
- searxng#4939
- searxng#4907
- searxng#3632
- searxng#3191
- searxng#1237

Related:

- searxng/searxng-docker#386
- inetol-infrastructure/searxng-container#81
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants