-
Notifications
You must be signed in to change notification settings - Fork 2.3k
[mod] limiter: trusted proxies #4911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Bnyro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it only makes sense to check these headers if we're using a proxy.
- Otherwise, if somebody runs an instance without reverse proxy, attackers could just spoof faulty
X-Real-IPheaders. (I seriously hope that everybody uses a reverse proxy though)
Early preparation for searxng/searxng#4911
|
Rebase to latest master, no changes made. BTW this PR might require a migration for anyone using
|
|
I think this PR will not solve the main issue ..
SearXNG uses the ProxyFix with it default Line 77 in d574339
Meaning the ideal solution would be to make the |
|
I have added a commit cleaning the leftover. |
searx/botdetection/config.py
Outdated
| def get_cfg() -> Config: | ||
| global CFG # pylint: disable=global-statement | ||
|
|
||
| if CFG is None: | ||
| from searx import settings_loader # pylint: disable=import-outside-toplevel | ||
|
|
||
| cfg_file = (settings_loader.get_user_cfg_folder() or pathlib.Path("/etc/searxng")) / "limiter.toml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the get_cfg here seems obvious, but the limiter implementation shouldn't be tied directly to the botdetection package.
Background:
A long time ago, there was an intention to move botdetection into a standalone Python package. Due to lack of time, this project was abandoned.
In the past, we therefore avoided binding botdetection directly to the limiter (as far as I remember, this worked, except for one minor detail).
That's for your information. I'll consider how this situation could possibly be improved in my review... but the simplest approach will probably be to leave the get_cfg in the limiter.
893a887 to
3ecb9c5
Compare
|
I pushed a
The second commit |
Sure?
The old Werkzeug's By setting the environment, the |
Correct, but as I see that this will once again cause issues in the future, like #4448, I created a middleware to normalize the incoming I've added additional tests in case someone connects to WSGI server via socket, since the implementation was not handled correctly in those cases. https://github.com/searxng/searxng/pull/4911/files#diff-4ff2a4139508d0036ded86f1086e92adadfb74ba33bb9939038bcb9bc3fe1729 For the rest, I've cleaned up some verbose logging, squash everything and rebase to master. From what I have been able to test, I have not seen further issues. |
20b09c9 to
19e27d7
Compare
|
I rebased and pushed a WIP commit on top .. In the WIP I implemented a middleware I implemented a handful of tests / hope I have covered all variations .. Since we do not know whether the IPs in the environments are valid IPs (the WSGI servers usually only pass through strings), these are also checked by the middleware. |
|
I like the idea of moving this to the middleware, but wouldn't that mean it would no longer be specific to "botdetection"? Maybe we should move this to settings.yml instead as being webserver specific. I added a commit fixing a couple of things. The addr mapping normalizations for origin and |
From my POV it is specific to botdetection: Just as Flask offers a ProxyFix class (based on x_for), the botdetection implementation offers [botdetection]
trusted_proxies = [
'127.0.0.0/8',
'::1'
]
Sorry, not sure what you mean by "this" ..
Thanks a lot .. much better than my first implementation 👍
Good catch .. I added a test for this use case (and some other minor changes) in my WIP commit on top of your branch. |
The |
|
The "self information" is no longer based on the botdetection .. it now uses the
The name of the config file for SearXNG's limiter is [botdetection]
ipv4_prefix =
ipv6_prefix =
trusted_proxies =
[botdetection.ip_limit]
[botdetection.ip_lists]Imagine if one day we integrate a tool called [foo]
opt = "this is option foo.opt"
[foo.bar]
opt = "this is option foo.bar.opt" |
|
OK, looks good to me now. |
Replaces `x_for` functionality with `trusted_proxies`. This allows defining which IP / ranges to trust extracting the client IP address from X-Forwarded-For and X-Real-IP headers. We don't know if the proxy chain will give us the proper client address (REMOTE_ADDR in the WSGI environment), so we rely on reading the headers of the proxy before SearXNG (if there is one, in that case it must be added to trusted_proxies) hoping it has done the proper checks. In case a proxy in the chain does not check the client address correctly, integrity is compromised and this should be fixed by whoever manages the proxy, not us. Closes: - searxng#4940 - searxng#4939 - searxng#4907 - searxng#3632 - searxng#3191 - searxng#1237 Related: - searxng/searxng-docker#386 - inetol-infrastructure/searxng-container#81
Replaces
x_forfunctionality withtrusted_proxies. This allows defining which IP / ranges to trust extracting the client IP address from X-Forwarded-For and X-Real-IP headers.We don't know if the proxy chain will give us the proper client address, so we rely on reading the headers of the proxy before SearXNG (if there is one, in that case it must be added to trusted_proxies) hoping it has done the proper checks. In case a proxy in the chain does not check the client address correctly, integrity is compromised and this should be fixed by whoever manages the proxy, not us.
Closes:
get_real_ip()picks proxy IP #4907x_forfor both the WSGI app and for botdetection #3632Related: