-
Notifications
You must be signed in to change notification settings - Fork 2.3k
[feat] add favicons to result urls #3727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@vojkovic thanks a lot for your work on this task 👍 .. the CI reports some issues you can fix by |
Should be good now 👍 |
|
Yeah, thanks .. we separate the source code modifications of the static/themes from the build of the static files .. for this we use For a PR it would be better to set up a feature branch ... but now this PR is already in your master and I hope you don't need the master for other purposes than this PR. |
|
Should have read the guide... - Should be alright now. Not using the master branch for anything else, but will keep that in mind for future commits. :) |
|
Today I had my first look at your PR and it's a pleasure to see how you've immersed yourself in the topic and the development of SearXNG .. many thanks for this really excellent work 👍 We will be able to simplify a few details and I already have solutions in mind on how we can implement the whole thing in a more resource efficient way (we should reduce the incoming and outgoing request/responses cycles on the server side). But before I start I will have one or the other question ... please do not change the implementations after my questions, we can do that together in the end ... or I will then make concrete suggestions. |
|
I realised - I've forgot to add any HTTP headers to cache the image on the client side. |
|
Thanks for this PR @vojkovic, this looks awesome! PS: I'm already thinking about writing my own small self-hostable favicon cache/proxy that could be used with these changes here to not rely on any of the big providers (or include one that's opt-in in SearXNG, although I think a separated project would be better for this). |
|
I can't edit (git push to) the PR .. Did you blocked this branch or is it because it is the master branch? For git push I missed this option: I did a lot of work the last days and my intention was to force push the additional commits from my WIP-3727 branch: |
|
Nice changes - Weird you can't push to it, maybe because it's from my org... I've sent an invite so you can make changes to that repo. I like your changes, and I think it's the correct direction for this feature. Just a quick thing if we're going to start doing caching - my searxng docker image is 263MB on disk, there should be some sort of warning if searxng is going to potentially x4 in disk size (max favicon storage is 1gb). |
👍
Both consume resources; if the icons come from the cache, the cache needs disk space, if the cache is turned off, the additional requests from the client and the additional requests from the resolver use significantly more bandwidth. Here are a few thoughts from me:
@Bnyro I had this in mind when revising the favicons; the current revision is modularized to the extent that it can easily be moved to an external project. However, in order to work in a resource-efficient manner, it will always be necessary for SearXNG to have access to the cache. |
|
@return42 I'm getting a 500 Internal Server Error when trying to search.
This is running on a fresh Github codespace doing |
Sorry did a mistake when I implemented some compatibility stuff .. should now fixed / please try again. |
|
So the favicon proxy has been in production on my instance for a few days now, and it's been working flawlessly. There's a noticeable speedup in having the favicons cached on the server. Really great work on this @return42.
I agree. No matter what we do, this feature will take up more system resources. I suggest having a It also might be worth considering more maintainer changeable options notably for:
I don't think this is a feature that's needed assuming the maintainer can choose the maximum amount of space that the database will take up. I don't see many people changing an option like this.
Yes, I agree. I'm not really sure about keeping the database in /tmp, I feel like this should be more persistent. Also, a lot of instances are running inside docker containers with a tool like watchtower, that automatically updates to the latest image. Every time the container restarts, the entire database gets wiped. I suggest we recommend users to create a docker volume for the faviconcache.db so the data isn't purged. I can send in a PR to https://github.com/searxng/searxng-docker to make the changes to the default compose config once this gets merged.
My 2c is to keep it as one project - it's easily maintainable as is. |
|
I was wondering what I'm missing in SearXNG, and it was favicons - thanks for working on making this happen! I'm unsure what the overhead is in enabling this feature, but I think it would be great if it is turned on by default, using the domain.tld/favicon.ico type of pull. Also, integrating the proxy within the same codebase would make it easier to use. |
Sending requests to each individual search result compromises the instance privacy, so we've chosen to use public favicon resolvers instead. Keep in mind that not all domains host their favicon at My most active instance location has cached 209.8MB of favicons, exceeding the 184MB size of the Docker image of searxng. Since we can't always count on instance maintainers to monitor new updates closely, this feature should require the explicit opt-in from the maintainer to prevent any unexpected surprises. |
|
Thanks for the answers! I hope I'm not bloating the PR with additional thoughts and comments:
If favicon caching is a thing, a single request per instance per a reasonable unit of time (until it expires/time for a re-check) may not be particularly compromising, but I agree that using a public service to provide those acts as a privacy layer between the site and the search engine. Is caching using a public resolver possible?
I probably did not get what is meant with this - why does it matter how big the Docker image for SearXNG is relative to an active instance favicons data size? Downloading more files to a container started from a Docker image does not limit its filesystem size to the Docker image size. I'd expect the favicons containing folder to be a named volume and would personally use a mounted volume to store any cache outside of the SearXNG container to make updating the image more efficient. |
Yes, that's what is happening currently. Caching on both the client and server side.
Maybe I didn't quite get my point across, it's not a technical issue, but if you download a 200mb container, it would be surprising for it to double in size. Keep in mind, there's not really any other SearXNG feature that caches like this.
Yes, that's exactly what I recommended here: #3727 (comment) |
|
Hope I don't come across as too stubborn about this, but this feature makes the most frequent type of SearXNG use at least twice better for me due to the additional visual queue information, and it would probably work the same for most people.
I was surprised I did not have to mount any folders when I first tried SearXNG... What if favicons and caching are enabled by default, and the default max storage space for favicons is much smaller? That would minimise the disk use surprise factor (for people who have not read the docs 👀 ) and enhance the default UX. |
|
I was about to open an issue for that, but that is really what's missing me in SearXNG. Seems like @realies made a good point, it would just make sens to let the user set the cache size himself |
|
I just tried these changes out on a built container, and favicons are really what this app was missing. It enhances the overall app x2 just on its own. This addition works nice. The space consumption would be a bit of a concern though. Granted, I'm hosting on a 4TB machine, and I'd have to visit an exorbitant amount of sites to deal any damage, but as much as I use searching for various development reasons throughout the day; I could really rack it up. Something I thought of, obviously if it's a lot of work, could be looked at later. But I just wanted to toss the idea out there. Would there be any interest in a custom host option for favicons? In short, I've written my own favicon grabber, it takes a lot of different scenarios into account because there's so many different ways of grabbing a favicon, and I host that on a Cloudflare worker, so cloudflare does the hosting for it. I Just call the query via a domain URL, and it returns my favicon which I can then do whatever I want with. Out of the current options the developer of this PR has provided, I haven't looked at what the query limits are. But if there are limits, having a backup option may be nice. Only env var I see being needed is just a host URL to the favicon fetcher. My current one is queried using And it just returns the URL to the favicon for that domain (or overrides, if I input them into my worker). If you're asking "Why a backup", take a look at these three website's favicons: It appears I can't fetch that favicon for that domain, but if I use my custom favicon grabber, I can get it. |
|
Dear followers of this favicon PR, I'm sorry for my long absent on this topic. FYI: To continue I squashed the patches from the WIP commit into their related commits. Further I did a rebase on SearXNG's current master branch (to fix a conflict with the static files). |
Yeah, thanks a lot 👍 .. I assembled 433e5f8 from your patch. |
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
All favicons implementations have been documented and moved to the Python
package:
searx.favicons
There is a configuration (based on Pydantic) for the favicons and all its
components:
searx.favicons.config
A solution for caching favicons has been implemented:
searx.favicon.cache
If the favicon is already in the cache, the returned URL is a data URL [1]
(something like `data:image/png;base64,...`). By generating a data url from
the FaviconCache, additional HTTP roundtripps via the favicon_proxy are saved:
favicons.proxy.favicon_url
The favicon proxy service now sets a HTTP header "Cache-Control: max-age=...":
favicons.proxy.favicon_proxy
The resolvers now also provide the mime type (data, mime):
searx.favicon.resolvers
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Run ``make docs.live`` and visit http://0.0.0.0:8000/admin/searx.favicons.html Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Comes from a sughgestion in: - #3727 (comment) Suggested-by: Bnyro <bnyro@tutanota.com> Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
FYI: I had to rebase the branch to solve a conflict in |
Comes from a sughgestion in: - searxng#3727 (comment) Suggested-by: Bnyro <bnyro@tutanota.com> Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
What do you guys think, is this PR ready to release? I test this PR for more than a week on my production instance .. The favicons work perfectly, only the size of the WAL bothers me (see ls -lh /var/cache/searxng/) ... I guess the WAL file grows because the server processes run endlessly and the DB-Connector is therefore never closed. That reminds me that we have to continue with this PR ;-) For those of you testing this PR in production .. what can you say about your WAL file? |
|
From my point of view I also tested this for over a week in prod without any problems. RAM and CPU usage seem to be about the same as before this PR 👍 The database faviconcache.db is 150MB on my instance. Seems ok IMO. The |
|
Here's the statistics from each of my nodes from the last two months of use: Singapore Frankfurt London Chicago Sydney Bangalore San Francisco I'm pretty happy with how this looks, obviously the more that gets cached, the less favicons that will need to be requested in the future. LGTM 🚀 |
|
Many thanks for the cooperation on the favicons, thanks to @vojkovic SearXNG is once again a bit more up-to-date and user-friendly ... everything seems to be fine so far ... lets merge 🚀 |
Comes from a sughgestion in: - #3727 (comment) Suggested-by: Bnyro <bnyro@tutanota.com> Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
A big, big THANK YOU to everyone involved in this PR! |
|
Thanks everyone for working on this. Can't wait to try the latest out. But first I need to compare the CSS changes. I re-built the entire Searxng theme from the ground up, and I've stuck myself in a situation where I must manually do everything. Joy.... |
|
@return42 Just got to say. There's a crap-ton of settings in there.
You busted your tail on this favicon functionality, and it's about everything a person could dream of. The amount of detail in here is amazing. Thanks again. I'm going to go read the docs and see how I can implement the custom favicon resolver. This'll be the first big thing I've done with the app. |
|
Hey, this is very cool work, thank you! Excited to pull it down and test it out :-) |
|
Dear contributors to the favicon PR, can you please give a try and give me your feedback? /thanks 👍 |
What does this PR do?
Adds favicons near the urls in results.
(Spacing is broken on RTL mode due to #3724)
This works very similar to the implementation of the autocomplete backends.
You can also turn off favicons completely in results by setting it to none. I think some optimisations to my css and templates can be made for rtl mode.
After my tests, I believe that caching on the client-side is decent enough, so I've not implemented any cache to SearXNG. Let me know your feedback.
You can see the changes live already on https://priv.au, although this also has my theme patches.
Related issues
Closes: #1014 (after 2 years!)