这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@vojkovic
Copy link
Member

@vojkovic vojkovic commented Aug 11, 2024

What does this PR do?

Adds favicons near the urls in results.

image
image
image
image
(Spacing is broken on RTL mode due to #3724)

This works very similar to the implementation of the autocomplete backends.

You can also turn off favicons completely in results by setting it to none. I think some optimisations to my css and templates can be made for rtl mode.

After my tests, I believe that caching on the client-side is decent enough, so I've not implemented any cache to SearXNG. Let me know your feedback.

You can see the changes live already on https://priv.au, although this also has my theme patches.

Related issues

Closes: #1014 (after 2 years!)

@return42
Copy link
Member

@vojkovic thanks a lot for your work on this task 👍 .. the CI reports some issues you can fix by make format.python further its recommended to run make testbefore commit / Development Quickstart

@vojkovic
Copy link
Member Author

@vojkovic thanks a lot for your work on this task 👍 .. the CI reports some issues you can fix by make format.python further its recommended to run make testbefore commit / Development Quickstart

Should be good now 👍

@return42
Copy link
Member

Yeah, thanks .. we separate the source code modifications of the static/themes from the build of the static files .. for this we use make static.build.commit, the workflow is described in the Development Quickstart .. can you separate the commits .. if you have a doubt about how, I can do it for you in my review .. if it's okay with you if I do a force push on your feature branch .. which in this case is your master branch.

For a PR it would be better to set up a feature branch ... but now this PR is already in your master and I hope you don't need the master for other purposes than this PR.

@vojkovic
Copy link
Member Author

vojkovic commented Aug 11, 2024

Should have read the guide... - Should be alright now. Not using the master branch for anything else, but will keep that in mind for future commits. :)

@return42
Copy link
Member

Today I had my first look at your PR and it's a pleasure to see how you've immersed yourself in the topic and the development of SearXNG .. many thanks for this really excellent work 👍

We will be able to simplify a few details and I already have solutions in mind on how we can implement the whole thing in a more resource efficient way (we should reduce the incoming and outgoing request/responses cycles on the server side).

But before I start I will have one or the other question ... please do not change the implementations after my questions, we can do that together in the end ... or I will then make concrete suggestions.

@vojkovic
Copy link
Member Author

I realised - I've forgot to add any HTTP headers to cache the image on the client side.

@Bnyro
Copy link
Member

Bnyro commented Aug 15, 2024

Thanks for this PR @vojkovic, this looks awesome!

PS: I'm already thinking about writing my own small self-hostable favicon cache/proxy that could be used with these changes here to not rely on any of the big providers (or include one that's opt-in in SearXNG, although I think a separated project would be better for this).

@return42
Copy link
Member

I can't edit (git push to) the PR .. Did you blocked this branch or is it because it is the master branch? For git push I missed this option:

image

I did a lot of work the last days and my intention was to force push the additional commits from my WIP-3727 branch:

@vojkovic
Copy link
Member Author

Nice changes - Weird you can't push to it, maybe because it's from my org... I've sent an invite so you can make changes to that repo.

I like your changes, and I think it's the correct direction for this feature.

Just a quick thing if we're going to start doing caching - my searxng docker image is 263MB on disk, there should be some sort of warning if searxng is going to potentially x4 in disk size (max favicon storage is 1gb).

@return42
Copy link
Member

sent an invite so you can make changes t

👍

Just a quick thing if we're going to start doing caching

Both consume resources; if the icons come from the cache, the cache needs disk space, if the cache is turned off, the additional requests from the client and the additional requests from the resolver use significantly more bandwidth.

Here are a few thoughts from me:

  • The activation of the favicons must therefore be a decision of the maintainer of the instance (Currently the client decides).

  • The maintainer must be able to configure which resolvers are offered in the client (additional resolvers require additional resources)

  • The maintainer must be able to determine the location of the CACHE / this is not currently possible, but could be done with a small change.


I'm already thinking about writing my own small self-hostable favicon cache/proxy that could be used with these changes here to not rely on any of the big providers (or include one that's opt-in in SearXNG, although I think a separated project would be better for this).

@Bnyro I had this in mind when revising the favicons; the current revision is modularized to the extent that it can easily be moved to an external project. However, in order to work in a resource-efficient manner, it will always be necessary for SearXNG to have access to the cache.

@vojkovic
Copy link
Member Author

@return42 I'm getting a 500 Internal Server Error when trying to search.

AttributeError: 'SQLiteProperties' object has no attribute '_DB'

This is running on a fresh Github codespace doing make run as usual. Is this something on my end?

@return42
Copy link
Member

I'm getting a 500 Internal Server Error when trying to search.

Sorry did a mistake when I implemented some compatibility stuff .. should now fixed / please try again.

@vojkovic
Copy link
Member Author

So the favicon proxy has been in production on my instance for a few days now, and it's been working flawlessly. There's a noticeable speedup in having the favicons cached on the server. Really great work on this @return42.

The activation of the favicons must therefore be a decision of the maintainer of the instance (Currently the client decides).

I agree. No matter what we do, this feature will take up more system resources. I suggest having a favicons_enabled: option in settings.yml (default: off) which will turn off favicons completely. Then, instance maintainers can make their own decision.

It also might be worth considering more maintainer changeable options notably for:

  • Hold time of a cached favicon.
  • Limit of the favicon cached size.
  • Location of faviconcache.db (touched more on later)

The maintainer must be able to configure which resolvers are offered in the client (additional resolvers require additional resources)

I don't think this is a feature that's needed assuming the maintainer can choose the maximum amount of space that the database will take up. I don't see many people changing an option like this.

  • The maintainer must be able to determine the location of the CACHE / this is not currently possible, but could be done with a small change.

Yes, I agree. I'm not really sure about keeping the database in /tmp, I feel like this should be more persistent.

Also, a lot of instances are running inside docker containers with a tool like watchtower, that automatically updates to the latest image. Every time the container restarts, the entire database gets wiped.

I suggest we recommend users to create a docker volume for the faviconcache.db so the data isn't purged. I can send in a PR to https://github.com/searxng/searxng-docker to make the changes to the default compose config once this gets merged.

I had this in mind when revising the favicons; the current revision is modularized to the extent that it can easily be moved to an external project. However, in order to work in a resource-efficient manner, it will always be necessary for SearXNG to have access to the cache.

My 2c is to keep it as one project - it's easily maintainable as is.

@realies
Copy link

realies commented Aug 28, 2024

I was wondering what I'm missing in SearXNG, and it was favicons - thanks for working on making this happen!

I'm unsure what the overhead is in enabling this feature, but I think it would be great if it is turned on by default, using the domain.tld/favicon.ico type of pull. Also, integrating the proxy within the same codebase would make it easier to use.

@vojkovic
Copy link
Member Author

I'm unsure what the overhead is in enabling this feature, but I think it would be great if it is turned on by default, using the domain.tld/favicon.ico type of pull. Also, integrating the proxy within the same codebase would make it easier to use.

Sending requests to each individual search result compromises the instance privacy, so we've chosen to use public favicon resolvers instead. Keep in mind that not all domains host their favicon at /favicon.ico; some specify it explicitly in the HTML <head>, while others include it in manifest.json.

My most active instance location has cached 209.8MB of favicons, exceeding the 184MB size of the Docker image of searxng. Since we can't always count on instance maintainers to monitor new updates closely, this feature should require the explicit opt-in from the maintainer to prevent any unexpected surprises.

@realies
Copy link

realies commented Aug 28, 2024

Thanks for the answers! I hope I'm not bloating the PR with additional thoughts and comments:

Sending requests to each individual search result compromises the instance privacy, so we've chosen to use public favicon resolvers instead.

If favicon caching is a thing, a single request per instance per a reasonable unit of time (until it expires/time for a re-check) may not be particularly compromising, but I agree that using a public service to provide those acts as a privacy layer between the site and the search engine. Is caching using a public resolver possible?

My most active instance location has cached 209.8MB of favicons, exceeding the 184MB size of the Docker image of searxng. Since we can't always count on instance maintainers to monitor new updates closely, this feature should require the explicit opt-in from the maintainer to prevent any unexpected surprises.

I probably did not get what is meant with this - why does it matter how big the Docker image for SearXNG is relative to an active instance favicons data size? Downloading more files to a container started from a Docker image does not limit its filesystem size to the Docker image size. I'd expect the favicons containing folder to be a named volume and would personally use a mounted volume to store any cache outside of the SearXNG container to make updating the image more efficient.

@vojkovic
Copy link
Member Author

vojkovic commented Aug 28, 2024

Is caching using a public resolver possible?

Yes, that's what is happening currently. Caching on both the client and server side.

Why does it matter how big the Docker image for SearXNG is relative to an active instance favicons data size?

Maybe I didn't quite get my point across, it's not a technical issue, but if you download a 200mb container, it would be surprising for it to double in size.

Keep in mind, there's not really any other SearXNG feature that caches like this.

would personally use a mounted volume to store any cache outside of the SearXNG container to make updating the image more efficient.

Yes, that's exactly what I recommended here: #3727 (comment)

@realies
Copy link

realies commented Aug 28, 2024

Hope I don't come across as too stubborn about this, but this feature makes the most frequent type of SearXNG use at least twice better for me due to the additional visual queue information, and it would probably work the same for most people.

but if you download a 200mb container, it would be surprising for it to double in size

I was surprised I did not have to mount any folders when I first tried SearXNG...

What if favicons and caching are enabled by default, and the default max storage space for favicons is much smaller? That would minimise the disk use surprise factor (for people who have not read the docs 👀 ) and enhance the default UX.

@RMI78
Copy link

RMI78 commented Sep 3, 2024

I was about to open an issue for that, but that is really what's missing me in SearXNG. Seems like @realies made a good point, it would just make sens to let the user set the cache size himself

@Aetherinox
Copy link

Aetherinox commented Sep 5, 2024

I just tried these changes out on a built container, and favicons are really what this app was missing. It enhances the overall app x2 just on its own. This addition works nice.

The space consumption would be a bit of a concern though. Granted, I'm hosting on a 4TB machine, and I'd have to visit an exorbitant amount of sites to deal any damage, but as much as I use searching for various development reasons throughout the day; I could really rack it up.


Something I thought of, obviously if it's a lot of work, could be looked at later. But I just wanted to toss the idea out there.

Would there be any interest in a custom host option for favicons? In short, I've written my own favicon grabber, it takes a lot of different scenarios into account because there's so many different ways of grabbing a favicon, and I host that on a Cloudflare worker, so cloudflare does the hosting for it. I Just call the query via a domain URL, and it returns my favicon which I can then do whatever I want with.

Out of the current options the developer of this PR has provided, I haven't looked at what the query limits are. But if there are limits, having a backup option may be nice.

Only env var I see being needed is just a host URL to the favicon fetcher. My current one is queried using fav.domain.com/?domain=google.com

And it just returns the URL to the favicon for that domain (or overrides, if I input them into my worker).

If you're asking "Why a backup", take a look at these three website's favicons:

oihDvKwHSM

It appears I can't fetch that favicon for that domain, but if I use my custom favicon grabber, I can get it.

@return42
Copy link
Member

Dear followers of this favicon PR, I'm sorry for my long absent on this topic. FYI: To continue I squashed the patches from the WIP commit into their related commits. Further I did a rebase on SearXNG's current master branch (to fix a conflict with the static files).

@return42
Copy link
Member

return42 commented Oct 2, 2024

The patch for the RTL thing with the favicons:

Yeah, thanks a lot 👍 .. I assembled 433e5f8 from your patch.

vojkovic and others added 7 commits October 3, 2024 14:42
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
All favicons implementations have been documented and moved to the Python
package:

    searx.favicons

There is a configuration (based on Pydantic) for the favicons and all its
components:

    searx.favicons.config

A solution for caching favicons has been implemented:

    searx.favicon.cache

If the favicon is already in the cache, the returned URL is a data URL [1]
(something like `data:image/png;base64,...`).  By generating a data url from
the FaviconCache, additional HTTP roundtripps via the favicon_proxy are saved:

    favicons.proxy.favicon_url

The favicon proxy service now sets a HTTP header "Cache-Control: max-age=...":

    favicons.proxy.favicon_proxy

The resolvers now also provide the mime type (data, mime):

    searx.favicon.resolvers

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Run ``make docs.live`` and visit http://0.0.0.0:8000/admin/searx.favicons.html

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Comes from a sughgestion in:

- #3727 (comment)

Suggested-by: Bnyro <bnyro@tutanota.com>
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42
Copy link
Member

return42 commented Oct 3, 2024

FYI: I had to rebase the branch to solve a conflict in definitions.less and in the /static files build by make static.build.commit.

return42 added a commit to return42/searxng that referenced this pull request Oct 3, 2024
Comes from a sughgestion in:

- searxng#3727 (comment)

Suggested-by: Bnyro <bnyro@tutanota.com>
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42
Copy link
Member

return42 commented Oct 3, 2024

What do you guys think, is this PR ready to release?

I test this PR for more than a week on my production instance ..

The favicons work perfectly, only the size of the WAL bothers me (see ls -lh /var/cache/searxng/) ... I guess the WAL file grows because the server processes run endlessly and the DB-Connector is therefore never closed.

That reminds me that we have to continue with this PR ;-)

For those of you testing this PR in production .. what can you say about your WAL file?

@mrpaulblack
Copy link
Member

From my point of view I also tested this for over a week in prod without any problems. RAM and CPU usage seem to be about the same as before this PR 👍

The database faviconcache.db is 150MB on my instance. Seems ok IMO. The faviconcache.db-wal file is 4.2MB on my instance.

@vojkovic
Copy link
Member Author

vojkovic commented Oct 4, 2024

Here's the statistics from each of my nodes from the last two months of use:

Singapore
526.7M /tmp/faviconcache.db
152.8M /tmp/faviconcache.db-wal

Frankfurt
762.8M /tmp/faviconcache.db
231.2M /tmp/faviconcache.db-wal

London
425.8M /tmp/faviconcache.db
119.5M /tmp/faviconcache.db-wal

Chicago
670.5M /tmp/faviconcache.db
208.3M /tmp/faviconcache.db-wal

Sydney
172.9M /tmp/faviconcache.db
50.8M /tmp/faviconcache.db-wal

Bangalore
87.1M /tmp/faviconcache.db
19.8M /tmp/faviconcache.db-wal

San Francisco
255.1M /tmp/faviconcache.db
81.3M /tmp/faviconcache.db-wal

I'm pretty happy with how this looks, obviously the more that gets cached, the less favicons that will need to be requested in the future. LGTM 🚀

@return42
Copy link
Member

return42 commented Oct 5, 2024

Many thanks for the cooperation on the favicons, thanks to @vojkovic SearXNG is once again a bit more up-to-date and user-friendly ... everything seems to be fine so far ... lets merge 🚀

@return42 return42 merged commit f00fa76 into searxng:master Oct 5, 2024
8 checks passed
return42 added a commit that referenced this pull request Oct 5, 2024
Comes from a sughgestion in:

- #3727 (comment)

Suggested-by: Bnyro <bnyro@tutanota.com>
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@Nocifer
Copy link

Nocifer commented Oct 5, 2024

A big, big THANK YOU to everyone involved in this PR!

@Aetherinox
Copy link

Aetherinox commented Oct 5, 2024

Thanks everyone for working on this. Can't wait to try the latest out. But first I need to compare the CSS changes. I re-built the entire Searxng theme from the ground up, and I've stuck myself in a situation where I must manually do everything. Joy....

@Aetherinox
Copy link

Aetherinox commented Oct 6, 2024

@return42 Just got to say.

There's a crap-ton of settings in there.

In the following we list the resolvers available in the core of SearXNG, but via the FQN it is also possible to implement your own resolvers and integrate them into the proxy:

You busted your tail on this favicon functionality, and it's about everything a person could dream of. The amount of detail in here is amazing.

Thanks again. I'm going to go read the docs and see how I can implement the custom favicon resolver. This'll be the first big thing I've done with the app.

@omentic
Copy link
Contributor

omentic commented Oct 8, 2024

Hey, this is very cool work, thank you! Excited to pull it down and test it out :-)

@return42
Copy link
Member

Dear contributors to the favicon PR, can you please give

a try and give me your feedback? /thanks 👍

dalf added a commit to dalf/searxng that referenced this pull request Jan 4, 2025
dalf added a commit to dalf/searxng that referenced this pull request Jan 4, 2025
return42 pushed a commit that referenced this pull request Jan 5, 2025
p3psi-boo pushed a commit to p3psi-boo/searxng that referenced this pull request Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Theme] Display the favicons of the result links.

9 participants