@return42
Member

This patch implements lazy loading of the JSON data.

Motivation: most requests do not need all of the JSON data, yet all of it is loaded. For example, these four JSON files:

  • currencies.json ~550 KB
  • engine_descriptions.json ~1.3 MB
  • external_bangs.json ~1.3 MB
  • osm_keys_tags.json ~2.2 MB

are most often not used, yet they consume a lot of memory and, BTW, they also extend the time required to instantiate a worker.
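
To illustrate the idea of lazy loading, here is a minimal sketch, not the code from this patch. The `LazyJSON` class and the small stand-in file are hypothetical and exist only for this example; the sketch assumes Python 3.8+ for `functools.cached_property`.

```python
import json
import tempfile
from functools import cached_property
from pathlib import Path


class LazyJSON:
    """Hypothetical helper: parse a JSON file on first access, not at import time."""

    def __init__(self, path):
        self.path = Path(path)

    @cached_property
    def data(self):
        # Runs once, on the first access; the result is then cached on the instance.
        with self.path.open(encoding="utf-8") as f:
            return json.load(f)

    @property
    def loaded(self):
        # cached_property stores its value in the instance __dict__.
        return "data" in self.__dict__


# Demo with a tiny stand-in file (the real files are far larger).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "currencies.json"
    path.write_text(json.dumps({"EUR": "euro"}), encoding="utf-8")

    currencies = LazyJSON(path)
    loaded_before = currencies.loaded  # nothing has been parsed yet
    euro = currencies.data["EUR"]      # first access triggers the load
    loaded_after = currencies.loaded
```

Creating the object costs almost nothing, so start-up gets faster. Once `data` has been accessed, however, the parsed content stays in memory for the rest of the process lifetime.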

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42 return42 marked this pull request as ready for review April 30, 2024 03:55
Member

@Bnyro Bnyro left a comment

Works as expected from testing 👍

@dalf
Contributor

dalf commented May 4, 2024

There are two points here:

  • speed up the start of app
  • lower the memory footprint

The speed up is clear.

The memory footprint is a different matter: a long-running instance will have the same memory footprint as it does now. All it takes is one request to each of:

  • the currency engine
  • the OSM engine
  • the ddg definitions info or unit conversion
  • the engines tab of the preferences page (okay, the mouse has to be over an engine name)

==> it won't reduce the memory footprint of darmarit.org/searx/, paulgo.io, or searx.be, for example (according to the stats)

IMO, the solution is sqlite: #2633
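
As a rough illustration of why SQLite helps here, the sketch below fetches a single description with a query instead of parsing a whole JSON file into memory. The table name, columns, and sample rows are assumptions made up for this example and are not taken from #2633. It uses an in-memory database only to stay self-contained; the memory benefit would come from pointing `sqlite3.connect` at a file on disk.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
con = sqlite3.connect(":memory:")  # a real setup would use a file on disk
con.execute(
    "CREATE TABLE engine_descriptions ("
    " engine TEXT, language TEXT, description TEXT,"
    " PRIMARY KEY (engine, language))"
)
con.executemany(
    "INSERT INTO engine_descriptions VALUES (?, ?, ?)",
    [
        ("wikipedia", "en", "A free online encyclopedia"),
        ("wikipedia", "de", "Eine freie Online-Enzyklopädie"),
    ],
)
con.commit()


def get_description(con, engine, language):
    """Fetch one description; only the matching row is read into Python."""
    row = con.execute(
        "SELECT description FROM engine_descriptions"
        " WHERE engine = ? AND language = ?",
        (engine, language),
    ).fetchone()
    return row[0] if row else None


desc_en = get_description(con, "wikipedia", "en")
desc_missing = get_description(con, "wikipedia", "fr")  # no such row
```

With this approach the process holds only the rows it actually asked for, no matter how long it runs.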

dalf added a commit to dalf/searxng that referenced this pull request May 4, 2024
To reduce memory usage, use a SQLite database to store the engine descriptions.
A dump of the database is stored in Git to facilitate maintenance,
especially the pull requests made automatically every month.

Related to
* searxng#2633
* searxng#3443
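
The commit message above mentions keeping a dump of the database in Git. One way such a text dump could be produced and restored is sketched below with `Connection.iterdump()` from Python's `sqlite3` module. The table and row are hypothetical, and the actual commit may do this differently.

```python
import sqlite3

# Build a small example database (hypothetical table and row).
src = sqlite3.connect(":memory:")
src.execute(
    "CREATE TABLE engine_descriptions (engine TEXT PRIMARY KEY, description TEXT)"
)
src.execute(
    "INSERT INTO engine_descriptions VALUES (?, ?)",
    ("wikipedia", "A free online encyclopedia"),
)
src.commit()

# iterdump() yields the SQL statements that recreate the database as plain text,
# which diffs far better in Git than a binary database file.
dump_sql = "\n".join(src.iterdump())

# Rebuild a fresh database from the text dump.
dst = sqlite3.connect(":memory:")
dst.executescript(dump_sql)
rows = dst.execute(
    "SELECT engine, description FROM engine_descriptions"
).fetchall()
```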
dalf added a commit that referenced this pull request May 4, 2024
To reduce memory usage, use a SQLite database to store the engine descriptions.
A dump of the database is stored in Git to facilitate maintenance,
especially the pull requests made automatically every month.

Related to
* #2633
* #3443
@return42
Member Author

return42 commented May 4, 2024

The memory footprint is different: a long-running instance

Yes, this is known. I assumed uWSGI worker processes do not live long, but TBH I don't really know how many requests are processed before a new process is spawned --> max-requests

@dalf
Contributor

dalf commented May 4, 2024

As far as I understand, this requires a change to the uWSGI configuration in all instances to make this PR relevant.

@return42
Member Author

return42 commented May 4, 2024

As far as I understand, this requires a change to the uWSGI configuration in all instances to make this PR relevant.

I would have expected the processes to be restarted regularly. I would be surprised if the processes were meant to live indefinitely and process millions of requests (memory leaks?), but I can't find any reasonable documentation either. We could customize our wsgi config and add max-requests next to lazy-apps:

lazy-apps = true

@dalf
Contributor

dalf commented May 4, 2024

Same here, I can't find the default of max_requests for sure.
Even the source code is not clear to me:
https://github.com/unbit/uwsgi/blob/353b7dd19c9af762f3874ed46a604766e1d7c6d5/core/uwsgi.c#L279

When Paul tested some PR on his instance, we could clearly see a memory leak over a week: the memory never dropped back to the initial value. The same goes for my instance using Docker.

we could customize our wsgi config

The Docker image can be updated, but what about all the instances using the installation script or something else (like the Arch package)?

return42 added a commit to return42/searxng that referenced this pull request May 5, 2024
As stated in .. and other posts, the defaults of uWSGI are not suitable for a
production environment.  To give just one example, the workers run indefinitely
and the memory leaks accumulate.

- "Configuring uWSGI for Production: The defaults are all wrong" EuroPython 2019 [1]
- "Configuring uWSGI for Production Deployment" [2]
- "When Paul has tested some PR on his instance, we could clearly see a memory
  leak over a week: the memory never dropped to the initial value. Same for my
  instance using Docker." [3]

[1] https://av.tib.eu/media/44810
[2] https://www.bloomberg.com/company/stories/configuring-uwsgi-production-deployment/
[3] searxng#3443 (comment)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42
Member Author

return42 commented May 5, 2024

When Paul tested some PR on his instance, we could clearly see a memory leak over a week: the memory never dropped back to the initial value. The same goes for my instance using Docker.

Wow, I never thought the defaults of uWSGI would be unsuitable for a production environment. We can discuss that here in this PR:

Since you have now sent PR

we should focus on the SQL solution. I am changing this PR to DRAFT.


The Docker image can be updated, but what about all the instances using the installation script or something else (like the Arch package)?

Not related to this PR, but in general we should not weigh deployment questions more heavily than improvements to the SearXNG core.

@return42 return42 marked this pull request as draft May 5, 2024 08:13
@return42
Member Author

return42 commented May 9, 2024

Superseded by #3458

@return42 return42 closed this May 9, 2024
dalf added a commit to dalf/searxng that referenced this pull request May 9, 2024
To reduce memory usage, use a SQLite database to store the engine descriptions.
A dump of the database is stored in Git to facilitate maintenance,
especially the pull requests made automatically every month.

Related to
* searxng#2633
* searxng#3443
@return42 return42 reopened this May 11, 2024
dalf added a commit to dalf/searxng that referenced this pull request May 18, 2024
To reduce memory usage, use a SQLite database to store the engine descriptions.
A dump of the database is stored in Git to facilitate maintenance,
especially the pull requests made automatically every month.

Related to
* searxng#2633
* searxng#3443
@dalf
Contributor

dalf commented May 22, 2025

Superseded by #3458

Doppelgänger of this PR merged: #4834

@dalf dalf closed this May 22, 2025