
Conversation

Contributor

@dalf dalf commented May 4, 2024

What does this PR do?

To reduce memory usage, use a SQLite database to store the engine descriptions, currencies and OSM keys/tags.

Dumps of the databases are stored in Git to facilitate maintenance, especially for the pull requests created automatically every month.
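As an illustration of keeping a text dump under version control, here is a minimal sketch using only the standard `sqlite3` module. The table name, columns and sample row are assumptions made for the example; the PR's actual dump format and tooling may differ.

```python
import sqlite3


def dump_database(conn: sqlite3.Connection) -> str:
    """Return the whole database as SQL text, suitable for a diff-friendly file."""
    return "\n".join(conn.iterdump()) + "\n"


def restore_database(sql_dump: str) -> sqlite3.Connection:
    """Rebuild a database (in memory here) from a SQL text dump."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_dump)
    return conn


# Hypothetical table and row, only to demonstrate the round trip.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE currencies (iso4217 TEXT PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO currencies VALUES ('EUR', 'euro')")
source.commit()

dump = dump_database(source)
restored = restore_database(dump)
```

A text dump changes line by line when the data changes, which is what makes automated monthly pull requests reviewable.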

With this PR, searx.data provides some functions to access the data:

def fetch_engine_descriptions(language) -> Dict[str, List[str]]:
  """Return engine description and source for each engine name"""

def fetch_iso4217_from_user(name: str) -> Optional[str]:
  """Currency: get the ISO4217 name from the user input"""

def fetch_name_from_iso4217(iso4217: str, language: str) -> Optional[str]:
  """Currency: get the localized name from the ISO4217"""

def fetch_osm_key_label(key_name: str, language: str) -> Optional[str]:
  """Get the OSM key label from the key id"""

def fetch_osm_tag_label(tag_key: str, tag_value: str, language: str) -> Optional[str]:
  """Get the OSM tag label from the tag key and value"""

The function names start with fetch instead of get to emphasize that the data are fetched from the databases.
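To make the idea concrete, the following sketch shows how one of these functions could be backed by a SQLite query. The schema (`osm_keys` with `name`, `language`, `label`), the sample rows and the module-level connection are assumptions for illustration, not the layout used in this PR.

```python
import sqlite3
from typing import Optional

# Hypothetical schema and sample data; a real database would live in a file.
_conn = sqlite3.connect(":memory:")
_conn.execute(
    "CREATE TABLE osm_keys ("
    " name TEXT, language TEXT, label TEXT,"
    " PRIMARY KEY (name, language))"
)
_conn.executemany(
    "INSERT INTO osm_keys VALUES (?, ?, ?)",
    [("phone", "en", "Phone"), ("phone", "fr", "Téléphone")],
)
_conn.commit()


def fetch_osm_key_label(key_name: str, language: str) -> Optional[str]:
    """Get the OSM key label from the key id, or None if it is unknown."""
    row = _conn.execute(
        "SELECT label FROM osm_keys WHERE name = ? AND language = ?",
        (key_name, language),
    ).fetchone()
    return row[0] if row else None
```

Only the requested row is read on each call, so nothing comparable to the full JSON document has to stay resident in the Python process.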

With these functions, other parts of the code and the engines can access the data without awkward imports like this one in the Apple Maps engine:

from searx.engines.openstreetmap import get_key_label

Why is this change important?

It saves about 20 MB per worker, similar to #3443, but memory usage stays low even after some queries that use OSM (for example).

SQLite is going to cache some pages, but as far as I understand, this is kernel cache:

  • it is shared between the processes
  • the kernel discards the cache entries when the memory is low

About load time: it takes 10 ms to load useragents.json, external_urls.json, wikidata_units.json, external_bangs.json, engine_traits.json and locales.json on my AMD 5750GE. Even ten times slower would still be reasonable IMO: the HTTP requests during initialization are far slower than that.

How to test this PR locally?

Author's checklist

Related issues

Related to

@dalf dalf force-pushed the data_use_sqlite branch from 54aad94 to 8e37976 Compare May 4, 2024 09:12
@dalf dalf requested a review from return42 May 4, 2024 09:22
@dalf dalf force-pushed the data_use_sqlite branch from 8e37976 to 3982a26 Compare May 4, 2024 11:00
@dalf dalf changed the title from "data: engine descriptions: use SQLite instead of JSON" to "data: currencies and engine descriptions: use SQLite instead of JSON" May 4, 2024
@dalf dalf force-pushed the data_use_sqlite branch 2 times, most recently from 8557d79 to 42e1d92 Compare May 4, 2024 11:08
@dalf dalf changed the title from "data: currencies and engine descriptions: use SQLite instead of JSON" to "data: currencies, engine descriptions and osm_keys_tags: use SQLite instead of JSON" May 4, 2024
@dalf dalf force-pushed the data_use_sqlite branch 2 times, most recently from cab91d5 to a1a9156 Compare May 4, 2024 15:43
mrpaulblack added a commit to paulgoio/searxng that referenced this pull request May 6, 2024
Member

@return42 return42 left a comment

@dalf can we merge this PR or are you waiting for more test results?

@dalf dalf force-pushed the data_use_sqlite branch from a1a9156 to 890bc16 Compare May 9, 2024 16:10
@dalf dalf force-pushed the data_use_sqlite branch 2 times, most recently from f5da9b4 to bf959dd Compare May 18, 2024 20:34
dalf added 3 commits May 18, 2024 20:50
To reduce memory usage, use a SQLite database to store the engine descriptions.
A dump of the database is stored in Git to facilitate maintenance,
especially the pull requests made automatically every month.

Related to
* searxng#2633
* searxng#3443
@dalf dalf force-pushed the data_use_sqlite branch from bf959dd to c0a96d8 Compare May 18, 2024 20:50
Contributor Author

dalf commented May 18, 2024

After some tests on Paul's instance, memory usage climbs back to nearly its original value after a few days.


I have updated the code:

  • It is rebased on the latest master branch
  • SQL connections are shared between the threads (the Python documentation seems misleading here: connections can be shared, and a safety belt can be added)
  • the cache is reduced to 512KB instead of 2MB
  • sql_connection is a generator, so if the connection has to be closed, it can be done in one place (idea from @return42 )

@mrpaulblack can you try the latest update?
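The points above could be combined roughly as in the sketch below. It is an assumption-laden illustration, not the code from this PR: the database path, the two locks used as a "safety belt", and the use of `contextlib.contextmanager` around the generator are all choices made for the example.

```python
import contextlib
import sqlite3
import threading
from typing import Iterator, Optional

DB_PATH = ":memory:"  # hypothetical; a real setup would point to a file

_init_lock = threading.Lock()
_use_lock = threading.Lock()  # assumed safety belt: one thread at a time
_connection: Optional[sqlite3.Connection] = None


def _get_connection() -> sqlite3.Connection:
    """Create the shared connection on first use."""
    global _connection
    with _init_lock:
        if _connection is None:
            # check_same_thread=False lets other threads use the connection.
            _connection = sqlite3.connect(DB_PATH, check_same_thread=False)
            # A negative cache_size is expressed in KiB: about 512 KiB here.
            _connection.execute("PRAGMA cache_size = -512")
        return _connection


@contextlib.contextmanager
def sql_connection() -> Iterator[sqlite3.Connection]:
    """Yield the shared connection; any cleanup would live in this one place."""
    with _use_lock:
        yield _get_connection()
```

Callers would write `with sql_connection() as conn: ...`, so a later decision to close or recycle the connection only touches this function.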

Contributor Author

dalf commented May 2, 2025

See #4650

@dalf dalf closed this May 2, 2025
Contributor Author

dalf commented May 22, 2025

I still think this PR could improve the memory footprint, adapted in one way or another.

See the last message on #1892

return42 added a commit to return42/searxng that referenced this pull request May 22, 2025
In the previous implementation, all databases were loaded into memory when
importing the searx.data package, regardless of whether they were ever needed.

Regardless of this, it is an antipattern to load entire databases into memory
when importing a package or module; databases should be loaded when needed.

Lazy loading is a first step toward improving memory usage and also improves
performance when setting up the runtime environment.  Building on this,
subsequent PRs will be able to further optimize memory behavior, e.g., by using
a real database application such as the one already available via

    searx.cache.ExpireCache

Related:

- searxng#1892
- searxng#3458
- searxng#4650

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
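A minimal sketch of the lazy-loading idea described in this commit message, assuming a hypothetical `load_json` helper and a throwaway demo file; the actual searx.data implementation may be structured differently.

```python
import functools
import json
import tempfile
from pathlib import Path


@functools.lru_cache(maxsize=None)
def load_json(path: str):
    """Parse the file on first access only; later calls reuse the cached object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Demo with a temporary file standing in for a data file of the package.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = str(Path(tmp) / "currencies.json")
    Path(demo_path).write_text('{"EUR": "euro"}', encoding="utf-8")
    first = load_json(demo_path)   # the file is read and parsed here
    second = load_json(demo_path)  # served from the cache, no file access
```

Nothing is read at import time; a data set that is never requested never occupies memory.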
return42 added a commit that referenced this pull request May 22, 2025
return42 added a commit to return42/searxng that referenced this pull request May 22, 2025
To reduce the memory footprint, this patch no longer loads the JSON data
completely into memory.  Instead, there is an SQL database based on
`ExpireCacheSQLite`.

The class CurrenciesDB is a simple DB application that encapsulates the
DB (queries and initialization) and provides convenient methods like
`name_to_iso4217` and `iso4217_to_name`.

Related:

- searxng#1892
- searxng#3458 (comment)
- searxng#4650

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
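The following sketch illustrates the general shape of such a DB application. It uses the plain `sqlite3` module rather than `ExpireCacheSQLite`, and the schema, the `load` helper and the method parameters are assumptions for illustration; only the method names `name_to_iso4217` and `iso4217_to_name` come from the commit message.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


class CurrenciesDB:
    """Hypothetical stand-in: encapsulates DB initialization and queries."""

    def __init__(self, db_path: str = ":memory:"):
        self._db_path = db_path
        self._conn: Optional[sqlite3.Connection] = None

    @property
    def conn(self) -> sqlite3.Connection:
        # Open and initialize the database lazily, on first use.
        if self._conn is None:
            self._conn = sqlite3.connect(self._db_path)
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS currency_names ("
                " iso4217 TEXT, language TEXT, name TEXT,"
                " PRIMARY KEY (iso4217, language))"
            )
        return self._conn

    def load(self, rows: Iterable[Tuple[str, str, str]]) -> None:
        """Insert (iso4217, language, name) rows; assumed helper for the demo."""
        with self.conn:  # commits on success
            self.conn.executemany(
                "INSERT OR REPLACE INTO currency_names VALUES (?, ?, ?)", rows
            )

    def name_to_iso4217(self, name: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT iso4217 FROM currency_names WHERE name = ? LIMIT 1", (name,)
        ).fetchone()
        return row[0] if row else None

    def iso4217_to_name(self, iso4217: str, language: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM currency_names WHERE iso4217 = ? AND language = ?",
            (iso4217, language),
        ).fetchone()
        return row[0] if row else None


db = CurrenciesDB()
db.load([("EUR", "en", "euro"), ("USD", "en", "US dollar"), ("USD", "fr", "dollar américain")])
```

Callers only see the two lookup methods, so the storage backend can change without touching the engines.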
return42 added a commit to return42/searxng that referenced this pull request May 22, 2025
return42 added a commit to return42/searxng that referenced this pull request May 24, 2025
return42 added a commit to return42/searxng that referenced this pull request May 24, 2025
return42 added a commit that referenced this pull request May 25, 2025
Bnyro pushed a commit to Bnyro/searxng that referenced this pull request Jun 25, 2025
…4834)

Bnyro pushed a commit to Bnyro/searxng that referenced this pull request Jun 25, 2025
