data: currencies, engine descriptions and osm_keys_tags: use SQLite instead of JSON #3458
* integration testing of searxng/searxng#3458 -> this switch is only temporary
return42
left a comment
@dalf can we merge this PR or are you waiting for more test results?
To reduce memory usage, use a SQLite database to store the engine descriptions. A dump of the database is stored in Git to facilitate maintenance, especially the pull requests made automatically every month. Related to * searxng#2633 * searxng#3443
After some tests on Paul's instance, the memory increases to nearly its original value after a few days. I have updated the code:
@mrpaulblack can you try the last update?
See #4650
I still think this PR could improve the memory footprint, adapted in one way or another. See the last message on #1892.
In the previous implementation, all databases were loaded into memory when
importing the searx.data package, regardless of whether they were ever needed.
Regardless of this, it is an antipattern to load entire databases into memory
when importing a package or module; databases should be loaded when needed.
Lazy loading is a first step toward improving memory usage and also improves
performance when setting up the runtime environment. Building on this,
subsequent PRs will be able to further optimize memory behavior, e.g., by using
a real database application such as the one already available via
searx.cache.ExpireCache
Related:
- searxng#1892
- searxng#3458
- searxng#4650
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
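To illustrate the lazy-loading idea described in the commit message above, here is a minimal sketch. It is not the code from this PR: the class name `LazyJSON`, the file name and the demo data are all hypothetical, and the only point is that the file is read on first access rather than when the module is imported.

```python
import json
import tempfile
from pathlib import Path


class LazyJSON:
    """Hypothetical helper: read a JSON file on first access, not at import."""

    def __init__(self, path):
        self.path = Path(path)
        self._data = None
        self.load_count = 0  # only here to make the behaviour observable

    @property
    def data(self):
        # The file is opened the first time the data is requested.
        if self._data is None:
            with self.path.open(encoding="utf-8") as f:
                self._data = json.load(f)
            self.load_count += 1
        return self._data


# Demo with a throwaway file (made-up content).
_tmp = Path(tempfile.mkdtemp()) / "units.json"
_tmp.write_text(json.dumps({"Q11573": "m"}), encoding="utf-8")

UNITS = LazyJSON(_tmp)  # nothing is read from disk at this point
```

A module-level object like `UNITS` costs almost nothing at import time; the JSON is parsed only when some code path actually touches `UNITS.data`.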
To reduce the memory footprint, this patch no longer loads the JSON data completely into memory. Instead, there is an SQL database based on `ExpireCacheSQLite`. The class CurrenciesDB is a simple DB application that encapsulates the DB (queries and initialization) and provides convenient methods like `name_to_iso4217` and `iso4217_to_name`. Related: - searxng#1892 - searxng#3458 (comment) - searxng#4650 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
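The commit message above describes a small DB application with lookup methods. A rough sketch of that shape, using only the standard `sqlite3` module, might look as follows. This is an assumption-laden illustration, not SearXNG's `CurrenciesDB`: the class `CurrencyLookup`, the table layout and the simplified method signatures are hypothetical, and `ExpireCacheSQLite` is not used here.

```python
import sqlite3


class CurrencyLookup:
    """Hypothetical stand-in for a currencies DB (not SearXNG's CurrenciesDB)."""

    def __init__(self, db_path=":memory:"):
        self.con = sqlite3.connect(db_path)
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS currencies ("
            " name TEXT PRIMARY KEY, iso4217 TEXT NOT NULL)"
        )

    def load(self, rows):
        # rows: iterable of (lowercase name, ISO 4217 code) pairs
        with self.con:
            self.con.executemany(
                "INSERT OR REPLACE INTO currencies (name, iso4217) VALUES (?, ?)",
                rows,
            )

    def name_to_iso4217(self, name):
        # Each lookup is a single indexed query; no full table is kept in RAM.
        row = self.con.execute(
            "SELECT iso4217 FROM currencies WHERE name = ?", (name.lower(),)
        ).fetchone()
        return row[0] if row else None

    def iso4217_to_name(self, iso4217):
        row = self.con.execute(
            "SELECT name FROM currencies WHERE iso4217 = ? ORDER BY name LIMIT 1",
            (iso4217.upper(),),
        ).fetchone()
        return row[0] if row else None


db = CurrencyLookup()
db.load([("euro", "EUR"), ("us dollar", "USD")])
```

Callers only see the two lookup methods, so the storage backend can change later without touching the engines that use it.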
What does this PR do?
To reduce memory usage, use a SQLite database to store the engine descriptions, currencies and OSM keys/tags.
Dumps of the databases are stored in Git to facilitate maintenance, especially for the pull requests made automatically every month.
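As a rough illustration of why a text dump is convenient to keep in Git, the sketch below writes a database out as SQL text with `sqlite3.Connection.iterdump()` and rebuilds it with `executescript()`. The table name, columns and row are made up for the example, and this is not necessarily how the PR produces its dumps.

```python
import sqlite3


def dump_sql(con):
    """Return the whole database as SQL text (diff-friendly, unlike the binary file)."""
    return "\n".join(con.iterdump()) + "\n"


def restore_sql(sql_text):
    """Rebuild an in-memory database from a SQL text dump."""
    con = sqlite3.connect(":memory:")
    con.executescript(sql_text)
    return con


# Hypothetical schema and data, for illustration only.
src = sqlite3.connect(":memory:")
src.execute(
    "CREATE TABLE engine_descriptions (engine TEXT, language TEXT, description TEXT)"
)
src.execute(
    "INSERT INTO engine_descriptions VALUES (?, ?, ?)",
    ("example", "en", "An example engine"),
)
src.commit()

dump = dump_sql(src)
restored = restore_sql(dump)
```

Because the dump is plain SQL, a monthly automated update shows up as a readable diff of changed `INSERT` statements instead of an opaque binary change.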
With this PR, `searx.data` provides some functions to access the data:
The function names start with `fetch` instead of `get` to emphasize the fact that the data are fetched from the databases.
With these functions, other parts of the code or engines can access the data without weird imports like in the Apple Maps engine:
searxng/searx/engines/apple_maps.py
Line 9 in dbed8da
Why is this change important?
It saves about 20 MB per worker, similar to #3443, but the memory remains low even after some queries using OSM (for example).
SQLite is going to cache some pages, but as far as I understand this is kernel cache:
About load time: it takes 10 ms to load `useragents.json`, `external_urls.json`, `wikidata_units.json`, `external_bangs.json`, `engine_traits.json` and `locales.json` on my AMD 5750GE. Even ten times slower is still reasonable IMO: the HTTP requests during the initialization are way slower than that.
How to test this PR locally?
Author's checklist
Related issues
Related to