[mod] data: implement a simple currencies (SQL) database #4836

return42 · 2025-05-22T17:44:06Z

To reduce the memory footprint, this patch no longer loads the JSON data completely into memory. Instead, there is an SQL database based on ExpireCacheSQLite.

The class CurrenciesDB is a simple DB application that encapsulates the DB (queries and initialization) and provides convenient methods like name_to_iso4217 and iso4217_to_name.

dalf · 2025-05-22T18:32:52Z

If I understand correctly the workflow is:

keep the .json on the disk
when searxng starts:
- create CurrenciesDB() without any data
when the first currency info is requested:
- load the data from the .json to save them as SQLite temporary file, for all currencies --> the data are seen in memory and then garbage collected.
- read one entry from the temporary SQLite file
when further currency infos are requested:
- read one entry from the temporary SQLite file

Why not store directly the SQLite files in the git repository, and then read directly these files:

no issue with temporary files
the DB files can be opened in read only mode
?

To be able to read the automatic updates, the update scripts can write a .csv files in addition to the SQLite file.

See #3458 for example implementation. This PR 3458 missed a check in production to make sure the memory footprint does not grow over time, then it went into limbo.

return42 · 2025-05-23T04:19:04Z

If I understand correctly the workflow is:

👍 yes, this is the "current workflow".

Why not store directly the SQLite files in the git repository,

Thinkable .. why not ..

drawback: SQLite's .db file is a binary --> we no longer have a diff which I often read before merge the PR.

On the long term I had a different solution in mind ..

🤔 .. the SQLite's .db file is from a DB application which was intended to be a key/value cache with expire date .. the name of the DB app is "ExpireCache" --> from SearXNG's POV it is cache(!)

separate all data (the json files) from SearXNG core (repo searxng/searxng) into its own repository searxng/data .. include the CI (data-update.yml) that updates theses JSON files
In SearXNG core, instead of read from a file we have a read from URL e.g.:
https://raw.githubusercontent.com/searxng/data/refs/heads/master/<schema-version>/currencies.json

We need to versioning the DB schema (<schema-version>) and the workflow will be slightly different to the "current workflow": the tables in the DB will be maintained when the expire date has been reached (ExpireCache).

OT but to give an example: I implemented similar for the SVG weather icons in PR #4663 (comment) ..

there is a WEATHER_DATA_CACHE with a MAXHOLD_TIME
SVG files are loaded from YR_WEATHER_SYMBOL_URL

dalf · 2025-05-23T09:04:24Z

drawback: SQLite's .db file is a binary --> we no longer have a diff which I often read before merge the #4740.

Look at #3458 which creates .csv files in addition to the .db files.
For example currencies.csv.

separate all data (the json files) from SearXNG core (repo searxng/searxng) into its own repository searxng/data

What are the benefits of a different repository?
What are the problem(s) with the current implementation which embed the data?

The data size is 8.6M, pretty much nothing if we look at the any web page.
No need for database versioning: code and data are always synchronized.
Also no error trying to load external data: there are already here locally.

from SearXNG's POV it is cache(!)

Is there an urge to merge this PR?

return42 · 2025-05-23T10:00:12Z

Look at #3458 which creates .csv files in addition to the .db files.

Above you asked ..

Why not store directly the SQLite file

When we replace .json by .cvs .. what have we won? / except a csv diff is hard to read, compared to json file with sorted keys ..

What are the benefits of a different repository?

The JSON files are dumps of databases; in my opinion, maintaining these databases shouldn't be part of the code.

For example, take a look at the https://github.com/ClearURLs project, whose rules are also made available to the application via a database (a JSON dump https://github.com/ClearURLs/Rules ).

By separating them, the database could be maintained independently of the code.

But we don't have to separate it, if you prefer to maintain the DB (the JSON file) here in the code, then we'll leave it here (for now / we can decide it later).

No need for database versioning: code and data are always synchronized.

As long we hold the DB here .. but when we separate DB from code, we need a versioning of the DB schema.

dalf · 2025-05-23T11:24:58Z

except a csv diff is hard to read

Because this is simplest way to make a human readable SQL dump : mapping from one SQL row to one CSV line using SELECT * FROM ... ORDER BY ... so things don't jump around at each updates.

But why not .json, however without foreign keys, everything is flatten to rows.

For example, take a look at the https://github.com/ClearURLs project

ClearURL is a browser extension where the users can pick different rules from different URLs.
There is no point in SearXNG to allow different URLs, as far I know?

In addition, all SearXNG instances are going to ping one more URL when they starts.

... maintaining these databases shouldn't be part of the code.
By separating them, the database could be maintained independently of the code.

I read again the messages, and feel sorry to insist: why we need to do that?

Currently if we change the format of a data file, it's simple:

update the searxng_extra/update script
run the script to update searx/data
update the code in searx/* and tests/*
one commit
done

Everyone is going to pick the updates [1], no connectivity issue, no GH token, no CloudFlare protection, it's a simple life.
[1] one git pull or container pull, and everything is here, up to date.

Most probably I miss your point.

return42 · 2025-05-23T11:48:41Z

I probably don't understand your point.

... we can stick with the current procedure, which is the simplest solution for us developers, as you already said.

As I already said, we can consider separation later if we ever have compelling reasons to do so. There's no need to decide that now.

Because this is the simplest way to make a human-readable SQL dump:

A dump, yes, but the diff is hard to read, and a diff is what we have to read with every PR review.

Anyway, I'd like to stick with JSON files for now... we can decide later whether it might be better to switch to a CVS file for one or another data source.

however without foreign keys, everything is flattened to rows.

For most cases, that's sufficient, and the ExpireCache we currently have in our code is sufficient for that. Regarding external bangs, I need to reconsider whether we need a different DB schema than a simple key/value cache (a DB table with more than just one primary key)... but I can't judge that at this point... perhaps this data is unsuitable for a relational DB... we'll see.

dalf · 2025-05-23T16:01:06Z

perhaps this data is unsuitable for a relational DB... we'll see.

See #4093 (comment)

Can we close this PR?

return42 · 2025-05-24T10:20:20Z

perhaps this data is unsuitable for a relational DB... we'll see.

See #4093 (comment)

Can we close this PR?

No! ..why? and how is it related to a bug in the pipy engine .. I'm confused.

More PRs will follow for the other data objects. This is just one to demonstrate.

To reduce the memory footprint, this patch no longer loads the JSON data completely into memory. Instead, there is an SQL database based on `ExpireCacheSQLite`. The class CurrenciesDB is a simple DB application that encapsulates the DB (queries and initialization) and provides convenient methods like `name_to_iso4217` and `iso4217_to_name`. Related: - searxng#1892 - searxng#3458 (comment) - searxng#4650 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

return42 · 2025-05-25T08:40:49Z

More PRs will follow for the other data objects. This is just one to demonstrate.

I need this patch for further development, since no technical defects could be found I will now merge this PR.

Fix C&P typo from searxng#4836 Closes: searxng#4861

Fix C&P typo from #4836 Closes: #4861

To reduce the memory footprint, this patch no longer loads the JSON data completely into memory. Instead, there is an SQL database based on `ExpireCacheSQLite`. The class CurrenciesDB is a simple DB application that encapsulates the DB (queries and initialization) and provides convenient methods like `name_to_iso4217` and `iso4217_to_name`. Related: - searxng#1892 - searxng#3458 (comment) - searxng#4650 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

Fix C&P typo from searxng#4836 Closes: searxng#4861

return42 requested a review from dalf May 22, 2025 17:49

return42 force-pushed the data-to-sql branch from 0a07be8 to 3ba8c7e Compare May 22, 2025 17:59

return42 mentioned this pull request May 23, 2025

[feat] tracker url plugin: use ClearURL tracking param list #4678

Merged

return42 force-pushed the data-to-sql branch from 3ba8c7e to 7e34e01 Compare May 24, 2025 11:26

return42 merged commit 848c8d0 into searxng:master May 25, 2025
7 checks passed

return42 deleted the data-to-sql branch May 25, 2025 08:40

return42 added a commit to return42/searxng that referenced this pull request May 29, 2025

[fix] plugins: currency processor bug

5327d1d

Fix C&P typo from searxng#4836 Closes: searxng#4861

return42 mentioned this pull request May 29, 2025

[fix] plugins: currency processor bug #4868

Merged

return42 added a commit that referenced this pull request May 29, 2025

[fix] plugins: currency processor bug

913dfbd

Fix C&P typo from #4836 Closes: #4861

Bnyro pushed a commit to Bnyro/searxng that referenced this pull request Jun 25, 2025

[fix] plugins: currency processor bug

9925315

Fix C&P typo from searxng#4836 Closes: searxng#4861

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[mod] data: implement a simple currencies (SQL) database #4836

[mod] data: implement a simple currencies (SQL) database #4836

Uh oh!

return42 commented May 22, 2025

Uh oh!

dalf commented May 22, 2025 •

edited

Loading

Uh oh!

return42 commented May 23, 2025 •

edited

Loading

Uh oh!

dalf commented May 23, 2025 •

edited

Loading

Uh oh!

return42 commented May 23, 2025

Uh oh!

dalf commented May 23, 2025 •

edited

Loading

Uh oh!

return42 commented May 23, 2025

Uh oh!

dalf commented May 23, 2025 •

edited

Loading

Uh oh!

return42 commented May 24, 2025 •

edited

Loading

Uh oh!

return42 commented May 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[mod] data: implement a simple currencies (SQL) database #4836

[mod] data: implement a simple currencies (SQL) database #4836

Uh oh!

Conversation

return42 commented May 22, 2025

Uh oh!

dalf commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

return42 commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dalf commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

return42 commented May 23, 2025

Uh oh!

dalf commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

return42 commented May 23, 2025

Uh oh!

dalf commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

return42 commented May 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

return42 commented May 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dalf commented May 22, 2025 •

edited

Loading

return42 commented May 23, 2025 •

edited

Loading

dalf commented May 23, 2025 •

edited

Loading

dalf commented May 23, 2025 •

edited

Loading

dalf commented May 23, 2025 •

edited

Loading

return42 commented May 24, 2025 •

edited

Loading