-
Notifications
You must be signed in to change notification settings - Fork 2.3k
[refactor] typification of SearXNG (initial) / result items (part 1) #4183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[refactor] typification of SearXNG (initial) / result items (part 1) #4183
Conversation
bc1cf66 to
e4e7d73
Compare
|
(Side note: This would probably allow us to re-visit #3256 as well, after this PR, to get rid of some hardcoded HTML in result weather engine result templates) |
e4e7d73 to
8cddc4e
Compare
|
I pushed this PR on my instance where you test it online: https://darmarit.org/searx Since this PR does no functional search, everything should be as usual. Except the translation results, which now have collapsible items and links to the origin translators: The translation results (inspired by #3925) has been included in this PR to demonstrate what we can do with typification / how it works .. |
Bnyro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done the steps you described and some other tests and didn't find any bugs, so it seems good from that perspective.
I've been going through the code and trying to understand the typification process, but I'm a bit worried about the current syntax for returning results. Let's take the following base example
results = []
...
Translations(results=results, translations=[foo], url=foo_url)
return results
I have to admit I don't 100% understand how this works from looking at the code:
- we're creating a
Translationsclass here, but its result is actually unused - instead it does some magic and change the
resultslist - where is the
urlfield we set here is defined? I couldn't find its definition anywhere in the classesTranslationsinherits from - Since this modifies the original
resultswithout returning anything, wouldn't that be more suitable for a function? - I'd personally love something like
return Translations(translations=[item], url=foo_url)and avoid having to create an empty results array first
Apart from that confusion, that PR looks really awesome, the split of the documentation into different parts for answers, infobox, suggestions, ... is a nice step and I really like the translation answerer appearance change as well as the ability to set custom answerer templates, which is much better than my previous attempt 👍
So if you could please clarify the above point a bit, I'm sure that's good to merge. I'm confident that this is close to bug-free, although we can't tell although it's in production everywhere. Waiting any longer will probably cause more and more conflicts, so we should get this merged as soon as possible imo.
It's an important step in the right direction, thanks for all your work on that Markus!
The class class Result(msgspec.Struct, ..):
"""Base class of all result types :ref:`result types`."""
url: str | None = None
"""A link related to this *result*"""
results: list = [] # in the future it will be of type EngineResults (read below)
"""Result list of an :origin:`engine <searx/engines>` .. to which the reulst item should be added."""
class BaseAnswer(Result, ..):
...
class Translations(BaseAnswer, ..):
...
Yeah that looks a bit strange, the alternative notation would be: results = []
...
x = Translations(results=results, translations=[foo], url=foo_url)
results.append(x)
return resultsbut we are still at the beginning ... that's why it looks a bit strange at the moment ..
In the future we will have a class for the result list, which will then offer a factory method for the typed results ... haven't thougth to its end but let assume we have a class like this: from searx.result_types import Result, Answer, Translation, ...
class EngineResults(list):
Answer = Answer
Translation = Translation
... = ...
def new_result(self, res: Result):
res.results = self
self.append(res)And in the engine we have something like .. from searx.xyz import EngineResults
def response(resp) -> EngineResults:
results = EngineResults()
results.new_result(result.Translations(translations=[foo], url=foo_url))
return resultsThe above is just a 2min protototype to demonstrate wehre I want to go: (in the future) the engine developer:
Hope, the above helps .. but I'm open for suggestions .. but for now I would like to have a optional |
|
Thanks for the clarification, that explains how this worked.
To be honest I do very much prefer this syntax to make the code more self-descriptive and just make it more clear to everyone that's working on implementing an engine. It's very much more clean this way as calling This probably shouldn't be that much too change, as you said, it's already possible and only documentation and the engines using it would need to be modified (let me know if you want me to handle this if you don't have time for it).
That'd indeed be a nice syntax. This is also much closer to the way described above using an array of results, and thus seems very intuitive for engine developers. |
I see that you actually documented this as well, I just missed that part because I didn't figure out that |
Typification of SearXNG
=======================
This patch introduces the typing of the results. The why and how is described
in the documentation, please generate the documentation ..
$ make docs.clean docs.live
and read the following articles in the "Developer documentation":
- result types --> http://0.0.0.0:8000/dev/result_types/index.html
The result types are available from the `searx.result_types` module. The
following have been implemented so far:
- base result type: `searx.result_type.Result`
--> http://0.0.0.0:8000/dev/result_types/base_result.html
- answer results
--> http://0.0.0.0:8000/dev/result_types/answer.html
including the type for translations (inspired by searxng#3925). For all other
types (which still need to be set up in subsequent PRs), template documentation
has been created for the transition period.
Doc of the fields used in Templates
===================================
The template documentation is the basis for the typing and is the first complete
documentation of the results (needed for engine development). It is the
"working paper" (the plan) with which further typifications can be implemented
in subsequent PRs.
- searxng#357
Answer Templates
================
With the new (sub) types for `Answer`, the templates for the answers have also
been revised, `Translation` are now displayed with collapsible entries (inspired
by searxng#3925).
!en-de dog
Plugins & Answerer
==================
The implementation for `Plugin` and `Answer` has been revised, see
documentation:
- Plugin: http://0.0.0.0:8000/dev/plugins/index.html
- Answerer: http://0.0.0.0:8000/dev/answerers/index.html
With `AnswerStorage` and `AnswerStorage` to manage those items (in follow up
PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented)
Autocomplete
============
The autocompletion had a bug where the results from `Answer` had not been shown
in the past. To test activate autocompletion and try search terms for which we
have answerers
- statistics: type `min 1 2 3` .. in the completion list you should find an
entry like `[de] min(1, 2, 3) = 1`
- random: type `random uuid` .. in the completion list, the first item is a
random UUID
Extended Types
==============
SearXNG extends e.g. the request and response types of flask and httpx, a module
has been set up for type extensions:
- Extended Types
--> http://0.0.0.0:8000/dev/extended_types.html
Unit-Tests
==========
The unit tests have been completely revised. In the previous implementation,
the runtime (the global variables such as `searx.settings`) was not initialized
before each test, so the runtime environment with which a test ran was always
determined by the tests that ran before it. This was also the reason why we
sometimes had to observe non-deterministic errors in the tests in the past:
- searxng#2988 is one example for the Runtime
issues, with non-deterministic behavior ..
- searxng#3650
- searxng#3654
- searxng#3642 (comment)
- searxng#3746 (comment)
Why msgspec.Struct
==================
We have already discussed typing based on e.g. `TypeDict` or `dataclass` in the past:
- https://github.com/searxng/searxng/pull/1562/files
- https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2
- https://github.com/searxng/searxng/pull/1412/files
- searxng#1356
In my opinion, TypeDict is unsuitable because the objects are still dictionaries
and not instances of classes / the `dataclass` are classes but ...
The `msgspec.Struct` combine the advantages of typing, runtime behaviour and
also offer the option of (fast) serializing (incl. type check) the objects.
Currently not possible but conceivable with `msgspec`: Outsourcing the engines
into separate processes, what possibilities this opens up in the future is left
to the imagination!
Internally, we have already defined that it is desirable to decouple the
development of the engines from the development of the SearXNG core / The
serialization of the `Result` objects is a prerequisite for this.
HINT: The threads listed above were the template for this PR, even though the
implementation here is based on msgspec. They should also be an inspiration for
the following PRs of typification, as the models and implementations can provide
a good direction.
Why just one commit?
====================
I tried to create several (thematically separated) commits, but gave up at some
point ... there are too many things to tackle at once / The comprehensibility of
the commits would not be improved by a thematic separation. On the contrary, we
would have to make multiple changes at the same places and the goal of a change
would be vaguely recognizable in the fog of the commits.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
8cddc4e to
455d258
Compare
In [1] and [2] we discussed the need of a Result.results property and how we can
avoid unclear code. This patch implements a class for the reslut-lists of
engines::
searx.result_types.EngineResults
A simple example for the usage in engine development::
from searx.result_types import EngineResults
...
def response(resp) -> EngineResults:
res = EngineResults()
...
res.add( res.types.Answer(answer="lorem ipsum ..", url="https://example.org") )
...
return res
[1] searxng#4183 (review)
[2] searxng#4183 (comment)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@Bnyro I pushed a new commit on top .. in this patch class from searx.result_types import EngineResults
...
def response(resp) -> EngineResults:
res = EngineResults()
...
res.add( res.types.Answer(answer="lorem ipsum ..", url="https://example.org") )
...
return resBTW the property |
Awesome, thanks for your work! |
Before searxng#4183 a builtin plugin was *defautlt_on* when it is listed in the "enabled_plugins" settings, this patch restores the previous behavior. Not part of this patch but just to mentioning in context of searxng#4263: In the long term, we will abolish the "enabled_plugins:" setting and combine all options for the plugins in the "plugins:" setting, as is already planned in the PR searxng#4282 Closes: searxng#4263 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Before searxng#4183 a builtin plugin was *defautlt_on* when it is listed in the "enabled_plugins" settings, this patch restores the previous behavior. Not part of this patch but just to mentioning in context of searxng#4263: In the long term, we will abolish the "enabled_plugins:" setting and combine all options for the plugins in the "plugins:" setting, as is already planned in the PR searxng#4282 Closes: searxng#4263 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Before #4183 a builtin plugin was *defautlt_on* when it is listed in the "enabled_plugins" settings, this patch restores the previous behavior. Not part of this patch but just to mentioning in context of #4263: In the long term, we will abolish the "enabled_plugins:" setting and combine all options for the plugins in the "plugins:" setting, as is already planned in the PR #4282 Closes: #4263 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Typification of SearXNG
This patch introduces the typing of the results. The why and how is described
in the documentation, please generate the documentation ..
and read the following articles in the "Developer documentation":
The result types are available from the
searx.result_typesmodule. Thefollowing have been implemented so far:
base result type:
searx.result_type.Result--> http://0.0.0.0:8000/dev/result_types/base_result.html
answer results
--> http://0.0.0.0:8000/dev/result_types/answer.html
including the type for translations (inspired by #3925). For all other
types (which still need to be set up in subsequent PRs), template documentation
has been created for the transition period.
Doc of the fields used in Templates
The template documentation is the basis for the typing and is the first complete
documentation of the results (needed for engine development). It is the
"working paper" (the plan) with which further typifications can be implemented
in subsequent PRs.
Answer Templates
With the new (sub) types for
Answer, the templates for the answers have alsobeen revised,
Translationare now displayed with collapsible entries (inspiredby #3925).
Plugins & Answerers
The implementation for
PluginandAnswerhas been revised, seedocumentation:
With
AnswerStorageandAnswerStorageto manage those items (in follow upPRs,
ArticleStorage,InfoStorageand .. will be implemented)Autocomplete
The autocompletion had a bug where the results from
Answerhad not been shownin the past. To test activate autocompletion and try search terms for which we
have answerers
statistics: type
min 1 2 3.. in the completion list you should find anentry like
[de] min(1, 2, 3) = 1random: type
random uuid.. in the completion list, the first item is arandom UUID
Extended Types
SearXNG extends e.g. the request and response types of flask and httpx, a module
has been set up for type extensions:
--> http://0.0.0.0:8000/dev/extended_types.html
Unit-Tests
The unit tests have been completely revised. In the previous implementation,
the runtime (the global variables such as
searx.settings) was not initializedbefore each test, so the runtime environment with which a test ran was always
determined by the tests that ran before it. This was also the reason why we
sometimes had to observe non-deterministic errors in the tests in the past:
make test.unitfails withKeyError: ('engine', 'dummy engine', 'search', 'count', 'sent')#2988 is one example for the Runtimeissues, with non-deterministic behavior ..
[l10n] update translations from Weblate #3650
[fix] tear down TEST_ENGINES after TestBang is proceeded #3654
[mod] remove py 3.6 leftovers #3642 (comment)
Fix tineye engine url, datetime parsing, and minor refactor #3746 (comment)
Why msgspec.Struct
We have already discussed typing based on e.g.
TypeDictordataclassin the past:In my opinion, TypeDict is unsuitable because the objects are still dictionaries
and not instances of classes / the
dataclassare classes but ...The
msgspec.Structcombine the advantages of typing, runtime behaviour andalso offer the option of (fast) serializing (incl. type check) the objects.
Currently not possible but conceivable with
msgspec: Outsourcing the enginesinto separate processes, what possibilities this opens up in the future is left
to the imagination!
Internally, we have already defined that it is desirable to decouple the
development of the engines from the development of the SearXNG core / The
serialization of the
Resultobjects is a prerequisite for this.HINT: The threads listed above were the template for this PR, even though the
implementation here is based on msgspec. They should also be an inspiration for
the following PRs of typification, as the models and implementations can provide
a good direction.
Why just one commit?
I tried to create several (thematically separated) commits, but gave up at some
point ... there are too many things to tackle at once / The comprehensibility of
the commits would not be improved by a thematic separation. On the contrary, we
would have to make multiple changes at the same places and the goal of a change
would be vaguely recognizable in the fog of the commits.
Closes: