这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@return42
Copy link
Member

@return42 return42 commented Jan 14, 2025

Typification of SearXNG

This patch introduces the typing of the results. The why and how is described
in the documentation, please generate the documentation ..

$ make docs.clean docs.live

and read the following articles in the "Developer documentation":

The result types are available from the searx.result_types module. The
following have been implemented so far:

including the type for translations (inspired by #3925). For all other
types (which still need to be set up in subsequent PRs), template documentation
has been created for the transition period.

Doc of the fields used in Templates

The template documentation is the basis for the typing and is the first complete
documentation of the results (needed for engine development). It is the
"working paper" (the plan) with which further typifications can be implemented
in subsequent PRs.

Answer Templates

With the new (sub) types for Answer, the templates for the answers have also
been revised, Translation are now displayed with collapsible entries (inspired
by #3925).

!en-de dog

Plugins & Answerers

The implementation for Plugin and Answer has been revised, see
documentation:

With AnswerStorage and AnswerStorage to manage those items (in follow up
PRs, ArticleStorage, InfoStorage and .. will be implemented)

Autocomplete

The autocompletion had a bug where the results from Answer had not been shown
in the past. To test activate autocompletion and try search terms for which we
have answerers

  • statistics: type min 1 2 3 .. in the completion list you should find an
    entry like [de] min(1, 2, 3) = 1

  • random: type random uuid .. in the completion list, the first item is a
    random UUID

Extended Types

SearXNG extends e.g. the request and response types of flask and httpx, a module
has been set up for type extensions:

Unit-Tests

The unit tests have been completely revised. In the previous implementation,
the runtime (the global variables such as searx.settings) was not initialized
before each test, so the runtime environment with which a test ran was always
determined by the tests that ran before it. This was also the reason why we
sometimes had to observe non-deterministic errors in the tests in the past:

Why msgspec.Struct

We have already discussed typing based on e.g. TypeDict or dataclass in the past:

In my opinion, TypeDict is unsuitable because the objects are still dictionaries
and not instances of classes / the dataclass are classes but ...

The msgspec.Struct combine the advantages of typing, runtime behaviour and
also offer the option of (fast) serializing (incl. type check) the objects.

Currently not possible but conceivable with msgspec: Outsourcing the engines
into separate processes, what possibilities this opens up in the future is left
to the imagination!

Internally, we have already defined that it is desirable to decouple the
development of the engines from the development of the SearXNG core / The
serialization of the Result objects is a prerequisite for this.

HINT: The threads listed above were the template for this PR, even though the
implementation here is based on msgspec. They should also be an inspiration for
the following PRs of typification, as the models and implementations can provide
a good direction.

Why just one commit?

I tried to create several (thematically separated) commits, but gave up at some
point ... there are too many things to tackle at once / The comprehensibility of
the commits would not be improved by a thematic separation. On the contrary, we
would have to make multiple changes at the same places and the goal of a change
would be vaguely recognizable in the fog of the commits.


Closes:

@Bnyro
Copy link
Member

Bnyro commented Jan 16, 2025

(Side note: This would probably allow us to re-visit #3256 as well, after this PR, to get rid of some hardcoded HTML in result weather engine result templates)

@return42 return42 force-pushed the translation-answer-template branch from e4e7d73 to 8cddc4e Compare January 22, 2025 14:39
@return42
Copy link
Member Author

return42 commented Jan 22, 2025

I pushed this PR on my instance where you test it online: https://darmarit.org/searx

Since this PR does no functional search, everything should be as usual.

Except the translation results, which now have collapsible items and links to the origin translators:

The translation results (inspired by #3925) has been included in this PR to demonstrate what we can do with typification / how it works ..

grafik

Copy link
Member

@Bnyro Bnyro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done the steps you described and some other tests and didn't find any bugs, so it seems good from that perspective.

I've been going through the code and trying to understand the typification process, but I'm a bit worried about the current syntax for returning results. Let's take the following base example

results = []
...
Translations(results=results, translations=[foo], url=foo_url)
return results

I have to admit I don't 100% understand how this works from looking at the code:

  • we're creating a Translations class here, but its result is actually unused
  • instead it does some magic and change the results list
  • where is the url field we set here is defined? I couldn't find its definition anywhere in the classes Translations inherits from
  • Since this modifies the original results without returning anything, wouldn't that be more suitable for a function?
  • I'd personally love something like return Translations(translations=[item], url=foo_url) and avoid having to create an empty results array first

Apart from that confusion, that PR looks really awesome, the split of the documentation into different parts for answers, infobox, suggestions, ... is a nice step and I really like the translation answerer appearance change as well as the ability to set custom answerer templates, which is much better than my previous attempt 👍

So if you could please clarify the above point a bit, I'm sure that's good to merge. I'm confident that this is close to bug-free, although we can't tell although it's in production everywhere. Waiting any longer will probably cause more and more conflicts, so we should get this merged as soon as possible imo.
It's an important step in the right direction, thanks for all your work on that Markus!

@return42
Copy link
Member Author

Let's take the following base example

results = []
...
Translations(results=results, translations=[foo], url=foo_url)
return results
  • where is the url field we set here is defined? I couldn't find its definition anywhere in the classes Translations inherits from

The class searx.result_types.Result is the base class of all result types, here is the inheritance:

class Result(msgspec.Struct, ..):
    """Base class of all result types :ref:`result types`."""

    url: str | None = None
    """A link related to this *result*"""

    results: list = []   # in the future it will be of type EngineResults (read below)
    """Result list of an :origin:`engine <searx/engines>` .. to which the reulst item  should be added."""

class BaseAnswer(Result, ..):
   ...
class Translations(BaseAnswer, ..):
   ...

I have to admit I don't 100% understand how this works from looking at the code:

* we're creating a `Translations` class here, but its result is actually unused

* instead it does some magic and change the `results` list

Yeah that looks a bit strange, the alternative notation would be:

results = []
...
x = Translations(results=results, translations=[foo], url=foo_url)
results.append(x)
return results

but we are still at the beginning ... that's why it looks a bit strange at the moment ..

  • Since this modifies the original results without returning anything, wouldn't that be more suitable for a function?

  • I'd personally love something like return Translations(translations=[item], url=foo_url) and avoid having to create an empty results array first

In the future we will have a class for the result list, which will then offer a factory method for the typed results ... haven't thougth to its end but let assume we have a class like this:

from searx.result_types import Result, Answer, Translation, ...

class EngineResults(list):

    Answer = Answer
    Translation = Translation
    ... = ...

    def new_result(self, res: Result):
        res.results = self
        self.append(res)

And in the engine we have something like ..

from searx.xyz import EngineResults

def response(resp) -> EngineResults:
    results = EngineResults()
    results.new_result(result.Translations(translations=[foo], url=foo_url))
    return results

The above is just a 2min protototype to demonstrate wehre I want to go: (in the future) the engine developer:

  • should have a factory method,
  • should not need to import all the result types (they are already available from EngineResults.Answer, EngineResults.Translation, EngineResults.xyz).
  • should not care much about a result list and how the result items are managed in.

So if you could please clarify the above point a bit,

Hope, the above helps .. but I'm open for suggestions .. but for now I would like to have a optional Result.results field .. I believe that we will be able to benefit from this in the future; if there is context information (an association) between the individual result item and the list in which it is added.

@Bnyro
Copy link
Member

Bnyro commented Jan 26, 2025

Thanks for the clarification, that explains how this worked.

results = []
...
result = Translations(translations=[foo], url=foo_url)
results.append(result)
return results

To be honest I do very much prefer this syntax to make the code more self-descriptive and just make it more clear to everyone that's working on implementing an engine.

It's very much more clean this way as calling Translation(...) doesn't have any side effects and you can directly see what is happening to results because you modified it yourself.

This probably shouldn't be that much too change, as you said, it's already possible and only documentation and the engines using it would need to be modified (let me know if you want me to handle this if you don't have time for it).

from searx.xyz import EngineResults

def response(resp) -> EngineResults:
results = EngineResults()
results.new_result(result.Translations(translations=[foo], url=foo_url))
return results

That'd indeed be a nice syntax. This is also much closer to the way described above using an array of results, and thus seems very intuitive for engine developers.

@Bnyro
Copy link
Member

Bnyro commented Jan 26, 2025

    results: list = []  # https://jcristharif.com/msgspec/structs.html#default-values
    """Result list of an :origin:`engine <searx/engines>` response or a
    :origin:`answerer <searx/answerers>` to which the answer should be added.

    This field is only present for the sake of simplicity.  Typically, the
    response function of an engine has a result list that is returned at the
    end. By specifying the result list in the constructor of the result, this
    result is then immediately added to the list (this parameter does not have
    another function).

    .. code:: python

       def response(resp):
           results = []
           ...
           Answer(results=results, answer=answer, url=url)
           ...
           return results

    """

I see that you actually documented this as well, I just missed that part because I didn't figure out that Translations inherits from the Result base class, sorry for not seeing that earlier.

Bnyro and others added 2 commits January 27, 2025 14:30
Typification of SearXNG
=======================

This patch introduces the typing of the results.  The why and how is described
in the documentation, please generate the documentation ..

    $ make docs.clean docs.live

and read the following articles in the "Developer documentation":

- result types --> http://0.0.0.0:8000/dev/result_types/index.html

The result types are available from the `searx.result_types` module.  The
following have been implemented so far:

- base result type: `searx.result_type.Result`
  --> http://0.0.0.0:8000/dev/result_types/base_result.html

- answer results
  --> http://0.0.0.0:8000/dev/result_types/answer.html

including the type for translations (inspired by searxng#3925).  For all other
types (which still need to be set up in subsequent PRs), template documentation
has been created for the transition period.

Doc of the fields used in Templates
===================================

The template documentation is the basis for the typing and is the first complete
documentation of the results (needed for engine development).  It is the
"working paper" (the plan) with which further typifications can be implemented
in subsequent PRs.

- searxng#357

Answer Templates
================

With the new (sub) types for `Answer`, the templates for the answers have also
been revised, `Translation` are now displayed with collapsible entries (inspired
by searxng#3925).

    !en-de dog

Plugins & Answerer
==================

The implementation for `Plugin` and `Answer` has been revised, see
documentation:

- Plugin: http://0.0.0.0:8000/dev/plugins/index.html
- Answerer: http://0.0.0.0:8000/dev/answerers/index.html

With `AnswerStorage` and `AnswerStorage` to manage those items (in follow up
PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented)

Autocomplete
============

The autocompletion had a bug where the results from `Answer` had not been shown
in the past.  To test activate autocompletion and try search terms for which we
have answerers

- statistics: type `min 1 2 3` .. in the completion list you should find an
  entry like `[de] min(1, 2, 3) = 1`

- random: type `random uuid` .. in the completion list, the first item is a
  random UUID

Extended Types
==============

SearXNG extends e.g. the request and response types of flask and httpx, a module
has been set up for type extensions:

- Extended Types
  --> http://0.0.0.0:8000/dev/extended_types.html

Unit-Tests
==========

The unit tests have been completely revised.  In the previous implementation,
the runtime (the global variables such as `searx.settings`) was not initialized
before each test, so the runtime environment with which a test ran was always
determined by the tests that ran before it.  This was also the reason why we
sometimes had to observe non-deterministic errors in the tests in the past:

- searxng#2988 is one example for the Runtime
  issues, with non-deterministic behavior ..

- searxng#3650
- searxng#3654
- searxng#3642 (comment)
- searxng#3746 (comment)

Why msgspec.Struct
==================

We have already discussed typing based on e.g. `TypeDict` or `dataclass` in the past:

- https://github.com/searxng/searxng/pull/1562/files
- https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2
- https://github.com/searxng/searxng/pull/1412/files
- searxng#1356

In my opinion, TypeDict is unsuitable because the objects are still dictionaries
and not instances of classes / the `dataclass` are classes but ...

The `msgspec.Struct` combine the advantages of typing, runtime behaviour and
also offer the option of (fast) serializing (incl. type check) the objects.

Currently not possible but conceivable with `msgspec`: Outsourcing the engines
into separate processes, what possibilities this opens up in the future is left
to the imagination!

Internally, we have already defined that it is desirable to decouple the
development of the engines from the development of the SearXNG core / The
serialization of the `Result` objects is a prerequisite for this.

HINT: The threads listed above were the template for this PR, even though the
implementation here is based on msgspec.  They should also be an inspiration for
the following PRs of typification, as the models and implementations can provide
a good direction.

Why just one commit?
====================

I tried to create several (thematically separated) commits, but gave up at some
point ... there are too many things to tackle at once / The comprehensibility of
the commits would not be improved by a thematic separation. On the contrary, we
would have to make multiple changes at the same places and the goal of a change
would be vaguely recognizable in the fog of the commits.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42 return42 force-pushed the translation-answer-template branch from 8cddc4e to 455d258 Compare January 27, 2025 13:31
In [1] and [2] we discussed the need of a Result.results property and how we can
avoid unclear code.  This patch implements a class for the reslut-lists of
engines::

    searx.result_types.EngineResults

A simple example for the usage in engine development::

    from searx.result_types import EngineResults
    ...
    def response(resp) -> EngineResults:
        res = EngineResults()
        ...
        res.add( res.types.Answer(answer="lorem ipsum ..", url="https://example.org") )
        ...
        return res

[1] searxng#4183 (review)
[2] searxng#4183 (comment)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42
Copy link
Member Author

That'd indeed be a nice syntax. This is also much closer to the way described above using an array of results, and thus seems very intuitive for engine developers.

@Bnyro I pushed a new commit on top .. in this patch class searx.result_types.EngineResults is implemented and used in the engines:

    from searx.result_types import EngineResults
    ...
    def response(resp) -> EngineResults:
        res = EngineResults()
        ...
        res.add( res.types.Answer(answer="lorem ipsum ..", url="https://example.org") )
        ...
        return res

BTW the property Result.results is no longer needed / I dropped this field.

@Bnyro
Copy link
Member

Bnyro commented Jan 27, 2025

@Bnyro I pushed a new commit on top .. in this patch class searx.result_types.EngineResults is implemented and used in the engines:

    from searx.result_types import EngineResults
    ...
    def response(resp) -> EngineResults:
        res = EngineResults()
        ...
        res.add( res.types.Answer(answer="lorem ipsum ..", url="https://example.org") )
        ...
        return res

BTW the property Result.results is no longer needed / I dropped this field.

Awesome, thanks for your work!

@return42 return42 merged commit 36a1ef1 into searxng:master Jan 28, 2025
9 checks passed
@return42 return42 deleted the translation-answer-template branch January 28, 2025 06:07
GenericMale added a commit to GenericMale/searxng that referenced this pull request Jan 29, 2025
return42 added a commit to return42/searxng that referenced this pull request Feb 28, 2025
Before searxng#4183 a builtin plugin was *defautlt_on* when it is listed in the
"enabled_plugins" settings, this patch restores the previous behavior.

Not part of this patch but just to mentioning in context of searxng#4263:

  In the long term, we will abolish the "enabled_plugins:" setting and combine
  all options for the plugins in the "plugins:" setting, as is already planned
  in the PR searxng#4282

Closes: searxng#4263
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
return42 added a commit to return42/searxng that referenced this pull request Feb 28, 2025
Before searxng#4183 a builtin plugin was *defautlt_on* when it is listed in the
"enabled_plugins" settings, this patch restores the previous behavior.

Not part of this patch but just to mentioning in context of searxng#4263:

  In the long term, we will abolish the "enabled_plugins:" setting and combine
  all options for the plugins in the "plugins:" setting, as is already planned
  in the PR searxng#4282

Closes: searxng#4263
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
return42 added a commit that referenced this pull request Mar 1, 2025
Before #4183 a builtin plugin was *defautlt_on* when it is listed in the
"enabled_plugins" settings, this patch restores the previous behavior.

Not part of this patch but just to mentioning in context of #4263:

  In the long term, we will abolish the "enabled_plugins:" setting and combine
  all options for the plugins in the "plugins:" setting, as is already planned
  in the PR #4282

Closes: #4263
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants