这是indexloc提供的服务,不要输入任何密码
Skip to content

fix: search tags using keyword instead of fuzzy matching #4552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kvangork
Copy link
Member

@kvangork kvangork commented May 7, 2025

Adds an additional index subfield for tags, which is used only when the search_on query parameter is set to tags. Fuzzy matching will still be used when searching generally. Updates one spec to verify the new behavior. Unchanged behaviors are still covered well by other specs.

Closes WEB-616

Adds an additional index subfield for tags, which is used only when the search_on
query parameter is set to tags. Fuzzy matching will still be used when searching
generally. Updates one spec to verify the new behavior. Unchanged behaviors are
still covered well by other specs.

Closes WEB-616
@pleary
Copy link
Member

pleary commented May 7, 2025

A few things to note here:

  • Modifying the elasticsearch index settings in the index files should be done as you have done here. But that will not have any impact on production indices that have already been created - that information will only be used when recreating indices from scratch, like we do for example during tests. We also need to actively modify the production indices. In the past we have sometimes used migrations to do that. More often we will likely manually run the PUT mapping call before deploying as there has been at least one time we have been burned by timing where data was added to the index before the migration ran. And since elasticsearch mappings cannot be changed (they can be appended to but not modified or removed), if something writes to the index before the mapping is defined, that mapping key will forever stay with the default that Elasticsearch guesses for the type
  • This is adding a multi-field type of keyword for tags. Keywords are case-sensitive, as our our tags (sort of).
    • Note: tags have a strange trait where they are validated on the lower case version, and the case of the first instance is what is stored and gets used. For example if someone adds a tag of "Frogs", it'll be stored with that case. If someone later tries to add a tag of "frogs", a search will be performed that sees that "Frogs" already exists, and the tag of "Frogs" will be added to the observation, not "frogs"
    • The case sensitive nature of keyword means that searches for "frogs" will not find observations tagged as "Frogs". I suspect this will be a worse user experience
  • The searching logic in observation_index.rb is updated which is good and should be done. There are a few other places in there that tags is mentioned that should similarly be updated. That said, the logic in there is only used in some places within the rails app, for example for some uses of Observation.page_of_results. The changes here would not affect the API, and similar logic in the API would need to be updated
  • The search logic will eventually need to be updated, but if it needs to be updated to reference a new field (tags.keyword), that field will have no data in it after creation. In order to populate that field, observations will need to be reindexed. So it would be best to deploy the changes to the index first, perform all necessary reindexing, then later deploy changes to search logic so the fields the search logic is updated to reference will have been populated

To me the biggest thing to check here is - how exactly do we want tag searches to act? If we want them to be case-insensitive (e.g. searching "Frogs" finds "frogs"), a different Elasticsearch mapping (type text) and analyzer will need to be used

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants