T2118 fuzzy search #4401

Open: wants to merge 5 commits into main

@AndyKilmory (Collaborator) commented Jan 14, 2025

What does this change?

This introduces optional fuzziness in basic searching (i.e. text typed into the search bar, but not chips) to help when users make spelling errors in their search terms.

[Screenshot 2025-01-29 at 17 14 57]

Fuzziness can be switched on and off via an API config parameter, and several of the variables controlling its behaviour have also been exposed:

[Screenshot 2025-01-29 at 17 19 00]

search.fuzziness.enabled : Boolean = true/false (default = false) <-- whether fuzziness is activated
search.fuzziness.prefixLength : Int = 0..x (default = 1) <-- how many of the initial characters must match exactly
search.fuzziness.editDistance : String = AUTO, AUTO:low,high or a fixed distance (default = AUTO) <-- the allowed edit distance. AUTO derives it from the length of each search token; AUTO:low,high sets the token-length boundaries, so tokens shorter than low characters must match exactly, tokens from low up to high-1 characters allow one edit, and longer tokens allow two (plain AUTO behaves like AUTO:3,6). Elasticsearch accepts fixed distances of 0, 1 or 2.
search.fuzziness.maxExpansions : Int = 0..x (default = 50) <-- maximum number of term variations the fuzzy query expands to
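A minimal sketch of reading these settings, assuming the configuration arrives as a flat map of string keys to string values (how Grid actually loads its config is not shown in this PR). The `FuzzinessSettings` class and `parse_fuzziness` function are hypothetical; the defaults mirror the list above, and the editDistance check accepts AUTO, AUTO:low,high, or a fixed 0, 1 or 2.

```python
import re
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FuzzinessSettings:
    """Hypothetical holder for the search.fuzziness.* settings."""
    enabled: bool = False
    prefix_length: int = 1
    edit_distance: str = "AUTO"
    max_expansions: int = 50


# AUTO, AUTO:low,high, or a fixed edit distance of 0, 1 or 2.
_EDIT_DISTANCE = re.compile(r"AUTO(:\d+,\d+)?|[012]")


def parse_fuzziness(config: Mapping[str, str]) -> FuzzinessSettings:
    """Read search.fuzziness.* keys from a flat string map, falling back to defaults."""
    d = FuzzinessSettings()
    enabled = config.get("search.fuzziness.enabled", str(d.enabled)).lower() == "true"
    prefix_length = int(config.get("search.fuzziness.prefixLength", d.prefix_length))
    edit_distance = config.get("search.fuzziness.editDistance", d.edit_distance)
    max_expansions = int(config.get("search.fuzziness.maxExpansions", d.max_expansions))

    if prefix_length < 0:
        raise ValueError("prefixLength must be >= 0")
    if not _EDIT_DISTANCE.fullmatch(edit_distance):
        raise ValueError(f"unsupported editDistance: {edit_distance!r}")
    if max_expansions < 1:
        # Lucene's FuzzyQuery rejects a non-positive maxExpansions.
        raise ValueError("maxExpansions must be positive")
    return FuzzinessSettings(enabled, prefix_length, edit_distance, max_expansions)
```

Validating once at startup is a design choice: a typo such as AUTO:3-6 fails fast instead of surfacing as an Elasticsearch error at query time.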

For more details, see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-fuzzy-query.html
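To make the editDistance and prefixLength semantics concrete, here is a small Python model of a single fuzzy term comparison as the Elasticsearch documentation describes it. It is an illustration only: Elasticsearch uses Lucene automata rather than a pairwise distance, and it also caps candidates with max_expansions, which this sketch ignores. Adjacent swaps count as one edit because Elasticsearch's fuzzy transpositions option defaults to true. The function names are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and adjacent swaps."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    """Edits allowed by AUTO:low,high (plain AUTO behaves like AUTO:3,6)."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def fuzzy_matches(query_term: str, indexed_term: str,
                  prefix_length: int = 1, low: int = 3, high: int = 6) -> bool:
    # The first prefix_length characters must match exactly.
    if query_term[:prefix_length] != indexed_term[:prefix_length]:
        return False
    return osa_distance(query_term, indexed_term) <= auto_max_edits(query_term, low, high)


print(fuzzy_matches("lightrom", "lightroom"))   # True: 8 chars allow 2 edits, 1 needed
print(fuzzy_matches("ligthroom", "lightroom"))  # True: one adjacent swap
print(fuzzy_matches("dog", "log"))              # False: first character differs
```

The last call shows why prefixLength matters: with the default of 1, a typo in the very first character is never corrected.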

Fuzziness is only applied to word queries, e.g. Lightroom Develop, and not to phrase queries, e.g. "Lightroom Develop" (wrapped in quotes), because phrase queries match exact terms in order (Elasticsearch does not allow fuzziness on phrase-type multi-match queries).
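A minimal sketch of that split, written as Python that builds Elasticsearch query DSL dictionaries. The helper name `build_clauses`, the quote-detection regex, the use of the phrase multi-match type, and `operator: "and"` are all assumptions for illustration, not the code in this PR; how the clauses are combined (e.g. inside a bool query) is left out.

```python
import re
from typing import Any, Dict, List

# A quoted run of text such as "Lightroom Develop" is treated as a phrase.
_PHRASE = re.compile(r'"([^"]*)"')


def build_clauses(text: str, fields: List[str], edit_distance: str = "AUTO",
                  prefix_length: int = 1, max_expansions: int = 50) -> List[Dict[str, Any]]:
    """Hypothetical helper: phrases stay exact, unquoted words get fuzziness."""
    phrases = [p.strip() for p in _PHRASE.findall(text) if p.strip()]
    words = " ".join(_PHRASE.sub(" ", text).split())

    # Phrase queries: exact terms in order, no fuzziness parameters.
    clauses: List[Dict[str, Any]] = [
        {"multi_match": {"query": p, "fields": fields, "type": "phrase"}}
        for p in phrases
    ]
    if words:
        # Word query: best_fields, because cross_fields does not accept fuzziness.
        clauses.append({"multi_match": {
            "query": words,
            "fields": fields,
            "type": "best_fields",
            "operator": "and",  # assumption: every word must match
            "fuzziness": edit_distance,
            "prefix_length": prefix_length,
            "max_expansions": max_expansions,
        }})
    return clauses
```

For example, `build_clauses('"Lightroom Develop" preset', ["title", "description"])` yields one exact phrase clause for Lightroom Develop and one fuzzy clause for preset.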

Note that introducing fuzziness changes the multi-match type from cross_fields to best_fields, since Elasticsearch does not accept the fuzziness parameter on cross_fields queries. This means that all of the searched words/tokens must appear together in one of the searched fields, rather than being spread across several searched fields.

For example, if a document is:
{
title: 'red fox',
description: 'jumped over the dog'
}

then a cross_fields search for fox dog (unquoted) will match it, because fox appears in the title and dog in the description, but a best_fields search will not. For a best_fields match, both words would need to appear in the same field, e.g. description: 'red fox jumped over the dog'.
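The difference between the two types, for a query whose words are split across the fields of the document above (fox in the title, dog in the description), can be modelled in a few lines of Python. This is a simplified sketch, not Elasticsearch's implementation: it assumes operator "and" (every word must match), replaces the analyzer with a lowercase whitespace split, and ignores fuzziness and scoring. The function names are hypothetical.

```python
from typing import Dict, List


def tokens(text: str) -> List[str]:
    # Stand-in for a real analyzer: lowercase and split on whitespace.
    return text.lower().split()


def cross_fields_match(doc: Dict[str, str], query: str) -> bool:
    # Fields are treated as one big field: each word may come from any field.
    doc_tokens = {t for value in doc.values() for t in tokens(value)}
    return all(t in doc_tokens for t in tokens(query))


def best_fields_match(doc: Dict[str, str], query: str) -> bool:
    # At least one individual field must contain every word.
    return any(all(t in tokens(value) for t in tokens(query)) for value in doc.values())


doc = {"title": "red fox", "description": "jumped over the dog"}
print(cross_fields_match(doc, "fox dog"))  # True: fox in title, dog in description
print(best_fields_match(doc, "fox dog"))   # False: no single field has both

merged = {"title": "red fox", "description": "red fox jumped over the dog"}
print(best_fields_match(merged, "fox dog"))  # True: description has both
```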

How should a reviewer test this change?

Ensure that the search results match expectations for the chosen search parameters.

Who should look at this?

Tested? Documented?

  • locally by committer
  • locally by Guardian reviewer
  • on the Guardian's TEST environment
  • relevant documentation added or amended (if needed)

@AndyKilmory marked this pull request as ready for review on January 29, 2025 17:36
@AndyKilmory requested review from a team as code owners on January 29, 2025 17:36
@Conalb97 (Contributor) left a comment:


Looks good to me
