Accent-insensitive search for Greek #22

karlb · 2022-01-05T14:29:57Z

Accent-insensitive search works for Latin characters, but not for Greek characters. Searching for "κοσμος" should yield results for "κόσμος".

ICU support could help with this, but it is unfortunately not easy to enable; see #14.

karlb · 2022-01-05T16:12:13Z

karlb · 2022-07-30T12:47:30Z

karlb · 2022-07-30T12:48:21Z

If I ever want to move off SQLite, https://duckdb.org/ seems to have a better choice of tokenizers while keeping many of SQLite's benefits.

karlb · 2022-09-18T12:42:17Z

Using stemmers from https://github.com/abiliojr/fts5-snowball should also solve the problem. I'm not sure how much stemming should be done on a dictionary, though.

karlb · 2023-02-16T14:15:21Z

The unaccent function from sqlean's unicode SQLite extension can be used to remove the accents:

sqlite> .load ./unicode
sqlite> SELECT unaccent('κόσμος');
unaccent('κόσμος')
------------------
κοσμος

This does not yet integrate the function with the FTS index, but that should be doable.
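One way the integration could look, as a rough sketch rather than a tested design: store an accent-stripped copy of each word in the FTS table and strip accents from the query as well. To keep the example self-contained it uses Python's unicodedata as a stand-in for sqlean's unaccent (the two may not behave identically in every case), and the table names entries and entries_fts as well as the helpers build_index and search are hypothetical. It assumes an SQLite build with FTS5 enabled.

```python
import sqlite3
import unicodedata


def unaccent(text: str) -> str:
    """Stand-in for sqlean's unaccent: decompose, drop combining marks, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def build_index(words):
    """Create an in-memory database with a plain table and an FTS5 shadow index.

    Table and column names are hypothetical; requires SQLite built with FTS5.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, word TEXT)")
    db.execute("CREATE VIRTUAL TABLE entries_fts USING fts5(word_plain)")
    for word in words:
        cur = db.execute("INSERT INTO entries (word) VALUES (?)", (word,))
        # Index only the accent-stripped form; the original stays in entries.
        db.execute(
            "INSERT INTO entries_fts (rowid, word_plain) VALUES (?, ?)",
            (cur.lastrowid, unaccent(word)),
        )
    return db


def search(db, query):
    """Return the original (accented) words whose stripped form matches the query."""
    # Quote the term so FTS5 treats it as a plain string, not as query syntax.
    term = '"' + unaccent(query).replace('"', '""') + '"'
    rows = db.execute(
        "SELECT entries.word FROM entries_fts "
        "JOIN entries ON entries.id = entries_fts.rowid "
        "WHERE entries_fts MATCH ? ORDER BY entries.id",
        (term,),
    )
    return [row[0] for row in rows]
```

With this setup, search(build_index(["κόσμος"]), "κοσμος") finds the accented entry, and so does a query that includes the accent. With the sqlean extension loaded, the SQL unaccent() function could presumably be called inside the INSERT statement instead of the Python helper.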

karlb added the "enhancement" label on Jan 5, 2022
