Releases: meilisearch/charabia
Charabia v0.9.1
Changes
- Add Turkish normalizer (#305) @tkhshtsh0917
- feat: Adds German compound words decomposition with new segmenter (#303) @luflow
- German: Adds some more test cases and updates dictionary (#306) @luflow
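German compound decomposition (#303) relies on a dictionary. The Rust function below is a rough sketch of one way dictionary-based decomposition can work: dynamic programming splits a lowercase compound into the fewest words from a word list. The decompose function and the small word list in the usage note are hypothetical. This is not charabia's segmenter, and it ignores linking elements such as the German Fugen-s.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not charabia's API): splits `word` into the fewest
/// entries of `dict`, or returns None if no complete split exists.
fn decompose<'a>(word: &'a str, dict: &HashSet<&str>) -> Option<Vec<&'a str>> {
    let n = word.len();
    // best[i] = (fewest parts, start of last part) for the prefix word[..i].
    let mut best: Vec<Option<(usize, usize)>> = vec![None; n + 1];
    best[0] = Some((0, 0));
    for end in 1..=n {
        if !word.is_char_boundary(end) {
            continue;
        }
        for start in 0..end {
            if !word.is_char_boundary(start) {
                continue;
            }
            if let Some((parts, _)) = best[start] {
                if dict.contains(&word[start..end]) {
                    let candidate = parts + 1;
                    // Keep the split with fewer parts; ties keep the earlier one.
                    if best[end].map_or(true, |(p, _)| candidate < p) {
                        best[end] = Some((candidate, start));
                    }
                }
            }
        }
    }
    // Walk back from the end to recover the parts; `?` yields None if the
    // whole word could not be covered.
    let mut parts = Vec::new();
    let mut end = n;
    while end > 0 {
        let (_, start) = best[end]?;
        parts.push(&word[start..end]);
        end = start;
    }
    parts.reverse();
    Some(parts)
}
```

With the word list fuss, ball, fussball and feld, decompose("fussballfeld", &dict) prefers fussball + feld over fuss + ball + feld because it yields fewer parts.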
Thanks again to @ManyTheFish, @luflow, @meili-bors[bot], and @tkhshtsh0917! 🎉
Charabia v0.9.0
Changes
- (BREAKING) Simplify lang detection (#299) @ManyTheFish
  - The Language allow_list changes from a HashMap<Script, Vec<Language>> to a slice of Language: &[Language].
  - Add the tokenize_with_allow_list method to the Tokenizer, allowing a Language allow list to be passed dynamically without having to rebuild the tokenizer.
- Add math symbols to default separators (#301) @phillitrOSU
  Adds all math symbols from https://www.compart.com/en/unicode/category/Sm to the default separator list.
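To illustrate the breaking allow-list change in #299, the sketch below flattens an old-style HashMap<Script, Vec<Language>> into the new flat list of languages. It then restricts detection candidates with an optional per-call allow list, loosely mirroring what tokenize_with_allow_list makes possible. Script, Language, flatten_allow_list and restrict are hypothetical stand-ins, not charabia's real types or detection logic.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-ins for charabia's Script and Language types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Script {
    Latin,
    Cyrillic,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Language {
    Eng,
    Fra,
    Rus,
}

/// Migrates an old allow list (grouped by script) to a flat, deduplicated
/// list of languages that can be passed on as a &[Language].
fn flatten_allow_list(old: &HashMap<Script, Vec<Language>>) -> Vec<Language> {
    let unique: BTreeSet<Language> = old.values().flatten().copied().collect();
    unique.into_iter().collect()
}

/// Keeps only the candidate languages permitted by a per-call allow list;
/// None means "no restriction" (an assumption of this sketch).
fn restrict(candidates: &[Language], allow_list: Option<&[Language]>) -> Vec<Language> {
    match allow_list {
        Some(allowed) => candidates
            .iter()
            .copied()
            .filter(|lang| allowed.contains(lang))
            .collect(),
        None => candidates.to_vec(),
    }
}
```

Deduplicating through a BTreeSet keeps the migrated list in a stable order, since HashMap iteration order is unspecified.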
Thanks again to @ManyTheFish, @meili-bors[bot], and @phillitrOSU! 🎉
Charabia v0.8.12
Changes
- Update dependencies (#297) @irevoire
- Add null byte as hard context separator (#295) @LukasKalbertodt
- update internal dependencies for release (#298) @irevoire
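#295 makes the null byte a hard context separator. The sketch below models the difference between hard separators, which end a context, and soft separators, which only split words, using a hypothetical SeparatorKind enum and a tiny hand-written table. Apart from '\0' being hard, the table entries are assumptions. Charabia's real separator lists are much larger.

```rust
/// Hypothetical separator classes: Hard ends a context, Soft only splits words.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SeparatorKind {
    Hard,
    Soft,
}

/// Illustrative table. Only '\0' as Hard comes from the release note;
/// the other entries are assumptions for the example.
fn separator_kind(c: char) -> Option<SeparatorKind> {
    match c {
        '\0' | '.' => Some(SeparatorKind::Hard),
        ' ' | ',' => Some(SeparatorKind::Soft),
        _ => None,
    }
}

/// Splits text into contexts at hard separators, then each context into
/// words at any separator, dropping empty pieces.
fn contexts(text: &str) -> Vec<Vec<&str>> {
    text.split(|c: char| separator_kind(c) == Some(SeparatorKind::Hard))
        .map(|context| {
            context
                .split(|c: char| separator_kind(c).is_some())
                .filter(|word| !word.is_empty())
                .collect::<Vec<&str>>()
        })
        .filter(|words| !words.is_empty())
        .collect()
}
```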
Thanks again to @LukasKalbertodt, @irevoire and @meili-bors[bot]! 🎉
Charabia v0.8.11
Changes
- Adds a new normalizer to normalize œ to oe and æ to ae (#278) @Soham1803
- Upgrade Lindera to 0.31.0 (#292) @mosuka
- fix: fixed chinese-normalization-pinyin feature test failed (#291) @tkhshtsh0917
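The œ/æ normalizer (#278) expands two ligatures. Here is a minimal sketch, assuming a simple character-by-character replacement. normalize_ligatures is a hypothetical name, and handling the upper-case Œ and Æ goes beyond what the release note states.

```rust
/// Hypothetical helper: expands œ -> oe and æ -> ae. The upper-case
/// mappings (Œ -> OE, Æ -> AE) are an assumption of this sketch.
fn normalize_ligatures(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            'œ' => out.push_str("oe"),
            'æ' => out.push_str("ae"),
            'Œ' => out.push_str("OE"),
            'Æ' => out.push_str("AE"),
            // Every other character passes through unchanged.
            _ => out.push(c),
        }
    }
    out
}
```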
Thanks again to @ManyTheFish, @Soham1803, @curquiza, @meili-bors[bot], @mosuka and @tkhshtsh0917! 🎉
Charabia v0.8.10
Changes
- Update bors.toml with missing tests (#286) @curquiza
- Add Swedish recomposition normalizer and link it to a feature (#287) @ManyTheFish
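Swedish treats å, ä and ö as distinct letters. The sketch below assumes that a recomposition normalizer's job is to merge a base letter followed by a combining ring above (U+030A) or combining diaeresis (U+0308) back into the precomposed letter, so that a later diacritic-removal step would not reduce it to a or o. recompose_swedish is a hypothetical name, and the table covers only those six letters.

```rust
/// Hypothetical helper: recomposes decomposed Swedish å/ä/ö (and upper-case
/// forms). The set of handled letters is an assumption of this sketch.
fn recompose_swedish(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        // Look at the next char: U+030A is the combining ring above,
        // U+0308 the combining diaeresis.
        let composed = match (c, chars.peek().copied()) {
            ('a', Some('\u{030A}')) => Some('å'),
            ('A', Some('\u{030A}')) => Some('Å'),
            ('a', Some('\u{0308}')) => Some('ä'),
            ('A', Some('\u{0308}')) => Some('Ä'),
            ('o', Some('\u{0308}')) => Some('ö'),
            ('O', Some('\u{0308}')) => Some('Ö'),
            _ => None,
        };
        if let Some(letter) = composed {
            out.push(letter);
            chars.next(); // skip the combining mark that was merged
        } else {
            out.push(c);
        }
    }
    out
}
```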
Thanks again to @ManyTheFish, @curquiza, and @meili-bors[bot]! 🎉
Charabia v0.8.9
Changes
- Add \t as recognized separator (#280) @Gusted
- Update Lindera to 0.30.0 (#279) @mosuka
- Fix char boundary panic (#281) @ManyTheFish
- Make the pinyin-normalization optional (#282) @ManyTheFish
  - This can be reactivated by enabling the chinese-normalization-pinyin feature
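#281 fixes a char boundary panic. The release note does not describe the cause, so the sketch below only illustrates the general class of bug in Rust: slicing a &str at a byte offset that is not a UTF-8 character boundary panics, while str::is_char_boundary lets code back off to a safe offset. truncate_at_char_boundary is a hypothetical helper, not the actual fix.

```rust
/// Hypothetical helper: returns the longest prefix of `text` that is at most
/// `max_bytes` long without splitting a multi-byte character.
/// Writing &text[..max_bytes] directly would panic if `max_bytes` fell
/// inside a character.
fn truncate_at_char_boundary(text: &str, max_bytes: usize) -> &str {
    if max_bytes >= text.len() {
        return text;
    }
    let mut end = max_bytes;
    // Offset 0 is always a char boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}
```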
Thanks again to @Gusted, @ManyTheFish, and @mosuka! 🎉
Charabia v0.8.8
Changes
Thanks again to @6543, @ManyTheFish, @dependabot, @dependabot[bot], @meili-bors[bot], and @mosuka! 🎉
Charabia v0.8.7
Changes
- Fix compilation when vietnamese feature is disabled (#259) @timvisee
- Fix unused FstSegmenter warning when not using khmer compiler features (#261) @timvisee
- Update dependencies (#262) @agourlay
- Add vietnamese benchmarks (#267) @ManyTheFish
- Update README.md (#269) @ManyTheFish
- Vietnamese: Add lacking tests and fix bug (#270) @ManyTheFish
Thanks again to @ManyTheFish, @agourlay, @curquiza, @dependabot, @dependabot[bot], @meili-bors[bot], and @timvisee! 🎉
Charabia v0.8.6
Changes
- Improve khmer segmenter performance by using fst segmenter (#251) @xshadowlegendx
- Fix update-kvariants CI (#256) @choznerol
- normalize Ð and Đ into d (#257) @ngdbao
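#257 normalizes Ð and Đ into d. Here is a minimal sketch of that mapping. fold_d_variants is a hypothetical name, and also folding the lower-case forms ð and đ is an assumption that the release note does not state.

```rust
/// Hypothetical helper: folds Ð (U+00D0) and Đ (U+0110) to 'd'.
/// Folding the lower-case ð (U+00F0) and đ (U+0111) too is an assumption.
fn fold_d_variants(text: &str) -> String {
    text.chars()
        .map(|c| match c {
            '\u{00D0}' | '\u{0110}' | '\u{00F0}' | '\u{0111}' => 'd',
            other => other,
        })
        .collect()
}
```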
Thanks again to @ManyTheFish, @choznerol, @dependabot, @dependabot[bot], @meili-bors[bot], @ngdbao and @xshadowlegendx! 🎉
Charabia v0.8.5
Changes
- Fuzz testing with quickcheck for normalizers, segmenters, tokenizer and classifier (#240) @choznerol
- add khmer segmenter (#203) @xshadowlegendx
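#240 adds property-based fuzz testing with quickcheck. To stay dependency-free, the sketch below hand-rolls the core idea instead of using quickcheck's API: generate many random inputs and check a property on each one. The property shown, that a normalizer is idempotent, is one plausible example rather than a test taken from charabia. The xorshift generator, the alphabet, and the to_lowercase stand-in for a real normalizer are all assumptions. Unlike quickcheck, this sketch does not shrink failing inputs.

```rust
/// Tiny xorshift64 generator so the example needs no external crates
/// (quickcheck would normally generate the inputs).
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Random string of 0 to 16 chars from a small mixed alphabet (assumed).
    fn string(&mut self) -> String {
        const ALPHABET: [char; 8] = ['a', 'Z', 'é', 'Œ', ' ', '\0', 'ß', '9'];
        let len = (self.next() % 17) as usize;
        (0..len)
            .map(|_| ALPHABET[(self.next() % 8) as usize])
            .collect()
    }
}

/// Hypothetical stand-in for a normalizer under test: Unicode lower-casing.
fn normalize(s: &str) -> String {
    s.to_lowercase()
}

/// Property: normalizing an already-normalized string changes nothing.
/// Returns the first failing input, if any.
fn check_idempotent(seed: u64, cases: usize) -> Result<(), String> {
    let mut rng = XorShift(seed.max(1)); // xorshift must not start at 0
    for _ in 0..cases {
        let input = rng.string();
        let once = normalize(&input);
        if normalize(&once) != once {
            return Err(input);
        }
    }
    Ok(())
}
```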
Thanks again to @ManyTheFish, @choznerol, @dependabot, @dependabot[bot], @meili-bors[bot], and @xshadowlegendx! 🎉