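Thanks for this crate. However, I ran into some issues with normalization.
Here are some examples from the Polish alphabet that don't work as expected; there may be more issues beyond these. In the output below, every lowercase letter normalizes correctly, but of the uppercase letters only Ó does.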
    use nucleo_matcher::pattern::{AtomKind, CaseMatching, Normalization};

    fn main() {
        // Example: https://en.wikipedia.org/wiki/Polish_alphabet
        let tests = [
            ('ą', 'a'), ('ć', 'c'), ('ę', 'e'), ('ł', 'l'), ('ń', 'n'),
            ('ó', 'o'), ('ś', 's'), ('ź', 'z'), ('ż', 'z'),
            ('Ą', 'A'), ('Ć', 'C'), ('Ę', 'E'), ('Ł', 'L'), ('Ń', 'N'),
            ('Ó', 'O'), ('Ś', 'S'), ('Ź', 'Z'), ('Ż', 'Z'),
        ];

        // test normalization function
        println!("--- normalize ---");
        for (c, expected) in tests {
            let normalized = nucleo_matcher::chars::normalize(c);
            if normalized == expected {
                println!("okay: {} -> {}.", c, normalized);
            } else {
                println!("fail: {} -> {}. expected {}", c, normalized, expected);
                // check whether the lowercase form of the same letter normalizes
                let c_lower = c.to_lowercase().next().unwrap();
                let expected_lower = expected.to_lowercase().next().unwrap();
                let normalized_lower = nucleo_matcher::chars::normalize(c_lower);
                if normalized_lower == expected_lower {
                    println!(" but lowercase works: {} -> {}", c_lower, normalized_lower);
                }
            }
        }

        // test matcher
        println!("--- matcher ---");
        let mut matcher = nucleo_matcher::Matcher::new(nucleo_matcher::Config::DEFAULT);
        for (c, expected) in tests {
            // search for `expected` (the needle) in `c` (the haystack)
            let haystack = nucleo_matcher::Utf32String::from(c.to_string());
            let atom = nucleo_matcher::pattern::Atom::new(
                &expected.to_string(),
                CaseMatching::Smart,
                Normalization::Smart,
                AtomKind::Exact,
                false,
            );
            if atom.score(haystack.slice_u32(..), &mut matcher).is_some() {
                println!("okay: {}", c)
            } else {
                println!("fail: {} not in {}", atom.needle_text(), haystack)
            }
        }
    }
--- normalize ---
okay: ą -> a.
okay: ć -> c.
okay: ę -> e.
okay: ł -> l.
okay: ń -> n.
okay: ó -> o.
okay: ś -> s.
okay: ź -> z.
okay: ż -> z.
fail: Ą -> Ą. expected A
but lowercase works: ą -> a
fail: Ć -> Ć. expected C
but lowercase works: ć -> c
fail: Ę -> Ę. expected E
but lowercase works: ę -> e
fail: Ł -> Ł. expected L
but lowercase works: ł -> l
fail: Ń -> Ń. expected N
but lowercase works: ń -> n
okay: Ó -> O.
fail: Ś -> Ś. expected S
but lowercase works: ś -> s
fail: Ź -> Ź. expected Z
but lowercase works: ź -> z
fail: Ż -> Ż. expected Z
but lowercase works: ż -> z
--- matcher ---
okay: ą
okay: ć
okay: ę
okay: ł
okay: ń
okay: ó
okay: ś
okay: ź
okay: ż
fail: A not in Ą
fail: C not in Ć
fail: E not in Ę
fail: L not in Ł
fail: N not in Ń
okay: Ó
fail: S not in Ś
fail: Z not in Ź
fail: Z not in Ż
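Since the lowercase forms normalize correctly, one possible caller-side workaround is to fall back to the lowercase form and restore the case afterwards. The following is only a minimal sketch under stated assumptions: `normalize_lowercase_only` is a hypothetical stand-in that mimics the behaviour observed above (it is not the crate's table), and `normalize_preserving_case` is a hypothetical helper name. The proper fix is presumably to add the missing uppercase entries to the crate's normalization table.

```rust
/// Hypothetical stand-in for a normalization table that only covers
/// lowercase letters, mimicking the failures shown in the output above.
/// This is NOT the table used by nucleo_matcher.
fn normalize_lowercase_only(c: char) -> char {
    match c {
        'ą' => 'a',
        'ć' => 'c',
        'ę' => 'e',
        'ł' => 'l',
        'ń' => 'n',
        'ó' => 'o',
        'ś' => 's',
        'ź' | 'ż' => 'z',
        other => other,
    }
}

/// Hypothetical workaround: if direct normalization leaves an uppercase
/// character unchanged, normalize its lowercase form and uppercase the result.
fn normalize_preserving_case(c: char) -> char {
    let direct = normalize_lowercase_only(c);
    if direct != c || !c.is_uppercase() {
        return direct;
    }
    // Only handle characters whose lowercase form is a single char.
    let mut lower = c.to_lowercase();
    let l = match (lower.next(), lower.next()) {
        (Some(l), None) => l,
        _ => return c,
    };
    let normalized = normalize_lowercase_only(l);
    if normalized == l {
        // The lowercase form has no normalization either; keep the input.
        return c;
    }
    // Restore the case, again only if the mapping is a single char.
    let mut upper = normalized.to_uppercase();
    match (upper.next(), upper.next()) {
        (Some(u), None) => u,
        _ => c,
    }
}
```

With the stand-in table, `normalize_preserving_case('Ą')` yields `'A'` while `normalize_lowercase_only('Ą')` leaves the character unchanged. In real code the stand-in would be replaced by `nucleo_matcher::chars::normalize`.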
Also, I think you might have made some copy-paste/codegen mistakes (here and in other places):
nucleo/matcher/src/chars/normalize.rs
Lines 340 to 390 in 6df3cd0