-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LintGroup::lint() takes >1s with ~35kb of markdown #257
Comments
I'm honored that you're interested in using Harper for Lockbook! I'm quite certain your slowdown is coming from our spell-checker implementation, which is currently quite slow and being rewritten by @grantlemons. Try disabling it and give it another go. |
I tried disabling spell check using the following config and the runtime came down to ~15ms 👍 let mut linter = LintGroup::new(
LintGroupConfig {
spelled_numbers: Some(true),
an_a: Some(true),
sentence_capitalization: Some(true),
unclosed_quotes: Some(true),
wrong_quotes: Some(true),
long_sentences: Some(true),
repeated_words: Some(true),
spaces: Some(true),
matcher: Some(true),
correct_number_suffix: Some(true),
number_suffix_capitalization: Some(true),
multiple_sequential_pronouns: Some(true),
linking_verbs: Some(true),
avoid_curses: Some(true),
terminating_conjunctions: Some(true),
ellipsis_length: Some(true),
dot_initialisms: Some(true),
spell_check: Some(false), // tl;dr this is the only one that's false
},
dict,
); |
Fantastic. Once @grantlemons has made his spell checking changes this should be a resolved issue. We shall keep you up-to-date, but I can't imagine it will be too long. |
I noticed Grant's working on this (perhaps even as I write!) and wanted to put this out there: have you considered delegating to another library for spell checking? As I understand, Harper is a grammar checker competing with Grammarly and LanguageTool by emphasizing performance and privacy. Competing with spell checkers like Hunspell doesn't seem to be the goal because implementations like Hunspell don't have the drawbacks of API-based grammar checkers. I would specifically recommend trying Spellbook (out of the handful suggested for comparison in #215). Here's their API, reproduced from the top of their readme: fn main() {
let aff = std::fs::read_to_string("en_US.aff").unwrap();
let dic = std::fs::read_to_string("en_US.dic").unwrap();
let dict = spellbook::Dictionary::new(&aff, &dic).unwrap();
let word = std::env::args().nth(1).expect("expected a word to check");
if dict.check(&word) {
println!("{word:?} is in the dictionary.");
} else {
let mut suggestions = Vec::new();
dict.suggest(&word, &mut suggestions);
eprintln!("{word:?} is NOT in the dictionary. Did you mean {suggestions:?}?");
std::process::exit(1);
}
} I POC'd a Spellbook integration for Lockbook and it linted that 35kb Markdown file in ~1ms. Perhaps that's due to memory optimizations they document. Perhaps it's due to their interface: they check single words at a time and offer suggestions separately. Because I'm writing a markdown editor, I already have my markdown parsed all the time (for rendering) and my words parsed all the time to support features like jumping the cursor to the next word boundary with alt+left and alt+right. So, while it's extra work for me to segment words and not spell check words in code blocks, it's work that can be done once in my app instead of Harper redo'ing that work for convenience. On suggestions, I don't need all the suggestions provided up front because they'll only appear in my app for one word at a time (when the user right-clicks a word with a red underline), so Spellbook's suggest-for-a-single-word API is perfect and presumably saves a lot of work. Since Grant's rewriting Harper's spell checking, it's hard to compare, but for suggestions I noticed Grant added A big asterisk here is that Spellbook's suggestions are work-in-progress. I had to pull code from their master branch and comment out a Anyway, thanks for indulging my rant. This has been my case for delegating spell checking to Spellbook. I actually got the idea from a comment by the maintainer of Spellbook himself on their repo's only open issue. |
While you are correct that this is in the The feature is in pull request right now, but benchmarks show uncached spellchecking as being about 10x faster than before, though I'm not sure how it performs relative to As for spellbook integration, @elijah-potter and I discussed it recently, and there was a reason it wouldn't work, though I can't remember it off of the top of my head. Maybe user or file dictionaries? |
We actually use our dictionary for things other than spellchecking, namely tagging words in sentences based on which roles they could fulfill. For example, Spellbook doesn't allow us to mark words like We could still use Spellbook for spellchecking anyway, but it would mean bundling a duplicate dictionary with the binary, which isn't very conducive to our goal of minimizing bundle size. That doesn't mean we won't do it, but it might just be easier to get our spellchecking implementation up to snuff instead. |
Ah okay, makes sense to me! I got caught up working on another feature but I'll give it a try in a few days and let you know how it goes |
Hey, @tvanderstad, I just wanted to check in. We've fully released our faster implementation of spellchecking. Would you mind giving it a whirl on your end? It's still nascent, so we're eager for feedback. Thanks, and I hope your holidays have been swell! |
Thanks for checking in! My holidays were busy but wonderful. I'll try it out this week and get you that feedback - feel free to nudge me again if I forget. |
Hey @elijah-potter! I tried harper-core 0.14 and checking is still taking about 1.0s (down from 1.2s). It's an improvement but it's not categorical. Let me know if there's more information I can provide to help. This is only a bug if you still think it is. It's my first time building a text editor so I'm not sure what performance standards to hold for grammar checking. I can run harper on a background thread, it's just extra work to handle the situations where the grammar check is out-of-date with respect to the document e.g. when the user corrected a word but the grammar check hasn't re-run yet |
@tvanderstad, that doesn't sound right to me. Would you mind sharing your code? |
Sure! It's sitting in this PR in the file let start = std::time::Instant::now();
let mut lints = linter.lint(&document);
println!("linting took {:?}", start.elapsed()); // prints values like 1.2s for ~35kb of text |
Me again 👋
I managed to incorporate
harper-core
into the markdown text editor for Lockbook and included a recorded demo in a PR. It was super easy to get set up with, so, props for that.Because Harper emphasizes speed, I implemented it so it runs synchronously on every keystroke. This ended up being a Bad Time™️ when I opened my monthly dev log, a ~35kb markdown file, which takes around ~1.2s to lint.
I can get it usable by integrating it asynchronously, but your readme makes bold performance claims and asks for performance bug reports. I also don't see why grammar checking couldn't be fast - there's a couple things we get away with doing every keystroke, including parse the markdown AST.
Here's the code from that PR that measures the performance so you know I'm doing it right:
The text was updated successfully, but these errors were encountered: