Acronyms at the end of sentence are incorrectly parsed #13

Open
dinamic opened this issue Feb 20, 2020 · 1 comment

dinamic commented Feb 20, 2020

The library has been really useful to us for breaking text into sentences. I've noticed one issue so far. If a sentence ends with an acronym and it is the last sentence in the text, everything is okay, but if another sentence follows it, the result is incorrect. It gets even worse if the acronym is capitalized.

Here it works fine:

$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 a.m..', \Sentence::SPLIT_TRIM);

var_dump($sentences);
array(1) {
  [0] =>
  string(25) "Let's meet at 10:00 a.m.."
}

But it fails here, where the split happens inside the acronym:

$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 a.m.. How about Greg?', \Sentence::SPLIT_TRIM);

var_dump($sentences);
array(2) {
  [0] =>
  string(22) "Let's meet at 10:00 a."
  [1] =>
  string(19) "m.. How about Greg?"
}

Here it fails with a capitalized acronym, where the text is not split at all:

$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 A.M.. How about Greg?', \Sentence::SPLIT_TRIM);

var_dump($sentences);
array(1) {
  [0] =>
  string(41) "Let's meet at 10:00 A.M.. How about Greg?"
}
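
For reference, the result presumably expected in both failing cases is two sentences: the one ending in the acronym, then "How about Greg?". The standalone sketch below illustrates that expectation without using the library. The function name `splitSentencesSketch` is hypothetical, and the rule it applies (split on whitespace that follows a terminator and precedes an uppercase letter) is a naive assumption for illustration, not how the library works; it would wrongly split after abbreviations such as "Dr." followed by a name.

```php
<?php
declare(strict_types=1);

/**
 * Naive illustration only, NOT the library's algorithm.
 * Splits at whitespace that follows '.', '!' or '?' and precedes an
 * uppercase letter, so dots inside "a.m." or "A.M." never cause a split.
 */
function splitSentencesSketch(string $text): array
{
    $parts = preg_split('/(?<=[.!?])\s+(?=\p{Lu})/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        // Regex failure (e.g. invalid UTF-8): return the input unchanged.
        return [$text];
    }

    return array_map('trim', $parts);
}

// The two failing inputs from this report.
var_dump(splitSentencesSketch("Let's meet at 10:00 a.m.. How about Greg?"));
var_dump(splitSentencesSketch("Let's meet at 10:00 A.M.. How about Greg?"));
```

With this rule, both inputs come back as two parts, with the acronym kept intact at the end of the first one.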

dinamic commented Feb 20, 2020

@vanderlee would you be able to have a look into this one, please?
