-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug fix #145 #143 #135 #153 #146
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the one question, and an out-of-scope observation that jumped out at me.
self.assertEqual(res[1], expected_leftover, "for=" + text) | ||
|
||
testExtract("Set the ambush for half an hour", | ||
"2017-06-27 13:34:00", "set ambush") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: remainder is also lowercased here, opening a new issue for this elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documented here #147
@@ -697,15 +739,16 @@ def test_numbers(self): | |||
self.assertEqual(normalize("that's eighteen nineteen twenty"), | |||
"that is 18 19 20") | |||
self.assertEqual(normalize("that's one nineteen twenty two"), | |||
"that is 1 19 20 2") | |||
"that is 1 19 22") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for these unittests we might want to throw a deprecation warning in case some downstream depends on this, relevant discussion MycroftAI/mycroft-core#1962
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think we need a warning, if anything dowtreamns is depending on this behavior it is already utterly broken for n > 20 in general... we would have received bug reports already.
Wellp, I was just being obtuse with my only review comment =P Tests passing, satisfies the issues, that DeprecationWarning is up to you. Looks good to me. |
id rather not throw the deprecation warning, i consider this a bug, you dont tell your users "we are delaying this fix until next release, work around it somehow" only pointed that out due to the linked discussion, which should have been considered a bug since the start IMHO |
We could raise a DeprecationError, so at least stuff in the wild breaks... descriptively? |
Suppose that'd require you to put the correct behavior on a flag in the interim, which still isn't great... |
ordinals=None), 1) | ||
|
||
# test plurals | ||
# NOTE plurals are never considered ordinals, but also not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChanceNCounter @krisgesling does this note make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the plural form not considered a fraction in the tests where ordinals are True or None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have to extract ordinals explicitly, and there's no such ordinal as The Three Fifth, it's probably sensible to avoid returning fractions in all cases.
On the other hand, it might make sense to call them plurals ordinals, on the assumption that extract_number("the thirds", ordinals=True)
will always be an input failure somewhere upstairs, like STT mishearing "the third."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was unsure of this one, for consistency i thought it made sense that ordinals=True would not parse fractions at all, the same way ordinals=False would only parse fractions. But since we support explict ordinals (1st, 2nd, 3rd, 4th...) i'm not sure....
i guess its a matter of how explicit is this? what are the possible sources of ambiguity?
"the thirds" could very well mean 3 instead of 1/3 , so i don't think this should be considered an explicit fraction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if when ordinals=True we should consider the plural form an integer?
@@ -400,7 +409,7 @@ def _extract_whole_number_with_text_en(tokens, short_scale, ordinals): | |||
current_val = val | |||
|
|||
else: | |||
if all([ | |||
if current_val and all([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an unrelated minor bug fix, during testing i ran into a case where current_val was None and int comparison inside all() would throw an exception
eventually ordinals flag in extract number should become an Enum, for backwards compatibility this should not be changed yet but only in next major version |
ordinals=None), 1) | ||
|
||
# test plurals | ||
# NOTE plurals are never considered ordinals, but also not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the plural form not considered a fraction in the tests where ordinals are True or None?
self.assertEqual(extract_number("one third of a cup", | ||
ordinals=None), 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to make the ordinals=None
behaviour quite explicit in the docs as I can see this causing headaches. But given it defaults to False they will presumably be doing it intentionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at some point i would like to replace this with an Enum, i did it this way for backwards compat only. but imho None should be the default in a next major release
test/test_parse.py
Outdated
self.assertEqual(extract_number("sixth third"), | ||
1 / 6 / 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this one makes sense.
"Sixth third" would probably refer to "the sixth instance of a third" which I guess means there's 2
all up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I dunno what should be returned here, but I don't think it should be cumulative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one has been around for ages, i just changed it to a different place so all related tests are grouped together
i dont think it makes a lot of sense either, but i'm declaring it out of scope since it isnt touched by this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a code comment for this one
# TODO imperfect test, maybe should return 'my favorite numbers are 20 2', | ||
# let is pass for now since this is likely a STT issue if ever | ||
# encountered in the wild and is somewhat ambiguous, if this was | ||
# spoken by a human the result is what we expect, if in written form | ||
# it is ambiguous but could mean separate numbers | ||
self.assertEqual(normalize('my favorite numbers are twenty 2'), | ||
'my favorite numbers are 22') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may also be worth adding in where both numbers are digits
"my favorite numbers are 20 2"
Which I would expect would remain unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a test for this, but result is also changed to a single number. This shares logic with extract_number and will be hard to fix without a bigger refactor, if you are ok with that i would like to leave it for a follow up PR
whats blocking this? some more context, here is a use case where the lack of case preservation is messing things up, in simple_NER it mutates my input text, so i'm actually mutating it before the function call so i can do a meaningful comparison, otherwise i will get false diffs between strings due to the lowercasing |
Next in line after Catalan, let this block release IMO. Shouldn't be too bad. The will-be merge conflicts look like they should be click-fixes. |
360fda9
to
0fe0ecf
Compare
squashed and pushed. speak soonish or forever hold your peace 🚀 |
fixes #145
fixes #143
fixes #135
fixed #153