-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test and Score: Warn about transformation, raise error if all is nan #3323
Conversation
@markotoplak, something like this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine, except for the first elif.
Did you test for self.resampling == self.TestOnTest
so that you do not get these warnings if you use the Preprocess input?
a267282
to
c23b1ae
Compare
Some tests fail because they trigger the new exception. I can obviously fix the tests. But. Do we consider getting all-nans after domain transformation an error? For the purpose of canvas, it is. The user has probably done something wrong and must be warned, and the result is useless anyway, so no harm done. For the purpose of Orange as DM library, it is not necessarily an error; numpy is generous with nans and infs and lets the programmer figure it out.. One solution is to degrade this from exception to a warning. However, widgets would then have to set the warning filter to escalate this warning to an exception, so they'd catch it and report. I would think that the above distinction is theoretical and in practice we would always escalate to an exception. Orange is never used outside of canvas. I'd like to hear some opinions. My preference is to keep it as an exception, but perhaps add another check: I would not trigger an exception if the original data already consisted of all-nans. |
@janezd, hmm, I really do not know. I'd go either for an exception (without the check) or a warning. Exception + check seems like a strange compromise... |
05d2f54
to
b21e756
Compare
@markotoplak, I had to add some checks to prevent raising exceptions in some semi-legitimate cases. |
b21e756
to
5fa1b6d
Compare
Codecov Report
@@ Coverage Diff @@
## master #3323 +/- ##
==========================================
+ Coverage 82.3% 82.32% +0.02%
==========================================
Files 360 360
Lines 64097 64121 +24
==========================================
+ Hits 52753 52788 +35
+ Misses 11344 11333 -11 |
5fa1b6d
to
a4f998d
Compare
I tried writing tests and I noticed that for scikit-learn based learners this PR does not really work. For scikit-learn methods Orange adds some very aggressive preprocessors, which work hard to remove all nans. For example, with logistic regression, I can still build a learner on iris and apply its output to titanic. I think preprocessing is so agressive to handle potentially missing columns in the test data. Ouch. |
We could still merge it because it works for non-skl methods, like Naive Bayes. And it will gradually work better as we gradually get rid of skl. :) I mean, it doesn't hurt to have this check, although it doesn't always work. To make it better, we could add a flag to preprocessors that would tell them to raise an exception if all data is |
How about first transforming the data to the original domain, then checking for NaNs, and afterwards continuing with however the classifier wants to preprocess? I just tried this:
Surprisingly, all tests pass. Perhaps we could then even remove some conditions, but I did not look well into it. Do you see any potential problems with this approach? |
a4f998d
to
c3a0f36
Compare
c3a0f36
to
19f595f
Compare
Thanks. Fixed as suggested, except that I don't transform the data to the original domain if this cannot raise an exception. |
I added some tests. I think it is ready to merge. Please check the tests, Janez. |
Thanks for the tests. I think they're OK. If you agree with the code, you can merge. |
Issue
Description of changes
Includes