Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test and Score: Warn about transformation, raise error if all is nan #3323

Merged
merged 3 commits into from
Nov 19, 2018

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Oct 19, 2018

Issue
  • Domain transformations in Test and Score (and Predictions?) are implicit. Users should be informed about them.
Description of changes
  • Test and Score shows this exception as a user-readable error (without mentioning "domains")
  • If train and test data have different domains but transformation does not result in all-nans, an information about the transformation appears.
Includes
  • Code changes
  • Tests
  • Documentation

@janezd
Copy link
Contributor Author

janezd commented Oct 19, 2018

@markotoplak, something like this?

Copy link
Member

@markotoplak markotoplak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine, except for the first elif.

Did you test for self.resampling == self.TestOnTest so that you do not get these warnings if you use the Preprocess input?

Orange/base.py Outdated Show resolved Hide resolved
Orange/widgets/evaluate/owtestlearners.py Show resolved Hide resolved
@janezd janezd force-pushed the test-score-transformation branch from a267282 to c23b1ae Compare October 19, 2018 13:21
@janezd
Copy link
Contributor Author

janezd commented Oct 20, 2018

Some tests fail because they trigger the new exception. I can obviously fix the tests. But.

Do we consider getting all-nans after domain transformation an error? For the purpose of canvas, it is. The user has probably done something wrong and must be warned, and the result is useless anyway, so no harm done. For the purpose of Orange as DM library, it is not necessarily an error; numpy is generous with nans and infs and lets the programmer figure it out..

One solution is to degrade this from exception to a warning. However, widgets would then have to set the warning filter to escalate this warning to an exception, so they'd catch it and report. I would think that the above distinction is theoretical and in practice we would always escalate to an exception. Orange is never used outside of canvas.

I'd like to hear some opinions. My preference is to keep it as an exception, but perhaps add another check: I would not trigger an exception if the original data already consisted of all-nans.

@markotoplak
Copy link
Member

@janezd, hmm, I really do not know. I'd go either for an exception (without the check) or a warning. Exception + check seems like a strange compromise...

@lanzagar lanzagar added this to the 3.18 milestone Oct 26, 2018
@janezd janezd force-pushed the test-score-transformation branch 3 times, most recently from 05d2f54 to b21e756 Compare October 27, 2018 21:03
@janezd
Copy link
Contributor Author

janezd commented Oct 28, 2018

@markotoplak, I had to add some checks to prevent raising exceptions in some semi-legitimate cases.

@janezd janezd force-pushed the test-score-transformation branch from b21e756 to 5fa1b6d Compare November 6, 2018 21:22
@codecov
Copy link

codecov bot commented Nov 6, 2018

Codecov Report

Merging #3323 into master will increase coverage by 0.02%.
The diff coverage is 96.42%.

@@            Coverage Diff             @@
##           master    #3323      +/-   ##
==========================================
+ Coverage    82.3%   82.32%   +0.02%     
==========================================
  Files         360      360              
  Lines       64097    64121      +24     
==========================================
+ Hits        52753    52788      +35     
+ Misses      11344    11333      -11

@janezd janezd force-pushed the test-score-transformation branch from 5fa1b6d to a4f998d Compare November 7, 2018 13:43
@markotoplak markotoplak self-assigned this Nov 7, 2018
@markotoplak
Copy link
Member

I tried writing tests and I noticed that for scikit-learn based learners this PR does not really work. For scikit-learn methods Orange adds some very aggressive preprocessors, which work hard to remove all nans. For example, with logistic regression, I can still build a learner on iris and apply its output to titanic.

I think preprocessing is so agressive to handle potentially missing columns in the test data. Ouch.

@janezd
Copy link
Contributor Author

janezd commented Nov 8, 2018

We could still merge it because it works for non-skl methods, like Naive Bayes. And it will gradually work better as we gradually get rid of skl. :)

I mean, it doesn't hurt to have this check, although it doesn't always work.

To make it better, we could add a flag to preprocessors that would tell them to raise an exception if all data is nan. When using them for preprocessing training data, we'd set this flag. Since a model cannot be fit to a table of nans, raising such an exception would make sense, right?

@markotoplak
Copy link
Member

How about first transforming the data to the original domain, then checking for NaNs, and afterwards continuing with however the classifier wants to preprocess? I just tried this:

            # transform to the original domain and check if the data is compatible
            data = data.transform(self.original_domain)
             if data.X.size \
                     and np.isnan(data.X).all() \
                     and not np.isnan(orig.X).all() \
                     and data.domain.attributes != orig.domain.attributes:
                 raise DomainTransformationError(
                     "domain transformation produced no defined values")
            # apply transformations added by the learner
            data = data.transform(self.domain)

Surprisingly, all tests pass. Perhaps we could then even remove some conditions, but I did not look well into it.

Do you see any potential problems with this approach?

@lanzagar lanzagar modified the milestones: 3.18, 3.19 Nov 13, 2018
@janezd janezd force-pushed the test-score-transformation branch from a4f998d to c3a0f36 Compare November 18, 2018 13:05
@janezd janezd force-pushed the test-score-transformation branch from c3a0f36 to 19f595f Compare November 18, 2018 13:06
@janezd
Copy link
Contributor Author

janezd commented Nov 18, 2018

Thanks. Fixed as suggested, except that I don't transform the data to the original domain if this cannot raise an exception.

@markotoplak
Copy link
Member

I added some tests. I think it is ready to merge. Please check the tests, Janez.

@janezd
Copy link
Contributor Author

janezd commented Nov 19, 2018

Thanks for the tests. I think they're OK. If you agree with the code, you can merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants