Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kNN works differently in scikit-learn 1.0 #5610

Closed
markotoplak opened this issue Sep 27, 2021 · 1 comment
Closed

kNN works differently in scikit-learn 1.0 #5610

markotoplak opened this issue Sep 27, 2021 · 1 comment
Labels
needs discussion Core developers need to discuss the issue

Comments

@markotoplak
Copy link
Member

scikit-learn introduces some changes. @janezd fixed the tests to pass in #5608.

The failed ROC test points out some changes in the underlying learners. Here I tried replicating the old version of the test in workflow with old scikit. This one seems to correspond to what the tests were expecting.

image

And here is how it looks with new scikit-learn.

image

I find differences in kNN disturbing. It would be prudent to also check other learners.

@markotoplak markotoplak added bug report Bug is reported by user, not yet confirmed by the core team bug A bug confirmed by the core team and removed bug report Bug is reported by user, not yet confirmed by the core team labels Sep 27, 2021
@markotoplak
Copy link
Member Author

markotoplak commented Sep 27, 2021

Hm, I saw the following warning in the scikit docs:

Regarding the Nearest Neighbors algorithms, if two neighbors and have identical distances but different labels, the result will depend on the ordering of the training data.

I tried playing with shuffling the data in the following workflow (the Data Sampler is set to sample 100% of the data) and saw that both the old and new versions are sensitive to the order of the data. The degenerate case we can observe on 1.0 on titanic disappears when the data is shuffled. Makes sense: titanic contains many repeated rows and is nicely sorted.

image

So perhaps we don't need to worry about it.

@markotoplak markotoplak added needs discussion Core developers need to discuss the issue and removed bug A bug confirmed by the core team labels Sep 27, 2021
@janezd janezd closed this as completed Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs discussion Core developers need to discuss the issue
Projects
None yet
Development

No branches or pull requests

2 participants