
Handle different number of classes between y_true and y_pred #12

Draft: wants to merge 5 commits into base: master
Conversation

valosekj
Member

Handle ValueError: y_true and y_pred contain different number of classes

valosekj and others added 4 commits January 21, 2025 17:06
…true_and_y_pred' of github.com:SomeoneInParticular/modular_optuna_ml into jv/experimental-fix_different_num_of_classes_between_y_true_and_y_pred
@SomeoneInParticular
Collaborator

Is this finalized? Or are you still testing it in-draft?

@valosekj valosekj marked this pull request as ready for review January 27, 2025 21:07
@valosekj
Member Author

I believe it can now be tested on your end and merged.

@SomeoneInParticular
Collaborator

SomeoneInParticular commented Jan 28, 2025

Apologies for the delay, wanted to really dig through everything before accepting this PR.

I don't think this is going to work long-term, as there is an edge case it does not currently account for: namely, where classes are unique to the training and testing data respectively. For example, if training has classes [A, B, C] and testing has [A, B, D], right now the predict_proba result for class C will be used to 'represent' the probability for class D in the testing dataset, which is inherently incorrect.

This is actually an issue with most of the supervised categorical metrics at the moment, so we will need to look into a more universal solution here... I'll dig into it later this week, but for now I don't think it's smart to push this PR.
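To make the edge case concrete, here is a minimal, hypothetical sketch (not code from this repository) of why reusing the training-class column is wrong, and of one possible label-aligned alternative. It assumes scikit-learn; the classes A/B/C/D and all data are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training data contains classes A, B, C ...
X_train = rng.normal(size=(60, 4))
y_train = np.repeat(["A", "B", "C"], 20)

# ... but the test split contains D instead of C.
X_test = rng.normal(size=(9, 4))
y_test = np.array(["A", "B", "D"] * 3)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# Columns of predict_proba follow the classes seen during *training*
# (clf.classes_ == ['A', 'B', 'C']); no column exists for 'D'.
# Reusing the third column as P(D) silently relabels P(C) as P(D).

# One possible fix: align columns against the union of train/test labels,
# leaving P(unseen class) = 0 rather than borrowing another class's column.
all_labels = np.union1d(clf.classes_, np.unique(y_test))
aligned = np.zeros((len(X_test), len(all_labels)))
for i, c in enumerate(clf.classes_):
    aligned[:, np.searchsorted(all_labels, c)] = proba[:, i]
```

Metrics could then be computed against `aligned` with the explicit `all_labels` ordering, though (as noted above) each categorical metric would need to be checked for how it should treat a class the model can never predict.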

@valosekj
Member Author

For example, if training has classes [A, B, C], and the testing has [A, B, D], right now the result for predict_proba for class C will be used to 'represent' the probability for class D in the testing dataset, which is inherently incorrect.

Very good point!

Okay, let's not proceed with merging.

@valosekj valosekj marked this pull request as draft January 29, 2025 12:23
@valosekj valosekj added the experimental Related to elements of the code base which are volatile and/or not guaranteed to be implemented. label Jan 29, 2025