Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-15940 Added kolmogorov-Smirnov statistic method to H2OBinomialModelMetrics #16353

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Shashank1202
Copy link

@Shashank1202 Shashank1202 commented Aug 5, 2024

This PR address issue #15940

Added a "kolmogorov_smirnov()" method to the model_performance object in the H2O library.

Previously, users could retrieve the Kolmogorov-Smirnov (KS) statistic using model.kolmogorov_smirnov(), which provided the value on the training sample or out-of-bag (OOB) estimate for DRF models.

This enhancement improves usability by providing a more straightforward way to compute the KS statistic on different datasets.

Copy link
Contributor

@maurever maurever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Shashank1202. Thanks for your contribution! Please add a test case to test the new functionality you added.

@maurever maurever changed the title Added kolmogorov-Smirnov statistic method to H2OBinomialModelMetrics GH-15940 Added kolmogorov-Smirnov statistic method to H2OBinomialModelMetrics Aug 7, 2024
@Shashank1202
Copy link
Author

Hi @Shashank1202. Thanks for your contribution! Please add a test case to test the new functionality you added.

Hey @maurever . Thanks for your suggestion and I have added testing regarding this.

Copy link
Contributor

@maurever maurever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove empty files and config files.
image

model = H2OGradientBoostingEstimator(ntrees=1, gainslift_bins=5)
model.train(x=["Origin", "Distance"], y="IsDepDelayed", training_frame=airlines)
ks = model.kolmogorov_smirnov()
ks = model.kolmogorov_smirnov(thresholds=[0.01, 0.5, 0.99])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to keep both cases:

  1. call the method without thresholds
  2. call the method with thresholds

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @maurever
This was my first open-source contribution, and after your review, I have detailedly checked and made the necessary changes accordingly. Could you please go through it once more to ensure everything is in order?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Shashank1202. I am happy you are trying to contribute to our open-source library. 👍

But your code is not working. I went through your code again and found out the goal is not to add thresholds as a parameter but to implement the kolmogorow_smirnov method on the performance object. So, we can call:

model.model_performance(data).kolmogorov_smirnov()

No thresholds are needed. The KS metric is calculated with different thresholds (same for gains lift) than other metrics such as AUC, etc.

@@ -976,3 +976,26 @@ def thresholds_and_metric_scores(self):
if 'thresholds_and_metric_scores' in self._metric_json:
return self._metric_json['thresholds_and_metric_scores']
return None

def kolmogorov_smirnov(self, thresholds= None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def kolmogorov_smirnov(self, thresholds= None):
def kolmogorov_smirnov(self):

... validation_frame = valid)
>>> cars_gbm.kolmogorov_smirnov()
"""
return self.metric("ks", thresholds=thresholds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is something like this:

Suggested change
return self.metric("ks", thresholds=thresholds)
return max(self.gains_lift()["kolmogorov_smirnov"])

# Test with specific thresholds
model = H2OGradientBoostingEstimator(ntrees=1, gainslift_bins=5)
model.train(x=["Origin", "Distance"], y="IsDepDelayed", training_frame=airlines)
ks = model.kolmogorov_smirnov(thresholds=[0.01, 0.5, 0.99])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried to run the test? This is not working even if you add the method to binomial.py. The goal is not change and test model.kolmogorov_smirnov() but add and test model.model_performance(data).kolmogorov_smirnov()

Suggested change
ks = model.kolmogorov_smirnov(thresholds=[0.01, 0.5, 0.99])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants