GH-15940 Added Kolmogorov-Smirnov statistic method to H2OBinomialModelMetrics #16353

Open
wants to merge 10 commits into
base: master
Original file line number Diff line number Diff line change
@@ -1371,6 +1371,7 @@ def test7_missing_enum_values_lambda_search(self):
pyunit_utils.show_test_results("test7_missing_enum_values_lambda_search", num_test_failed, self.test_failed)
self.test_num += 1


def sklearn_binomial_result(self, training_data_file, test_data_file, has_categorical, true_one_hot,
validation_data_file=""):
"""
23 changes: 23 additions & 0 deletions h2o-py/h2o/model/metrics/binomial.py
@@ -976,3 +976,26 @@ def thresholds_and_metric_scores(self):
    if 'thresholds_and_metric_scores' in self._metric_json:
        return self._metric_json['thresholds_and_metric_scores']
    return None

def kolmogorov_smirnov(self, thresholds= None):
Contributor

Suggested change
- def kolmogorov_smirnov(self, thresholds= None):
+ def kolmogorov_smirnov(self):

    """
    :param thresholds: thresholds parameter must be a list (e.g. ``[0.01, 0.5, 0.99]``).
        If None, then the threshold maximizing the KS statistic will be used.
    :returns: The Kolmogorov-Smirnov statistic for this set of metrics and thresholds.

    :examples:

    >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
    >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
    >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
    >>> predictors = ["displacement","power","weight","acceleration","year"]
    >>> response = "economy_20mpg"
    >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
    >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
    >>> cars_gbm.train(x = predictors,
    ...                y = response,
    ...                training_frame = train,
    ...                validation_frame = valid)
    >>> cars_gbm.kolmogorov_smirnov()
    """
    return self.metric("ks", thresholds=thresholds)
Contributor

The goal is something like this:

Suggested change
- return self.metric("ks", thresholds=thresholds)
+ return max(self.gains_lift()["kolmogorov_smirnov"])
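
To illustrate what the suggested one-liner does, here is a minimal sketch that runs without an H2O cluster. `StubBinomialMetrics` and the table values are hypothetical stand-ins invented for this example; the only things taken from the suggestion are the `gains_lift()` call and the `"kolmogorov_smirnov"` column name. It assumes the gains/lift table can be indexed by column name, as the suggested line does.

```python
class StubBinomialMetrics:
    """Hypothetical stand-in for the metrics object; NOT the real h2o class."""

    def __init__(self, gains_lift_table):
        # Assumption: the table behaves like a mapping from column name to values.
        self._gains_lift_table = gains_lift_table

    def gains_lift(self):
        return self._gains_lift_table

    def kolmogorov_smirnov(self):
        # Same expression as the suggested change: the largest per-bin KS value.
        return max(self.gains_lift()["kolmogorov_smirnov"])


# Invented numbers, one row per gains/lift bin.
stub_table = {
    "group": [1, 2, 3, 4, 5],
    "kolmogorov_smirnov": [0.18, 0.31, 0.27, 0.12, 0.0],
}

metrics = StubBinomialMetrics(stub_table)
ks = metrics.kolmogorov_smirnov()
print(ks)
```

Note that the method takes no `thresholds` argument: the statistic is a single number read from the table that the metrics object already holds.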

9 changes: 8 additions & 1 deletion h2o-py/tests/pyunit_math/pyunit_ks_metric.py
@@ -11,14 +11,21 @@ def kolmogorov_smirnov():
    model.train(x=["Origin", "Distance"], y="IsDepDelayed", training_frame=airlines)
    verify_ks(model, airlines)

    #Test with specific Thresholds
    model = H2OGradientBoostingEstimator(ntrees=1, gainslift_bins=5)
    model.train(x=["Origin", "Distance"], y="IsDepDelayed", training_frame=airlines)
    ks = model.kolmogorov_smirnov()
    ks = model.kolmogorov_smirnov(thresholds=[0.01, 0.5, 0.99])
Contributor

I suggest keeping both cases:

  1. call the method without thresholds
  2. call the method with thresholds

Author

Hey @maurever
This is my first open-source contribution. After your review, I checked the code carefully and made the necessary changes. Could you please go through it once more to make sure everything is in order?

Contributor

Hi @Shashank1202. I am happy you are contributing to our open-source library. 👍

However, your code does not work. I went through it again and realized the goal is not to add thresholds as a parameter but to implement the kolmogorov_smirnov method on the performance object, so that we can call:

model.model_performance(data).kolmogorov_smirnov()

No thresholds are needed. The KS metric, like the gains/lift table, is calculated with different thresholds than other metrics such as AUC.
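
As a rough illustration of why no caller-supplied thresholds are needed, here is a generic, plain-Python sketch of the two-sample KS statistic for a binary classifier: the largest gap between the cumulative share of positives and the cumulative share of negatives as rows are taken in descending score order. This is not H2O's implementation (which, judging by the gains/lift-based suggestion in this review, evaluates the gap per gains/lift bin rather than per row); `ks_statistic` and the example data are made up for this sketch.

```python
def ks_statistic(labels, scores):
    """Max gap between cumulative positive and negative rates.

    Hypothetical helper for illustration; labels are 0/1, scores are
    predicted probabilities of the positive class.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("KS needs at least one positive and one negative row")

    # Highest scores first, so each prefix is "everything above a cut point".
    ordered = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    cum_pos = cum_neg = 0
    best_gap = 0.0
    for i, (score, label) in enumerate(ordered):
        if label == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        # Only evaluate the gap once all rows tied at this score are counted.
        is_last_of_tie = i + 1 == len(ordered) or ordered[i + 1][0] != score
        if is_last_of_tie:
            gap = abs(cum_pos / n_pos - cum_neg / n_neg)
            best_gap = max(best_gap, gap)
    return best_gap


# Invented example data: three positives, three negatives.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_ks = ks_statistic(example_labels, example_scores)
print(round(example_ks, 5))
```

The maximization over cut points happens inside the function, so the result is a single number regardless of which thresholds other metrics were evaluated on.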

Contributor

Have you tried running the test? It does not work even if you add the method to binomial.py. The goal is not to change and test model.kolmogorov_smirnov() but to add and test model.model_performance(data).kolmogorov_smirnov().

Suggested change
- ks = model.kolmogorov_smirnov(thresholds=[0.01, 0.5, 0.99])

    print(ks)
    ks_verification = ks_metric(model, airlines)
    print(ks_verification)
    assert round(ks, 5) != round(ks_verification, 5)

    # Test with invalid Thresholds
    try:
        ks= model.kolmogorov_smirnov(thresholds= "invalid")
    except ValueError as e:
        print("Caught expected exception for invalid thresholds:",e)

    model = H2OXGBoostEstimator(gainslift_bins=10)
    model.train(x=["Origin", "Distance"], y="IsDepDelayed", training_frame=airlines)
    print(model.gains_lift())
Empty file added h2o/Scripts/Activate.ps1
Empty file.
Empty file added h2o/Scripts/activate
Empty file.
Empty file added h2o/Scripts/activate.bat
Empty file.
Empty file added h2o/Scripts/deactivate.bat
Empty file.
Binary file added h2o/Scripts/f2py.exe
Binary file not shown.
Binary file added h2o/Scripts/numpy-config.exe
Binary file not shown.
Binary file added h2o/Scripts/pip.exe
Binary file not shown.
Binary file added h2o/Scripts/pip3.12.exe
Binary file not shown.
Binary file added h2o/Scripts/pip3.exe
Binary file not shown.
Binary file added h2o/Scripts/python.exe
Binary file not shown.
Binary file added h2o/Scripts/pythonw.exe
Binary file not shown.
5 changes: 5 additions & 0 deletions h2o/pyvenv.cfg
@@ -0,0 +1,5 @@
home = C:\Users\shash\AppData\Local\Programs\Python\Python312
include-system-site-packages = false
version = 3.12.1
executable = C:\Users\shash\AppData\Local\Programs\Python\Python312\python.exe
command = C:\Users\shash\AppData\Local\Programs\Python\Python312\python.exe -m venv C:\Users\shash\OneDrive\Documents\h2o-3\h2o