Sklearn 1.6.0 compatibility #290

Open
SvenKlaassen opened this issue Jan 20, 2025 · 1 comment · May be fixed by #291
Comments

@SvenKlaassen
Member

With scikit-learn 1.6.0, __sklearn_tags__ was introduced, see Estimator Tags.
This raised some issues at

Furthermore, this causes issues for stacked global learners, as in the Example Gallery.

Minimal Example

import doubleml as dml
import sklearn
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from doubleml.rdd.datasets import make_simple_rdd_data
from doubleml.rdd import RDFlex
from doubleml.utils.global_learner import GlobalRegressor
from sklearn.ensemble import StackingRegressor

print(sklearn.__version__)
print(dml.__version__)

np.random.seed(42)
data_dict = make_simple_rdd_data(n_obs=1000, fuzzy=False)
cov_names = ['x' + str(i) for i in range(data_dict['X'].shape[1])]
df = pd.DataFrame(np.column_stack((data_dict['Y'], data_dict['D'], data_dict['score'], data_dict['X'])), columns=['y', 'd', 'score'] + cov_names)
dml_data = dml.DoubleMLData(df, y_col='y', d_cols='d', x_cols=cov_names, s_col='score')
ml_g = StackingRegressor([("global", GlobalRegressor(LassoCV())),
                          ("local", LassoCV())], final_estimator=LassoCV())
rdflex_obj = RDFlex(dml_data, ml_g, fuzzy=False)
rdflex_obj.fit()

Output:

1.6.0
0.9.3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-deff49535681> in <cell line: 0>()
     20                           ("local", LassoCV())], final_estimator=LassoCV())
     21 rdflex_obj = RDFlex(dml_data, ml_g, fuzzy=False)
---> 22 rdflex_obj.fit()

6 frames
/usr/local/lib/python3.11/dist-packages/sklearn/ensemble/_base.py in _validate_estimators(self)
    232         for est in estimators:
    233             if est != "drop" and not is_estimator_type(est):
--> 234                 raise ValueError(
    235                     "The estimator {} should be a {}.".format(
    236                         est.__class__.__name__, is_estimator_type.__name__[3:]

ValueError: The estimator GlobalRegressor should be a regressor.
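
One plausible cause, offered as a sketch rather than a confirmed diagnosis: from scikit-learn 1.6 on, is_regressor reads the estimator type from __sklearn_tags__ instead of the _estimator_type attribute, and RegressorMixin only contributes its tag when it comes before BaseEstimator in the class definition. The two wrapper classes below are hypothetical stand-ins, not the actual GlobalRegressor code, and whether GlobalRegressor uses the first ordering is an assumption.

```python
import sklearn
from sklearn.base import BaseEstimator, RegressorMixin, is_regressor


class WrapperBaseFirst(BaseEstimator, RegressorMixin):
    """Hypothetical wrapper: BaseEstimator is listed before the mixin."""

    def __init__(self, base_estimator=None):
        self.base_estimator = base_estimator


class WrapperMixinFirst(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: mixin first, as the scikit-learn docs recommend."""

    def __init__(self, base_estimator=None):
        self.base_estimator = base_estimator


print("scikit-learn", sklearn.__version__)
# With scikit-learn < 1.6 both checks are expected to pass; with >= 1.6 the
# first one is expected to fail, because BaseEstimator.__sklearn_tags__ is
# resolved first and never reaches the mixin's version.
print("BaseEstimator first:", is_regressor(WrapperBaseFirst()))
print("Mixin first:        ", is_regressor(WrapperMixinFirst()))
```

If the first check prints False in your environment, swapping the base class order (or overriding __sklearn_tags__ to set estimator_type explicitly) is one way to make a wrapper pass the StackingRegressor validation shown in the traceback.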
@SvenKlaassen SvenKlaassen linked a pull request Jan 20, 2025 that will close this issue
@SvenKlaassen
Member Author

PR #291 updates the global learners to work with scikit-learn>=1.6.0.
The issue will be closed once xgboost has also been updated.
