model.estimate_effect and model.refute_estimate throw 'A column-vector y was passed ...' error #1212

Open
diro5t opened this issue Jun 18, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@diro5t
diro5t commented Jun 18, 2024

Hi team,

When I use the above-mentioned methods, I get the following message:

"A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel()."

How can I avoid this? Is this a bug or am I doing something wrong?

Thanks

@diro5t added the bug label Jun 18, 2024
@drawlinson
Contributor

Also getting this issue; there seems to be an easy fix described here, though it would need to be done in DoWhy:

https://stackoverflow.com/questions/34165731/a-column-vector-y-was-passed-when-a-1d-array-was-expected
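For reference, the fix discussed there boils down to flattening the target before it reaches the estimator. A minimal sketch in plain NumPy, assuming the outcome arrives as a column vector of shape (n_samples, 1); the array below is made-up illustration data, not taken from DoWhy:

```python
import numpy as np

# Hypothetical outcome stored as a column vector of shape (n_samples, 1).
y_column = np.array([[0], [1], [1], [0]])

# ravel() flattens it to the 1d shape (n_samples,) that the warning asks for.
y_flat = y_column.ravel()

print(y_column.shape)  # (4, 1)
print(y_flat.shape)    # (4,)
```

Where exactly such a call would belong inside DoWhy or EconML is not established in this thread.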

@bloebp
Member

bloebp commented Nov 8, 2024

Hey, can you link to the code or give the full error stack? I can prepare a fix.
Or feel free to open a PR.

@drawlinson
Contributor

drawlinson commented Nov 9, 2024

I haven't had time to get to the bottom of it, but I can share what I've found out so far... hope that helps!

You can see the issue in this old dowhy tutorial, code block 13:
https://www.pywhy.org/dowhy/v0.6/example_notebooks/dowhy_refuter_notebook.html

This is the error (it's actually just a warning):

A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
...
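Since this is a warning, no stack trace is printed with it. One way to get one is to promote warnings to exceptions with the standard-library warnings module. A rough sketch: noisy_fit is a hypothetical stand-in for whichever call emits the message, and UserWarning is used here because, as far as I know, scikit-learn's DataConversionWarning derives from it:

```python
import traceback
import warnings


def noisy_fit():
    # Hypothetical stand-in for the call that emits the warning.
    warnings.warn(
        "A column-vector y was passed when a 1d array was expected.",
        UserWarning,
    )


captured = None
stack = ""
with warnings.catch_warnings():
    # Escalate every warning to an exception inside this block only.
    warnings.simplefilter("error")
    try:
        noisy_fit()
    except Warning as exc:
        captured = exc
        stack = traceback.format_exc()

print(stack)  # full traceback, including the frame that raised the warning
```

Wrapping the estimate_effect call the same way should show which frame passes the column vector.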

I get this warning only when using the EconML Double ML model[s], and only in certain configurations. I attempted to follow this (more recent) tutorial:

https://www.pywhy.org/dowhy/v0.9.1/example_notebooks/tutorial-causalinference-machinelearning-using-dowhy-econml.html

Currently, I observe the warning when my outcome variable is categorical, meaning that model_y must be a classifier, e.g.:

if variable_type == "numerical":
  discrete_outcome = False
  model_y = GradientBoostingRegressor()
elif variable_type == "categorical":
  discrete_outcome = True
  model_y = GradientBoostingClassifier()
          
model_params["init_params"] = {
  'model_y': model_y,  # fit outcome to features
  'model_t': GradientBoostingClassifier(),  # fit treatment to features
  'model_final': LassoCV(fit_intercept=False),
  'discrete_treatment': True,  # Treatment always binary
  'discrete_outcome': discrete_outcome,  # default: False
}

model.estimate_effect(
  estimand,
  method_params=method_params,
  ...
)

In addition to the warning, when I try to run the refuters on models which give this warning, I also get an exception:

ValueError: object too deep for desired array

from causal_refuter.py, line 166, in perform_bootstrap_test:

def perform_bootstrap_test(estimate, simulations: List):
    # This calculates a two-sided percentile p-value
    # See footnotes in https://journals.sagepub.com/doi/full/10.1177/2515245920911881
    half_p_value = np.mean([(x > estimate.value) + 0.5 * (x == estimate.value) for x in simulations])
    return 2 * min(half_p_value, 1 - half_p_value)

The problem seems to be that estimate.value is not a type or shape which is compatible with this function. I checked the type of estimate.value and it's a NumPy array.
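A possible workaround, sketched under the assumption that estimate.value is an array holding exactly one element (I have not confirmed its shape): collapse it to a float before the comparison. The helper names to_scalar and bootstrap_p_value are hypothetical and not part of DoWhy:

```python
import numpy as np


def to_scalar(value):
    """Collapse a size-1 array, e.g. shape (1,) or (1, 1), to a Python float."""
    arr = np.asarray(value, dtype=float)
    if arr.size != 1:
        raise ValueError(f"expected a single value, got shape {arr.shape}")
    return float(arr.reshape(-1)[0])


def bootstrap_p_value(estimate_value, simulations):
    # Same two-sided percentile p-value as perform_bootstrap_test,
    # but computed on plain floats instead of arrays.
    est = to_scalar(estimate_value)
    sims = [to_scalar(x) for x in simulations]
    half_p_value = np.mean([(x > est) + 0.5 * (x == est) for x in sims])
    return 2 * min(half_p_value, 1 - half_p_value)


# Made-up numbers: a (1, 1) array estimate against four simulated values.
p = bootstrap_p_value(np.array([[0.5]]), [0.1, 0.4, 0.6, 0.9])
print(p)
```

If the array has more than one element, this raises instead of guessing which entry is the estimate.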

I'm sorry that's as far as I've got with this one.
