Add Bayesian instrumental variable estimation #213
Conversation
Cool! 🚀
…ated MLE treatment effects Signed-off-by: Nathaniel <[email protected]>
Ok @drbenvincent I think this is starting to look like something concrete. I've added a new PyMC model type and a PyMC experiment type to run the Bayesian version of the IV regression discussed in @juanitorduz 's blog post. The docs build, and I've added an example and one integration test. In the example I've compared the Bayesian estimate with the two-stage least squares approach estimated with sklearn's LinearRegression class. I've also chosen to allow priors to be specified for the coefficients, defaulting to the MLE estimates where none are specified. Before I add more to the documentation and the write-up, plus more data validation tests and the like, I wanted to check that this approach makes sense and fits the broad patterns you already have here.
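For anyone following along, the two-stage least squares comparison with sklearn can be sketched roughly like this (the simulated data and coefficient values are illustrative only, not the notebook's actual example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 5000

# Simulated IV setup (hypothetical): z is the instrument, u is an
# unobserved confounder, t is the endogenous treatment, y is the outcome.
z = rng.normal(size=n)
u = rng.normal(size=n)
t = 0.8 * z + 0.5 * u + rng.normal(scale=0.5, size=n)
true_effect = 2.0
y = true_effect * t + 0.5 * u + rng.normal(scale=0.5, size=n)

# Stage 1: regress the treatment on the instrument.
stage1 = LinearRegression().fit(z.reshape(-1, 1), t)
t_hat = stage1.predict(z.reshape(-1, 1))

# Stage 2: regress the outcome on the fitted treatment values.
stage2 = LinearRegression().fit(t_hat.reshape(-1, 1), y)
print(stage2.coef_[0])  # close to the true effect of 2.0
```

Naive OLS of y on t would be biased upwards here because of the confounder u; the two-stage procedure recovers the true effect through the instrument.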
causalpy/pymc_models.py
Outdated
sd_dist = pm.HalfCauchy.dist(beta=2, shape=2)
chol, corr, sigmas = pm.LKJCholeskyCov(
    name="chol_cov", eta=2, n=2, sd_dist=sd_dist
Do you want to make these prior parameters also customizable through the coords mapping?
Just want to clear up what you mean here: should I let the coord names and values be changeable, or the prior values for the LKJ variable, or both?
@pytest.mark.integration
def test_iv_reg():
Can we also check the expected effect estimation to test the implementation itself?
It would also be nice to parametrize this test with different priors.
Definitely intend to add more tests of this type. Just added one to ensure I could run the testing code.
FYI I've added a parameter recovery example to the accompanying notebook. I just wasn't sure we wanted any long-running tests. It can take a little time to sample the multivariate distribution...
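The parametrization being suggested might look something like the sketch below. The `fit_iv` helper is a hypothetical stand-in that returns a Wald/IV-ratio estimate from simulated data; the real test would fit the actual CausalPy model with the given priors.

```python
import numpy as np
import pytest


def fit_iv(priors, rng):
    # Placeholder for fitting the Bayesian IV model: simulate data and
    # return the Wald (IV ratio) estimate of the treatment effect.
    n = 2000
    z = rng.normal(size=n)
    u = rng.normal(size=n)
    t = 0.8 * z + 0.5 * u + rng.normal(scale=0.5, size=n)
    y = 2.0 * t + 0.5 * u + rng.normal(scale=0.5, size=n)
    beta_zt = np.cov(z, t)[0, 1] / np.var(z)
    beta_zy = np.cov(z, y)[0, 1] / np.var(z)
    return beta_zy / beta_zt


@pytest.mark.integration
@pytest.mark.parametrize("priors", [{"eta": 2}, {"eta": 4}])
def test_iv_reg(priors):
    # Check the expected effect estimate under each prior setting.
    estimate = fit_iv(priors, np.random.default_rng(0))
    assert abs(estimate - 2.0) < 0.25
```

If sampling time is the concern, a coarse tolerance plus a small number of draws usually keeps a parameter-recovery test fast enough for CI.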
docs/source/notebooks/iv_pymc.ipynb
Outdated
@@ -0,0 +1,4160 @@
{ |
Probably a good idea. I was experimenting with SVG at one point to try to keep file sizes down, but I have a vague memory that SVG had some rendering issues on either GitHub or readthedocs.
docs/source/notebooks/iv_pymc.ipynb
Outdated
@@ -0,0 +1,4160 @@
{ |
Line #6: `simple_ols_reg = sk_lin_reg().fit(X, y)`
Not relevant for this PR, but I think it would be nicer to test the Bayesian estimates against statsmodels so that we get the usual stats like p-values and confidence intervals. WDYT?
I have an ambition of adding statsmodels as a CausalPy backend, but I think you meant just comparing against a simple statsmodels output for this example?
Yeah, I wasn't sure of its status as a dependency and realised I could use sklearn to do 2SLS, so I didn't have to risk adding any more dependencies.
Let's create a discussion or issue to discuss this. As I mentioned initially, this is irrelevant to the PR 🙈 !
@NathanielF It is looking good! I left some minor comments :)
Codecov Report
@@ Coverage Diff @@
## main #213 +/- ##
==========================================
+ Coverage 71.26% 73.86% +2.60%
==========================================
Files 19 19
Lines 1044 1148 +104
==========================================
+ Hits 744 848 +104
Misses 300 300
Nice! Some quick thoughts:
I should have time to take a proper look at the code and example notebook in the next few days.
docs/source/notebooks/iv_pymc.ipynb
Outdated
@@ -0,0 +1,4154 @@
{ |
Worth running black on this. It should be covered by the regular pre-commit, I believe.
Yeah, this runs at each pre-commit check
This is all looking very promising. So far I've just done a quick review of the example notebook; will try to look at the code changes soon. You'll need to update from main, as there have been quite a few updates :)
Oh, in a previous comment I requested updates to the glossary. Bear in mind that we've switched to a 'proper' Sphinx glossary, so once you've updated from main there should be a glossary.rst file.
Remove the over-cautious text on the code being alpha.
Added some comments.
The example notebook is great. Depending on what we do about adding separate non-Bayesian classes for IVs, we could keep just this one notebook, as it already compares Bayesian and non-Bayesian approaches. So we could rename the notebook to be more general, removing the reference to PyMC.
I think I've addressed the above points, but let me know if anything is missing.
Can you add axis labels for the final plot in the notebook?
I think after this we'll merge :)
causalpy/pymc_experiments.py
Outdated
        as an outcome variable and in the data object to be used as a covariate.
        """
    )
check_binary = len(self.data[treatment.strip()].unique()) > 2
I think this check might need to be more robust because we're seeing the exception raised in the example notebook where presumably there is no problem.
It could be worth adding a test to check that we aren't getting false positives or false negatives with this check.
This check is actually working fine. It just so happens that the treatment variable risk in the model is a score between 0-10, so the treatment effect here should be thought of as a dosage treatment rather than a binary treatment. I've added a line to the write-up mentioning this.
Ah ok. So I misunderstood before - I didn't read that closely and assumed it had to be binary. If that's not the case, then I guess we can remove that exception? Sorry about that.
EDIT: For some reason it wasn't showing changes to me. I see that you changed the wording of the exception. Happy whichever way you want to do it.
Let's leave it in. I think many of the use-cases will be binary treatments, and it's no harm to be reminded to take care interpreting coefficients. We can revisit when I implement the sklearn/statsmodels IV class if you like.
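The behaviour being discussed can be demonstrated in isolation. Below is a hedged sketch of the check as a standalone function (the helper name is hypothetical; the real code lives inside the experiment class): a binary 0/1 treatment passes, while a 0-10 dosage-style score like `risk` correctly trips the warning condition.

```python
import pandas as pd

# Sketch of the binary-treatment check: a treatment column with more
# than two distinct values is treated as a dosage, not a binary flag,
# which changes how the coefficient should be interpreted.
def treatment_is_binary(data: pd.DataFrame, treatment: str) -> bool:
    return data[treatment.strip()].nunique() <= 2


binary = pd.DataFrame({"t": [0, 1, 1, 0]})
dosage = pd.DataFrame({"risk": [0, 3, 7, 10, 5]})

print(treatment_is_binary(binary, "t"))      # True
print(treatment_is_binary(dosage, "risk "))  # False: dosage-style score
```

A couple of small unit tests along these lines would guard against the false positives/negatives mentioned above.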
Depending on the previous comment about the exception, this looks ready. Ok for me to merge?
Happy for you to merge! Thanks!
Working through the repo to figure out how best to integrate the multivariate approach to Bayesian instrumental regression as seen in @juanitorduz 's blog. This is very early stages; I just want to test the basic idea (seems to work!) and get to know the repo.
In addition, I'm happy to take feedback and have a conversation about choices. Will be experimenting a bit this weekend.
Relates to Add functionality for Bayesian Instrumental Variables #212