Add support for nested formulae (useful e.g. in IV contexts). #108
Codecov Report
Additional details and impacted files
@@             Coverage Diff             @@
##              main     #108      +/-   ##
===========================================
- Coverage   100.00%   99.75%   -0.25%
===========================================
  Files           53       39      -14
  Lines         2850     2425     -425
===========================================
- Hits          2850     2419     -431
- Misses           0        6       +6
@bashtage Any thoughts on this before it gets merged?
@s3alfisc: I just saw your project at https://github.com/s3alfisc/pyfixest to implement fixest for Python. That looks awesome. I had some internal work that did IV based on this PR, but I was wondering whether you would be interested in having this support too?
Hi Matthew - yes, I'd definitely be interested in that! Right now I do a lot of string parsing to get the two formulas for the first and second stage and call 'model_matrix' twice. Likely not very efficient and clearly not too elegant, but it works =) Please let me know if I can be of any help in testing & debugging this PR!
The syntax looks good to me. I will definitely switch from my own so-so parser to this.
Hi @bashtage,
Pursuant to #24, I did a quick draft of additional support for IV-like formulae in formulaic (in addition to the multi-part formulae that were already implemented). There are some bugs and rough edges, but would you mind taking a look and adding any suggestions? I'm also not sure whether this should be a plugin or part of the default stack, so your thoughts there would be helpful too. All naming etc. is in draft status, so feel free to suggest improvements there.
Suppose you wanted to model some data using IV. With these patches you could write:
The resulting formula could then be parsed by the consumer of the formula to do the right things.
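To make the consumer's job concrete, here is a minimal sketch of splitting a nested formula into a second-stage formula plus first-stage formulae. Everything in it is an assumption for illustration: the bracketed `[endog ~ instruments]` spelling, the `_hat` suffix, and the `split_stages` helper are hypothetical, and the helper is a toy string rewriter, not formulaic's parser or API.

```python
import re

# Hypothetical helper, NOT part of formulaic. It assumes a single level of
# nesting written as "[endog ~ instruments]" and names fitted values with a
# "_hat" suffix.
NESTED = re.compile(r"\[\s*([^\[\]~]+?)\s*~\s*([^\[\]]+?)\s*\]")


def split_stages(formula):
    """Return (second_stage_formula, list_of_first_stage_formulae)."""
    first_stages = []

    def replace(match):
        endogenous = [term.strip() for term in match.group(1).split("+")]
        instruments = match.group(2)
        # One first-stage formula per endogenous variable.
        for name in endogenous:
            first_stages.append(f"{name} ~ {instruments}")
        # In the second stage, each endogenous variable becomes its fitted value.
        return " + ".join(f"{name}_hat" for name in endogenous)

    second_stage = NESTED.sub(replace, formula)
    return second_stage, first_stages


second, firsts = split_stages("y ~ x1 + [x2 ~ z1 + z2]")
print(second)  # y ~ x1 + x2_hat
print(firsts)  # ['x2 ~ z1 + z2']
```

A real consumer would walk the parsed formula structure rather than rewrite strings; the sketch only shows the shape of the two stages that fall out of one nested formula.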
If you end up using an interaction term, or later multiplying, formulaic still does the right thing.
The `x1:x2_hat` term is considered one factor, and looked up by name.
Note that this could also (with a small amount of effort) be used for double ML (if we add a `delta` transform/operator), and more general things like:
Though this does strain credulity a bit.
Lastly, I plan to add some utility methods to Formulaic to allow easy recursive iteration over the formula to assist with the evaluation of dependencies and updating of the dataframe as you go up the tree. This might even be able to be integrated into the high-level tooling, if so desired, with the user passing a `dep_data_resolver` hook of some description.

closes: #24
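As a rough illustration of the recursive dependency evaluation mentioned above, the sketch below resolves nested stages depth-first and adds a fitted column for each one before its parent is evaluated. The `Stage` class, the `resolve` function, and the signature of the `dep_data_resolver` callback are all hypothetical; the PR only names the hook and does not specify its interface. Plain dicts of lists stand in for a dataframe to keep the example dependency-free.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical structure: illustrative only, not formulaic's API.
@dataclass
class Stage:
    lhs: str                  # column this stage predicts, e.g. "x2"
    rhs: List[str]            # names of the regressor columns
    deps: List["Stage"] = field(default_factory=list)  # nested stages


Data = Dict[str, List[float]]
Resolver = Callable[[Stage, Data], List[float]]


def resolve(stage: Stage, data: Data, dep_data_resolver: Resolver) -> Data:
    """Resolve nested stages depth-first, adding '<lhs>_hat' columns."""
    for dep in stage.deps:
        # Resolve the dependency's own dependencies first.
        data = resolve(dep, data, dep_data_resolver)
        # Then ask the user-supplied hook for the dependency's fitted values.
        data = {**data, f"{dep.lhs}_hat": dep_data_resolver(dep, data)}
    return data


def mean_resolver(stage: Stage, data: Data) -> List[float]:
    # Stand-in for fitting a first-stage model: row-wise mean of the regressors.
    columns = [data[name] for name in stage.rhs]
    return [sum(row) / len(row) for row in zip(*columns)]


first = Stage(lhs="x2", rhs=["z1", "z2"])
second = Stage(lhs="y", rhs=["x1", "x2_hat"], deps=[first])
data = {"y": [1.0, 2.0], "x1": [0.0, 1.0], "z1": [1.0, 3.0], "z2": [3.0, 5.0]}

out = resolve(second, data, mean_resolver)
print(out["x2_hat"])  # [2.0, 4.0]
```

Returning a new mapping instead of mutating the input keeps the caller's data untouched, which is one possible design choice; an in-place update of the dataframe would work equally well.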