Add support for multi-stage formulas. #24
Comments
What is the intention of the first formula? What is exogenous and what is endogenous? Clearly the Z are instruments.
Returning to this after several years 😓. Multi-part formulas are already implemented as of @bashtage : If I were to take this further, I'd look to implement something like:
This is within reach of the parser now, but I'd love your take on this (given that you have much more experience in this space).
An advanced syntax would be great. I have a few current uses.
Nice. I don't yet know how much it makes sense to always have these advanced operators in place (versus having a family of parsers that extend some common set), but I'll definitely be working toward making the parser capable of generating formulae for these kinds of situations. For further clarity:
On 3. Would a
I haven't really thought about it. I could imagine that formulas could be nested. For example
could be something like
and when you access
Maybe too complicated.
Just to add to this style of syntax, mlogit uses something similar for multinomial choice models. Not saying it should be implemented here, but there is another use case for the
@GuiMarthe This is actually already implemented in Formulaic (leaving the interpretation to the calling library). The wrapping library would then just need to validate that the formula has the expected structure (it could also, if desired, disable the intercept additions in the formula parser).
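As a rough illustration of the kind of structural validation a wrapping library might do, here is a minimal sketch that works on the raw formula string only. It does not use Formulaic's parser or API; the helper names `split_parts` and `validate_parts` and the three-part example formula are hypothetical and exist purely for illustration.

```python
from __future__ import annotations


def split_parts(formula: str) -> tuple[str, list[str]]:
    """Split 'lhs ~ p1 | p2 | ...' into the lhs and the top-level rhs parts.

    Hypothetical helper: a plain string scan, not Formulaic's parser.
    """
    lhs, sep, rhs = formula.partition("~")
    if not sep:
        raise ValueError("formula has no '~' separator")

    parts: list[str] = []
    current: list[str] = []
    depth = 0  # nesting depth of () and [] groups
    for ch in rhs:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "|" and depth == 0:
            # Only a '|' outside any bracket separates two parts.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return lhs.strip(), parts


def validate_parts(formula: str, expected: int) -> list[str]:
    """Check that the rhs has exactly `expected` non-empty parts."""
    _, parts = split_parts(formula)
    if len(parts) != expected:
        raise ValueError(f"expected {expected} parts, found {len(parts)}")
    if any(not part for part in parts):
        raise ValueError("formula contains an empty part")
    return parts


# Hypothetical three-part formula, as a calling library might require.
parts = validate_parts("choice ~ price | income | catch", expected=3)
```

What each part means (which terms vary by alternative, which are instruments, and so on) is left entirely to the calling library, in line with the comment above.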
In some of my work I am interested in exploring two-stage least-squares regression on sparse data, and thus in making Formulaic able to handle it nicely.
My plan is to allow formulas of the form:
y ~ a + [b + c ~ z1 + z2] | a + [e + f ~ z1 + z2] | d + [b + c ~ z1 + z2] | d + [e + f ~ z1 + z2]
In my proposed grammar, this would also be equivalent to:
y ~ (a|d) + [b + c | e + f ~ z1 + z2]
Using multipart syntax in the rhs of nested formulas would be forbidden.
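To make the stated equivalence concrete, the sketch below reads the compact form as a cross product of the exogenous alternatives and the endogenous alternatives, each paired with the same instruments. This reading is inferred only from the two equivalent formulas given above; the function `expand_multipart` is hypothetical, takes the alternatives already separated, and is not a parser or part of Formulaic.

```python
from itertools import product


def expand_multipart(lhs, exog_alternatives, endog_alternatives, instruments):
    """Build the long-form formula from the pieces of the compact form.

    Hypothetical helper: every exogenous alternative is combined with every
    endogenous alternative, and each pair becomes one '|'-separated part.
    """
    parts = [
        f"{exog} + [{endog} ~ {instruments}]"
        for exog, endog in product(exog_alternatives, endog_alternatives)
    ]
    return f"{lhs} ~ " + " | ".join(parts)


# Pieces of: y ~ (a|d) + [b + c | e + f ~ z1 + z2]
expanded = expand_multipart(
    lhs="y",
    exog_alternatives=["a", "d"],
    endog_alternatives=["b + c", "e + f"],
    instruments="z1 + z2",
)
print(expanded)
```

Under this reading the two exogenous alternatives and two endogenous alternatives produce the four parts of the long form, in the same order as written above.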
The API for accessing the various pieces of this Formula is as yet not fully fleshed out, and naming has not been properly considered, but would be something like:
On a multipart formula like this one, calls to get_model_matrix will need to specify the part and stage for which the model matrix should be generated. If there is only one part or stage, this will not be necessary. Formulaic explicitly will not attempt to do any modeling with this, and will expect users of the library to do any memoisation that is required for two-stage least-squares to work when pumping new data sets through a pre-trained model.
I'm especially keen to know what @bashtage thinks about this, given that this is something he has explored a lot more in linearmodels.