[G]VIF #548
@@ -362,7 +362,7 @@ fitted(m::LinPredModel) = m.rr.mu
 predict(mm::LinPredModel) = fitted(mm)
 residuals(obj::LinPredModel) = residuals(obj.rr)

-function formula(obj::LinPredModel)
+function StatsModels.formula(obj::LinPredModel)
While we are at it: when is this method called? When I do:
julia> formula(lm(x, y))
ERROR: type LinearModel has no field fr
julia> formula(glm(x, y, Normal()))
ERROR: type GeneralizedLinearModel has no field fr
other methods are called.
Do we have tests for the different cases where a formula is not present?
Hmmm, will investigate. I thought we caught this when Milan removed TableRegressionModel.
On current master:
julia> formula(lm(ones(10, 1), randn(10)))
ERROR: ArgumentError: model was fitted without a formula
Stacktrace:
[1] formula(obj::LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}})
@ GLM ~/Code/GLM.jl/src/linpred.jl:366
[2] top-level scope
@ REPL[13]:1
julia> formula(glm(ones(10, 1), randn(10), Normal()))
ERROR: ArgumentError: model was fitted without a formula
Stacktrace:
[1] formula(obj::GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Normal{Float64}, IdentityLink}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}})
@ GLM ~/Code/GLM.jl/src/linpred.jl:366
[2] top-level scope
@ REPL[14]:1
(will have to keep this in mind for the backport to 1.x where we still have TableRegressionModel)
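The sessions above show that a model fitted from a raw matrix has no formula to return. As a minimal sketch (not part of this PR), calling code could guard the lookup. `formula_or_nothing` is a hypothetical helper name, and the catch-all `catch` is an assumption chosen so the same code tolerates either of the errors quoted above.

```julia
using GLM, StatsModels

# Hypothetical helper: return the model's formula, or `nothing` when the
# model was fitted without one (e.g. from a matrix and a response vector).
function formula_or_nothing(model)
    try
        return StatsModels.formula(model)
    catch
        # Assumption: any error here means "no formula available".
        return nothing
    end
end

# Fitted from a matrix: no formula is attached.
m_matrix = lm(ones(10, 1), randn(10))

# Fitted from a formula and a column table (a NamedTuple of vectors).
m_formula = lm(@formula(y ~ 1 + x), (x = randn(10), y = randn(10)))

formula_or_nothing(m_matrix)
formula_or_nothing(m_formula)
```

Swallowing every exception is deliberately coarse; code that targets a single GLM version could catch only the specific error type it expects.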
@testset "[G]VIF" begin
    duncan = RDatasets.dataset("car", "Duncan")
    lm1 = lm(@formula(Prestige ~ 1 + Income + Education), duncan)
    @test termnames(lm1)[2] == coefnames(lm1)
- Do we have tests for when `coefnames` and `termnames` differ?
- Do we have a decision on what should be done in the case of `lm(X, y)` (i.e. a model fitted without a formula; it still prints variable names as `x1` etc.)?
- This falls back to StatsModels -- the test there is just making sure we've successfully imported and exported the symbol.
- On `master`, `termnames` will error based on there being no formula (`formula` will return `nothing`).
I think `termnames` should not be defined if there is no formula -- there are only Terms when there is a formula.
So what should a user do to perform VIF analysis for the `model = lm(X, y)` case?
`vif` works, but not `gvif`, so I think they can still do `vif`. If they're able to construct a model matrix directly for something with nontrivial contrast coding, then they could probably also adapt the `gvif` source to extract the correct columns.
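As a rough illustration of that suggestion, here is a sketch that computes generalized VIFs straight from a predictor matrix, using only the standard library. It assumes the usual Fox and Monette definition, GVIF = det(R₁₁) · det(R₂₂) / det(R), where R is the correlation matrix of the predictors, R₁₁ is the block for one term's columns and R₂₂ the block for all other columns. The function name `gvif_from_matrix` and the `groups` argument are hypothetical and are not taken from the StatsModels source.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, not part of GLM or StatsModels.
# X:      predictor columns only (no intercept column).
# groups: one vector of column indices per term, e.g. the dummy
#         columns produced by contrast coding a categorical variable.
function gvif_from_matrix(X::AbstractMatrix, groups::Vector{Vector{Int}};
                          scale::Bool=false)
    R = cor(X)                       # correlation matrix of the predictors
    detfull = det(R)
    allcols = 1:size(X, 2)
    vals = map(groups) do cols
        rest = setdiff(allcols, cols)
        det(R[cols, cols]) * det(R[rest, rest]) / detfull
    end
    if scale
        # Make terms with different degrees of freedom comparable:
        # GVIF^(1 / (2 * df)), where df is the number of columns in the term.
        df = length.(groups)
        vals = vals .^ (1 ./ (2 .* df))
    end
    return vals
end

# Usage: columns 3 and 4 are treated as one two-column term.
X = randn(100, 4)
gvif_from_matrix(X, [[1], [2], [3, 4]]; scale=true)
```

When every group holds a single column, the result reduces to the ordinary VIF, i.e. the diagonal of `inv(cor(X))`, which makes for a convenient sanity check.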
    @test termnames(lm1)[2] == coefnames(lm1)
    @test vif(lm1) ≈ gvif(lm1)
    lm2 = lm(@formula(Prestige ~ 1 + Income + Education + Type), duncan)
    @test gvif(lm2; scale=true) ≈ [1.486330, 2.301648, 1.502666] atol=1e-4
- Can you please add a comment on where these values are taken from?
- Also, do we have tests for `vif`/`gvif` for `glm`?
- Do we have tests for `vif`/`gvif` for models without a formula?
- Do we have tests for `vif`/`gvif` for models that have complex formulas, e.g. `@formula(y~(1+a*(b+log(c)))&(1+d))`? (Of course this is artificial, but I hope it is clear what I mean.)
These are just the StatsModels tests carried forward to models actually fitted here. 😄 But I can add a cross reference.
@@ -21,7 +22,7 @@ module GLM
     export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
            loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
            fitted, fit, fit!, model_response, response, modelmatrix, r2, r², adjr2, adjr²,
-           cooksdistance, hasintercept, dispersion
+           cooksdistance, hasintercept, dispersion, vif, gvif, termnames
Maybe we should just reexport StatsModels? That sounds natural.
The only "problem" is that breaking changes in StatsModels necessarily become breaking changes in GLM.
Yeah, but there shouldn't be breaking changes in StatsModels minor releases, and anyway users who need these functions will do `using StatsModels`.
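For readers unfamiliar with the reexport idea discussed above, a minimal sketch follows. It assumes the Reexport.jl package is available; `HypotheticalModels` is an illustrative module name and does not show how GLM is actually organised.

```julia
module HypotheticalModels  # illustrative stand-in, not GLM itself

using Reexport
# Every name exported by StatsModels becomes an export of this module too,
# so no hand-maintained export list is needed for those names.
@reexport using StatsModels

end # module

using .HypotheticalModels

# `@formula` is reachable without a separate `using StatsModels`.
f = @formula(y ~ 1 + x)
```

The trade-off is the one raised in the thread: the set of names the wrapping module exports now follows whatever StatsModels exports.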
closes #428