[G]VIF #548

palday · 2023-09-13T09:27:36Z

closes #428

codecov · 2023-09-13T09:32:49Z

Codecov Report

Patch coverage is 100.00% of modified lines.

Files Changed	Coverage
src/GLM.jl	`ø`
src/linpred.jl	`100.00%`

bkamins · 2023-09-13T09:55:05Z

src/linpred.jl

@@ -362,7 +362,7 @@ fitted(m::LinPredModel) = m.rr.mu
 predict(mm::LinPredModel) = fitted(mm)
 residuals(obj::LinPredModel) = residuals(obj.rr)

-function formula(obj::LinPredModel)
+function StatsModels.formula(obj::LinPredModel)


While we are at it: when is this method called? When I do:

julia> formula(lm(x, y))
ERROR: type LinearModel has no field fr

julia> formula(glm(x, y, Normal()))
ERROR: type GeneralizedLinearModel has no field fr

other methods are called.

Do we have tests for the different cases where no formula is present?

Hmmm, will investigate. I thought we caught this when Milan removed TableRegressionModel.

On current master:

julia> formula(lm(ones(10, 1), randn(10)))
ERROR: ArgumentError: model was fitted without a formula
Stacktrace:
 [1] formula(obj::LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}})
   @ GLM ~/Code/GLM.jl/src/linpred.jl:366
 [2] top-level scope
   @ REPL[13]:1

julia> formula(glm(ones(10, 1), randn(10), Normal()))
ERROR: ArgumentError: model was fitted without a formula
Stacktrace:
 [1] formula(obj::GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Normal{Float64}, IdentityLink}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}})
   @ GLM ~/Code/GLM.jl/src/linpred.jl:366
 [2] top-level scope
   @ REPL[14]:1

(I will have to keep this in mind for the backport to 1.x, where we still have TableRegressionModel.)
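To illustrate what the qualified definition in the diff does, here is a minimal, self-contained sketch. The modules `Upstream` and `Downstream` and the type `ToyModel` are hypothetical stand-ins for StatsModels, GLM, and a model type; this is not the actual GLM implementation, only the error message is taken from the output above.

```julia
# Toy stand-in for StatsModels (hypothetical; not the real package).
module Upstream
export formula
function formula end          # generic function owned by Upstream
end

# Toy stand-in for GLM (hypothetical).
module Downstream
using ..Upstream

struct ToyModel
    f::Union{Nothing,String}  # `nothing` when fitted from a matrix
end

# Qualifying the name adds a method to Upstream.formula.
function Upstream.formula(m::ToyModel)
    m.f === nothing &&
        throw(ArgumentError("model was fitted without a formula"))
    return m.f
end
end

using .Upstream

with_formula = Downstream.ToyModel("y ~ 1 + x")
without_formula = Downstream.ToyModel(nothing)

# Dispatches to the method defined in Downstream.
println(formula(with_formula))
```

Calling `formula(without_formula)` in this sketch throws the `ArgumentError`, which mirrors the behaviour reported on master for models fitted from a matrix.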

bkamins · 2023-09-13T09:56:40Z

test/runtests.jl

+@testset "[G]VIF" begin
+    duncan = RDatasets.dataset("car", "Duncan")
+    lm1 = lm(@formula(Prestige ~ 1 + Income + Education), duncan)
+    @test termnames(lm1)[2] == coefnames(lm1)


Do we have tests for when coefnames and termnames differ?

Do we have a decision on what should be done in the case of lm(X, y) (i.e. a model fitted without a formula; it still prints variable names as x1 etc.)?

This falls back to StatsModels -- the test there is just making sure we've successfully imported and exported the symbol.

On master, termnames will error because there is no formula (formula will return nothing).

I think termnames should not be defined if there is no formula -- there are only Terms when there is a formula.

So what should a user do to perform VIF analysis for the model = lm(X, y) case?

vif works, but not gvif. So I think they can still do vif. If they're able to construct a model matrix directly for something with nontrivial contrast coding, then they could probably also adapt the gvif source to extract the correct columns.
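As a rough sketch of what the matrix-only case involves: the textbook VIF of a predictor is the corresponding diagonal entry of the inverse correlation matrix of the non-intercept columns. The helper `vif_from_matrix` below is hypothetical (it is not part of GLM or StatsModels) and assumes the intercept is the first column of the model matrix.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not part of GLM.jl or StatsModels.jl): compute VIFs
# directly from a model matrix X whose first column is the intercept.
function vif_from_matrix(X::AbstractMatrix{<:Real})
    all(==(1), view(X, :, 1)) ||
        throw(ArgumentError("expected an intercept in the first column"))
    size(X, 2) > 2 ||
        throw(ArgumentError("need at least two non-intercept columns"))
    # Correlation matrix of the non-intercept columns.
    R = cor(X[:, 2:end])
    # VIF_j is the j-th diagonal entry of the inverse correlation matrix.
    return diag(inv(R))
end

# Two uncorrelated predictors, so neither variance is inflated.
X = [ones(4) [1.0, -1.0, 1.0, -1.0] [1.0, 1.0, -1.0, -1.0]]
vifs = vif_from_matrix(X)
```

For gvif, the same idea would need to know which columns belong to which term, which is exactly the information a formula provides.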

bkamins · 2023-09-13T10:00:04Z

test/runtests.jl

+    @test termnames(lm1)[2] == coefnames(lm1)
+    @test vif(lm1) ≈ gvif(lm1)
+    lm2 = lm(@formula(Prestige ~ 1 + Income + Education + Type), duncan)
+    @test gvif(lm2; scale=true) ≈ [1.486330, 2.301648, 1.502666] atol=1e-4


Can you please add a comment on where these values are taken from?

Also do we have tests for vif/gvif for glm?

Do we have tests for vif/gvif for models without formula?

Do we have tests for vif/gvif for models that have complex formulas, e.g. @formula(y~(1+a*(b+log(c)))&(1+d))? (Of course this is artificial, but I hope it is clear what I mean.)

These are just the StatsModels tests carried forward to models actually fitted here. 😄 But I can add a cross reference.
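For readers unfamiliar with the quantities under test, the standard definitions can be written as below. The symbols are notation introduced here, not taken from the PR: $R$ is the correlation matrix of the non-intercept columns, $R_{11}$ its block for the columns of term $j$, $R_{22}$ the block for all other columns, $R_j^2$ the coefficient of determination from regressing predictor $j$ on the others, and $\mathrm{df}_j$ the number of columns in term $j$. It is assumed that `scale=true` corresponds to the $1/(2\,\mathrm{df}_j)$ power, which is consistent with the reference values in the test above.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{GVIF}_j = \frac{\det(R_{11})\,\det(R_{22})}{\det(R)},
\qquad
\mathrm{GVIF}_j^{\text{scaled}} = \mathrm{GVIF}_j^{\,1/(2\,\mathrm{df}_j)}
```

When every term has a single column ($\mathrm{df}_j = 1$), $\mathrm{GVIF}_j$ reduces to $\mathrm{VIF}_j$, which is what the `vif(lm1) ≈ gvif(lm1)` test checks.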

nalimilan · 2023-09-14T08:49:06Z

src/GLM.jl

@@ -21,7 +22,7 @@ module GLM
    export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
           loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
           fitted, fit, fit!, model_response, response, modelmatrix, r2, r², adjr2, adjr²,
-           cooksdistance, hasintercept, dispersion
+           cooksdistance, hasintercept, dispersion, vif, gvif, termnames


Maybe we should just reexport StatsModels? That sounds natural.

The only "problem" is that breaking changes in StatsModels necessarily become breaking changes in GLM.

Yeah, but there shouldn't be breaking changes in StatsModels minor releases, and anyway users who need these functions will do using StatsModels.

[G]VIF

468d143

palday requested a review from bkamins September 13, 2023 09:28

bkamins reviewed Sep 13, 2023

palday added 3 commits September 14, 2023 09:29

add reference value source

f465d1a

more tests

c2ceb27

glm tests

977ecaa

palday requested a review from bkamins September 14, 2023 07:54

nalimilan reviewed Sep 14, 2023

bkamins approved these changes Sep 14, 2023

palday merged commit b1ba4c5 into master Sep 14, 2023
12 checks passed

palday deleted the pa/vif branch September 14, 2023 09:46

palday added a commit that referenced this pull request Sep 14, 2023

[G]VIF (#548)

9ed02f8

* [G]VIF
* add reference value source
* more tests
* glm tests

(cherry picked from commit b1ba4c5)

palday added a commit that referenced this pull request Sep 14, 2023

backport VIF to 1.x release (#549)

afbb513

* [G]VIF (#548)

  * [G]VIF
  * add reference value source
  * more tests
  * glm tests

  (cherry picked from commit b1ba4c5)

* fix formula implementation
* version bump
