Fix `coeftable` for saturated linear models #458

nalimilan · 2021-11-26T22:38:57Z

`coeftable` failed for saturated `LinearModel`s due to trying to compute F and T distributions with zero DOF.

Fixes #456.

codecov-commenter · 2021-11-26T22:44:16Z

Codecov Report

Merging #458 (d636397) into master (affcebc) will increase coverage by 0.99%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #458      +/-   ##
==========================================
+ Coverage   84.12%   85.12%   +0.99%     
==========================================
  Files           7        7              
  Lines         819      827       +8     
==========================================
+ Hits          689      704      +15     
+ Misses        130      123       -7

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/glmfit.jl | `78.74% <100.00%> (+0.14%)` | ⬆️ |
| src/lm.jl | `96.24% <100.00%> (+0.17%)` | ⬆️ |
| src/linpred.jl | `83.19% <0.00%> (+5.88%)` | ⬆️ |

andreasnoack · 2021-11-30T14:41:11Z

I don't think the NaNs are right here. I guess the standard errors and t-values are already correct, i.e. Inf and zero respectively. I believe the CI should also just be [-Inf,Inf] and the P-value should be one.

nalimilan · 2021-12-04T19:37:07Z

Actually that was my first thought, but I changed to NaN when I noticed that GLMs used that already and that R did the same. But I agree it sounds mathematically more correct to use Inf, 1.0 and 0.0. This changes the behavior of coeftable for GLMs though (for LMs an error was thrown so it's OK), but I guess we can consider this as non-breaking.

nalimilan · 2022-03-31T15:10:14Z

@andreasnoack I had forgotten this PR. OK to merge?

palday · 2022-04-02T19:34:30Z

@nalimilan I think the infinite CIs make sense here for a saturated model, but when the model becomes "oversaturated", i.e. rank deficient, then we return NaNs for the test statistics, right? I guess that's still coherent -- in the pivoted case, the coefficients are set to zero and there aren't really values for the associated errors, i.e. the errors are Not A Number. In the saturated case, it's just that the uncertainty is infinite.

palday · 2022-04-02T19:35:00Z

@nalimilan make a patch bump and then we can immediately tag a release 😄

nalimilan · 2022-04-03T10:51:31Z

Good point. For rank-deficient models, the PR used `NaN` for the t-value but not for the p-value and the CI. I've pushed a commit to fix this, with a test for `lm`. `glm` doesn't support rank-deficient models yet, so I've left a comment to remember testing this as part of #340. BTW, note that with the PR, dispersion will be `Inf` for saturated rank-deficient models once we can fit them. This is better than returning `NaN`, right?

I've also bumped the minor version. That seems appropriate given that this PR changes the behavior a bit in use cases that are already supported in the current release, so this isn't just a small bugfix.
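
The convention discussed above can be sketched roughly as follows. This is only an illustration using Base Julia: `degenerate_row` and its `aliased` keyword are hypothetical names, not part of GLM.jl, and the actual implementation in the PR may differ. It assumes a saturated fit reports infinite uncertainty, while a coefficient dropped by pivoting in a rank-deficient fit reports `NaN` throughout.

```julia
# Hypothetical helper for illustration only; this is not GLM.jl's implementation.
# `estimate` is a fitted coefficient; `aliased` marks a coefficient that was
# dropped by pivoting in a rank-deficient fit.
function degenerate_row(estimate::Real; aliased::Bool=false)
    if aliased
        # No meaningful uncertainty exists, so every statistic is Not a Number.
        return (se=NaN, t=NaN, p=NaN, lower=NaN, upper=NaN)
    end
    # Saturated fit (zero residual degrees of freedom): uncertainty is infinite.
    se = Inf
    t = estimate / se   # a finite value divided by Inf is zero
    return (se=se, t=t, p=1.0, lower=-Inf, upper=Inf)
end

saturated = degenerate_row(2.5)
pivoted = degenerate_row(0.0; aliased=true)
println(saturated)
println(pivoted)
```

Keeping the two cases in separate branches makes the distinction explicit: infinite uncertainty for the saturated case, undefined statistics for the pivoted one.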

palday · 2022-04-05T16:03:26Z

@nalimilan I agree with the minor instead of patch release.

Maybe @dmbates can comment on the origin of this convention and whether everything still makes sense?

The one advantage for p=1.0 that I can see from a naive perspective is that logical comparisons work as expected. For example, NaN > 0.05 is false. But then again if a user is paying that little attention, then I guess there's nothing we can do to help them.
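
To illustrate the comparison point, here is a small Base Julia snippet. The variable names are made up for the example; the behavior shown is standard IEEE 754 floating-point comparison, where every ordered comparison involving `NaN` is false.

```julia
p_nan = NaN   # p-value reported as Not a Number
p_one = 1.0   # p-value reported as one

println(p_nan > 0.05)    # false
println(p_nan <= 0.05)   # false: NaN is neither above nor below the threshold
println(p_one > 0.05)    # true
println(isnan(p_nan))    # true: an explicit check is needed to detect NaN
```

With `NaN`, code that branches on `p > 0.05` or on `p <= 0.05` takes the "false" path either way, whereas `1.0` behaves like an ordinary non-significant p-value.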

nalimilan · 2022-04-11T15:37:12Z

@palday Let's merge?

nalimilan · 2022-04-11T19:13:25Z

JuliaRegistries/General#58360

Fix coeftable for saturated linear models

4affa6d

`coeftable` failed for saturated `LinearModel`s due to trying to compute F and T distributions with zero DOF.

nalimilan requested a review from andreasnoack November 26, 2021 22:38

nalimilan mentioned this pull request Nov 26, 2021

Linear models failing to display with ambiguous datatypes with unhelpful error messages #456

Closed

Use Inf/1.0/0.0 rather than NaN

0d81ff0

palday approved these changes Apr 2, 2022

nalimilan added 3 commits April 3, 2022 12:36

Fix handling of rank-deficient models

182bf75

Bump minor version

575c58a

Merge branch 'master' into nl/saturated

d636397

nalimilan mentioned this pull request Apr 8, 2022

fix predict docstring and remove trailing dim #467

Merged

palday merged commit 42a0d04 into master Apr 11, 2022

palday deleted the nl/saturated branch April 11, 2022 18:14
