Fix coeftable for saturated linear models #458

Merged (5 commits) on Apr 11, 2022

nalimilan
Member

`coeftable` failed for saturated `LinearModel`s because it tried to compute F and t distributions with zero degrees of freedom (DOF).

Fixes #456.

codecov-commenter commented Nov 26, 2021

Codecov Report

Merging #458 (d636397) into master (affcebc) will increase coverage by 0.99%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #458      +/-   ##
==========================================
+ Coverage   84.12%   85.12%   +0.99%     
==========================================
  Files           7        7              
  Lines         819      827       +8     
==========================================
+ Hits          689      704      +15     
+ Misses        130      123       -7     

Impacted Files    Coverage Δ
src/glmfit.jl     78.74% <100.00%> (+0.14%) ⬆️
src/lm.jl         96.24% <100.00%> (+0.17%) ⬆️
src/linpred.jl    83.19% <0.00%> (+5.88%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update affcebc...d636397.

@andreasnoack
Member

I don't think the NaNs are right here. I guess the standard errors and t-values are already correct, i.e. Inf and zero respectively. I believe the CI should also just be [-Inf, Inf] and the p-value should be one.
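A minimal Python sketch of the convention proposed above for a saturated model (not GLM.jl's Julia implementation). `CoefRow`, `coef_row`, and the `t_cdf`/`t_quantile` hooks are hypothetical names. A real version would pass Student t functions (for example from SciPy). The `normal_cdf`/`normal_quantile` stand-ins ignore `dof` and exist only so the sketch runs with the standard library. The key point is that the zero-DOF branch returns before any distribution is evaluated.

```python
import math
from statistics import NormalDist
from typing import Callable, NamedTuple


class CoefRow(NamedTuple):
    estimate: float
    stderror: float
    t: float
    p: float
    lower: float
    upper: float


def coef_row(estimate: float, stderror: float, dof: int,
             t_cdf: Callable[[float, int], float],
             t_quantile: Callable[[float, int], float],
             level: float = 0.95) -> CoefRow:
    """Build one coefficient-table row.

    t_cdf(x, dof) and t_quantile(prob, dof) are hypothetical caller-supplied
    hooks for a Student t distribution; they are never called with dof == 0.
    """
    if dof == 0:
        # Saturated model: no residual degrees of freedom, so report infinite
        # uncertainty instead of evaluating a distribution with zero DOF.
        return CoefRow(estimate, math.inf, 0.0, 1.0, -math.inf, math.inf)
    t = estimate / stderror
    p = 2.0 * (1.0 - t_cdf(abs(t), dof))           # two-sided p-value
    q = t_quantile(1.0 - (1.0 - level) / 2.0, dof)  # CI critical value
    return CoefRow(estimate, stderror, t, p,
                   estimate - q * stderror, estimate + q * stderror)


# Stand-ins that ignore dof and use the standard normal, only so the sketch
# runs without SciPy (a real implementation needs the t distribution).
_Z = NormalDist()


def normal_cdf(x: float, dof: int) -> float:
    return _Z.cdf(x)


def normal_quantile(prob: float, dof: int) -> float:
    return _Z.inv_cdf(prob)


print(coef_row(2.5, 0.7, 0, normal_cdf, normal_quantile))
# CoefRow(estimate=2.5, stderror=inf, t=0.0, p=1.0, lower=-inf, upper=inf)
```

Returning early for `dof == 0` keeps the degenerate case out of the distribution code entirely, which is what avoids the original failure.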

@nalimilan
Copy link
Member Author

Actually that was my first thought, but I switched to NaN when I noticed that GLMs already used it and that R did the same. But I agree it sounds mathematically more correct to use Inf for the standard error, 0.0 for the t-value, 1.0 for the p-value and [-Inf, Inf] for the CI. This changes the behavior of coeftable for GLMs, though (for LMs an error was thrown, so that's fine), but I guess we can consider this non-breaking.

@nalimilan
Copy link
Member Author

@andreasnoack I had forgotten this PR. OK to merge?

@palday
Member

palday commented Apr 2, 2022

@nalimilan I think the infinite CIs make sense here for a saturated model, but when the model becomes "oversaturated", i.e. rank deficient, we return NaNs for the test statistics, right? I guess that's still coherent: in the pivoted case, the coefficients are set to zero and there aren't really values for the associated errors, i.e. the errors are Not A Number. In the saturated case, it's just that the uncertainty is infinite.

@palday
Member

palday commented Apr 2, 2022

@nalimilan make a patch bump and then we can immediately tag a release 😄

@nalimilan
Member Author

Good point. For rank-deficient models, the PR used NaN for the t-value but not for the p-value and the CI. I've pushed a commit to fix this, with a test for lm. glm doesn't support rank-deficient models yet, so I've left a comment as a reminder to test this as part of #340. BTW, note that with this PR the dispersion will be Inf for saturated rank-deficient models once we can fit them. This is better than returning NaN, right?

I've also bumped the minor version. That seems appropriate given that this PR changes the behavior a bit in use cases that are already supported in the current release, so this isn't just a small bugfix.
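The rank-deficient side of the convention, plus the dispersion remark, can be sketched in Python as follows. This assumes, as is usual, that residual degrees of freedom are counted from the rank of the model matrix rather than its column count. `residual_dof`, `dispersion_sq` (a residual-variance estimate RSS / dof; GLM.jl's own `dispersion` may be defined differently), and `mark_aliased` are hypothetical helpers, not GLM.jl functions.

```python
import math
from typing import List, Sequence, Tuple

# (estimate, stderror, t, p, lower, upper)
Row = Tuple[float, float, float, float, float, float]


def residual_dof(n_obs: int, rank: int) -> int:
    # Count residual DOF from the rank of the model matrix, not its number
    # of columns, so aliased (pivoted-out) columns do not use up DOF.
    return n_obs - rank


def dispersion_sq(rss: float, n_obs: int, rank: int) -> float:
    # Residual-variance estimate RSS / dof; infinite (not NaN) when the model
    # is saturated, including a saturated rank-deficient one.
    dof = residual_dof(n_obs, rank)
    if dof == 0:
        return math.inf
    return rss / dof


def mark_aliased(rows: Sequence[Row], kept: Sequence[bool]) -> List[Row]:
    # Aliased columns: the coefficient is set to zero and every inferential
    # quantity (se, t, p and both CI bounds) is NaN, not only the t-value.
    nan = math.nan
    return [row if keep else (0.0, nan, nan, nan, nan, nan)
            for row, keep in zip(rows, kept)]


# 4 observations, 5 columns, rank 4: saturated and rank-deficient.
print(residual_dof(4, 4), dispersion_sq(0.0, 4, 4))  # 0 inf
```

Handling the aliased rows in one place keeps the p-value and CI consistent with the t-value, which is the inconsistency the follow-up commit addressed.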

@palday
Member

palday commented Apr 5, 2022

@nalimilan I agree with the minor instead of patch release.

Maybe @dmbates can comment on the origin of this convention and whether everything still makes sense?

The one advantage of p = 1.0 that I can see from a naive perspective is that logical comparisons work as expected. For example, NaN > 0.05 is false, whereas 1.0 > 0.05 is true. But then again, if a user is paying that little attention, I guess there's nothing we can do to help them.
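Python floats follow the same IEEE 754 comparison rules as Julia's `Float64`, so the point about comparisons can be checked with a short sketch. `significant` and `not_significant` are hypothetical helpers standing in for a user's own threshold check: with NaN both return `False`, while p = 1.0 gives a consistent answer.

```python
import math


def significant(p: float, alpha: float = 0.05) -> bool:
    # Any ordered comparison involving NaN is False.
    return p < alpha


def not_significant(p: float, alpha: float = 0.05) -> bool:
    return p > alpha


for p in (math.nan, 1.0):
    print(p, significant(p), not_significant(p))
# nan False False
# 1.0 False True
```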

@nalimilan
Member Author

@palday Let's merge?

palday merged commit 42a0d04 into master on Apr 11, 2022
palday deleted the nl/saturated branch on Apr 11, 2022, 18:14

Development

Successfully merging this pull request may close these issues.

Linear models failing to display with ambiguous datatypes with unhelpful error messages
4 participants