Linear models failing to display with ambiguous datatypes with unhelpful error messages #456

Closed
jakewilliami opened this issue Nov 21, 2021 · 2 comments · Fixed by #458


jakewilliami commented Nov 21, 2021

The linear model fails somewhere in show (I guess) if you give it a DataFrame with a column of element type Any. MWE:

julia> df_test = DataFrame(date_numeric = [737810, 737841, 737869, 737900, 737930, 737961, 737991, 738022, 738053, 738083, 738114], num = Any[-74587.93, -74550.49, -74482.09, -74441.45, -74316.17, -74252.81, -73976.21, -73587.65, -73170.53, -72753.41, -72304.01])
11×2 DataFrame
 Row │ date_numeric  num
     │ Int64         Any
─────┼────────────────────────────
   1 │       737810  -74587.9
   2 │       737841  -74550.5
   3 │       737869  -74482.1
   4 │       737900  -74441.4
   5 │       737930  -74316.2
   6 │       737961  -74252.8
   7 │       737991  -73976.2
   8 │       738022  -73587.6
   9 │       738053  -73170.5
  10 │       738083  -72753.4
  11 │       738114  -72304.0

julia> model = fit(LinearModel, @formula(date_numeric ~ num), df_test)
ERROR: ArgumentError: FDist: the condition ν1 > zero(ν1) && ν2 > zero(ν2) is not satisfied.
Stacktrace:
  # ...

julia> df_test = DataFrame(date_numeric = [737810, 737841, 737869, 737900, 737930, 737961, 737991, 738022, 738053, 738083, 738114], num = Float64[-74587.93, -74550.49, -74482.09, -74441.45, -74316.17, -74252.81, -73976.21, -73587.65, -73170.53, -72753.41, -72304.01]);

julia> model = fit(LinearModel, @formula(date_numeric ~ num), df_test)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

date_numeric ~ 1 + num

Coefficients:
─────────────────────────────────────────────────────────────────────────────────────
                      Coef.    Std. Error       t  Pr(>|t|)  Lower 95%      Upper 95%
─────────────────────────────────────────────────────────────────────────────────────
(Intercept)   746741.0       1103.35       676.79    <1e-21  7.44245e5  749237.0
num                0.118876     0.0149383    7.96    <1e-04  0.0850827       0.152668
─────────────────────────────────────────────────────────────────────────────────────

It took me quite a while to figure out what exactly was happening here. I realised something weird was going on when running model = fit(LinearModel, @formula(date_numeric ~ num), df_test); in the REPL worked, but the same call failed when I didn't suppress the display (i.e., when I removed the semicolon).

@nalimilan
Member

The problem is that variables with eltype Any are treated as categorical. You get the same error by wrapping num in a CategoricalArray. So the underlying problem is that coeftable fails for a model with zero residual degrees of freedom. See #458.
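
To see why the residual degrees of freedom drop to zero in the original example, the arithmetic can be sketched in plain Julia. This is only an illustration, assuming (as stated above) that the Any column is treated as categorical and dummy-coded with one column per level beyond the first. The variable names are made up for this sketch and none of it is GLM.jl's own code.

```julia
# The `num` column from the original report, with eltype Any.
num = Any[-74587.93, -74550.49, -74482.09, -74441.45, -74316.17, -74252.81,
          -73976.21, -73587.65, -73170.53, -72753.41, -72304.01]

n_obs = length(num)               # number of observations
n_levels = length(unique(num))    # every value is its own categorical level

# Assumption: intercept plus one dummy column per level beyond the first.
n_coef = 1 + (n_levels - 1)

# Residual degrees of freedom: observations minus estimated coefficients.
dof_resid = n_obs - n_coef        # 0 for this data
```

With zero residual degrees of freedom, the condition quoted in the error message (ν2 > zero(ν2)) cannot hold, which is consistent with the ArgumentError appearing only when the coefficient table is displayed.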

@floswald

I agree with the OP that the error message could be improved here. Here is another case, which I just found when reading data from an Excel sheet with XLSX.readtable (it parses everything as Any by default).

julia> d = DataFrame(Y = Any[1,2,3], X = Any[3,2,1])
3×2 DataFrame
 Row │ Y    X   
     │ Any  Any 
─────┼──────────
   1 │ 1    3
   2 │ 2    2
   3 │ 3    1

julia> lm(@formula(Y ~ X), d)
ERROR: MethodError: no method matching fit(::Type{LinearModel}, ::Matrix{Float64}, ::Matrix{Float64}, ::Nothing)
Closest candidates are:
  fit(::StatisticalModel, ::Any...) at /Users/74097/.julia/packages/StatsBase/IPydo/src/statmodels.jl:178
  fit(::Type{StatsBase.Histogram}, ::Any...; kwargs...) at /Users/74097/.julia/packages/StatsBase/IPydo/src/hist.jl:383
  fit(::Type{LinearModel}, ::AbstractMatrix{var"#s49"} where var"#s49"<:Real, ::AbstractVector{var"#s50"} where var"#s50"<:Real, ::Union{Nothing, Bool}; wts, dropcollinear) at /Users/74097/.julia/packages/GLM/hDWc9/src/lm.jl:156

Can we not check the element type of each column and show a warning if we find Any?
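
A rough sketch of what such a check could look like on the user side, using only Base Julia. Both `any_columns` and `narrow_columns` are hypothetical helpers invented for this illustration; they are not part of GLM.jl, StatsModels.jl or DataFrames.jl, and the table is represented as a plain NamedTuple of vectors rather than a DataFrame.

```julia
# Hypothetical helper: names of the columns whose element type is Any.
# `cols` is a NamedTuple of vectors (a simple column table).
function any_columns(cols::NamedTuple)
    return [name for (name, col) in pairs(cols) if eltype(col) === Any]
end

# Hypothetical helper: warn about Any columns, then narrow their eltype.
# `identity.(col)` rebuilds the vector with the tightest eltype that
# Julia infers from the actual values.
function narrow_columns(cols::NamedTuple)
    flagged = any_columns(cols)
    isempty(flagged) || @warn "Columns with eltype Any found" columns = flagged
    return map(col -> eltype(col) === Any ? identity.(col) : col, cols)
end

d = (Y = Any[1, 2, 3], X = Any[3, 2, 1])
flagged = any_columns(d)      # the columns that would trigger the warning
fixed = narrow_columns(d)     # same data, now with concrete element types
```

With a DataFrame, a similar effect can be had by converting the affected columns before fitting, for example `d.X = Float64.(d.X)`.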
