Use StatsAPI instead of StatsBase, define confint and vcov

StatsAPI provides a common namespace for packages providing statistical functionality. It supersedes StatsBase's use for this purpose. As part of the migration to using StatsAPI, which is itself non-functional, I've made two functional changes in the form of deprecations of custom functions in favor of their existing StatsAPI equivalents. Namely, `estimate_covar` has been deprecated in favor of `vcov` and `confidence_interval` has been deprecated in favor of `confint`. While the former is a simple renaming, the latter also involves specifying the confidence level rather than the Type I error rate (1 - alpha instead of alpha).
JuliaNLSolvers · Oct 6, 2023 · add24d8 · add24d8
1 parent 220a63c
commit add24d8
Show file tree

Hide file tree

Showing 6 changed files with 43 additions and 43 deletions.
diff --git a/Project.toml b/Project.toml
@@ -8,13 +8,13 @@ ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 NLSolversBase = "d41bc354-129a-5804-8e4c-c37616107c6c"
 Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
-StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
+StatsAPI = "82ae8749-77ed-4fe6-ae5f-f523153014b0"
 
 [compat]
 Distributions = "0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25"
 ForwardDiff = "0.10"
 NLSolversBase = "7.5"
-StatsBase = "0.32, 0.33, 0.34"
+StatsAPI = "1"
 julia = "1.6"
 
 [extras]

diff --git a/README.md b/README.md
@@ -52,7 +52,7 @@ fit_bounds = curve_fit(model, xdata, ydata, p0_bounds, lower=lb, upper=ub)
 sigma = stderror(fit)
 # to get margin of error and confidence interval of each parameter at 5% significance level:
 margin_of_error = margin_error(fit, 0.05)
-confidence_inter = confidence_interval(fit, 0.05)
+confidence_inter = confint(fit; level=0.95)
 
 # The finite difference method is used above to approximate the Jacobian.
 # Alternatively, a function which calculates it exactly can be supplied instead.
@@ -206,18 +206,18 @@ If no weights are provided for the fits, the variance is estimated from the mean
 
 This returns the product of standard error and critical value of each parameter at `alpha` significance level.
 
-`confidence_interval = confidence_interval(fit, alpha=0.05; atol, rtol)`:
+`confidence_interval = confint(fit; level=0.05, atol, rtol)`:
 
 * `fit`: result of curve_fit (a `LsqFitResult` type)
-* `alpha`: significance level
+* `level`: confidence level
 * `atol`: absolute tolerance for negativity check
 * `rtol`: relative tolerance for negativity check
 
-This returns confidence interval of each parameter at `alpha` significance level.
+This returns confidence interval of each parameter at `level` significance level.
 
 ----
 
-`covar = estimate_covar(fit)`:
+`covar = vcov(fit)`:
 
 * `fit`: result of curve_fit (a `LsqFitResult` type)
 * `covar`: parameter covariance matrix calculated from the Jacobian of the model at the fit point, using the weights (if specified) as the inverse covariance of observations

diff --git a/docs/src/getting_started.md b/docs/src/getting_started.md
@@ -46,10 +46,10 @@ julia> param = fit.param
  2.0735
 ```
 
-`LsqFit.jl` also provides functions to examinep0 = [0.5, 0.5] the goodness of fit. `estimate_covar(fit)` computes the estimated covariance matrix.
+`LsqFit.jl` also provides functions to examinep0 = [0.5, 0.5] the goodness of fit. `vcov(fit)` computes the estimated covariance matrix.
 
 ```Julia
-julia> cov = estimate_covar(fit)
+julia> cov = vcov(fit)
 2×2 Array{Float64,2}:
  0.000116545  0.000174633
  0.000174633  0.00258261
@@ -64,10 +64,10 @@ julia> se = stderror(fit)
  0.0508193
 ```
 
-To get the confidence interval at 10% significance level, run `confidence_interval(fit, alpha)`, which essentially computes `the estimate parameter value` ± (`standard error` * `critical value from t-distribution`).
+To get the confidence interval at 10% significance level, run `confint(fit; level=0.9)`, which essentially computes `the estimate parameter value` ± (`standard error` * `critical value from t-distribution`).
 
 ```Julia
-julia> confidence_interval = confidence_interval(fit, 0.1)
+julia> confidence_interval = confint(fit; level=0.9)
 2-element Array{Tuple{Float64,Float64},1}:
  (0.992333, 1.02977)
  (1.98537, 2.16162)

diff --git a/docs/src/tutorial.md b/docs/src/tutorial.md
@@ -186,10 +186,10 @@ In `LsqFit.jl`, the covariance matrix calculation uses QR decomposition to [be m
 \mathbf{Cov}(\boldsymbol{\gamma}^*) = \hat{\sigma}^2 \mathrm{R}^{-1}(\mathrm{R}^{-1})'
 ```
 
-`estimate_covar()` computes the covariance matrix of fit:
+`vcov()` computes the covariance matrix of fit:
 
 ```Julia
-julia> cov = estimate_covar(fit)
+julia> cov = vcov(fit)
 2×2 Array{Float64,2}:
  0.000116545  0.000174633
  0.000174633  0.00258261
@@ -213,10 +213,10 @@ julia> margin_of_error = margin_error(fit, 0.1)
  0.0902435
 ```
 
-`confidence_interval()` returns the confidence interval of each parameter at certain significance level, which is essentially the estimate value ± margin of error. To get the confidence interval at 10% significance level, run:
+`confint()` returns the confidence interval of each parameter at certain significance level, which is essentially the estimate value ± margin of error. To get the confidence interval at 10% significance level, run:
 
 ```Julia
-julia> confidence_intervals = confidence_interval(fit, 0.1)
+julia> confidence_intervals = confint(fit; level=0.9)
 2-element Array{Tuple{Float64,Float64},1}:
  (0.976316, 1.01613)
  (1.91047, 2.09096)
@@ -352,7 +352,7 @@ Pass the vector of `1 ./ var(ε)` or the matrix `inv(covar(ε))` as the weight p
 ```Julia
 julia> wt = inv(cov_ε)
 julia> fit = curve_fit(m, tdata, ydata, wt, p0)
-julia> cov = estimate_covar(fit)
+julia> cov = vcov(fit)
 ```
 
 !!! note
@@ -372,7 +372,7 @@ Pass the matrix `inv(covar(ε))` as the weight parameter (`wt`) to the function
 ```Julia
 julia> wt = 1 ./ yvar
 julia> fit = curve_fit(m, tdata, ydata, wt, p0)
-julia> cov = estimate_covar(fit)
+julia> cov = vcov(fit)
 ```
 
 ## Estimate the Optimal Weight
@@ -388,7 +388,7 @@ Unweighted fitting (OLS) will return the residuals we need, since the estimator
 julia> fit_OLS = curve_fit(m, tdata, ydata, p0)
 julia> wt = 1 ./ fit_OLS.resid
 julia> fit_WLS = curve_fit(m, tdata, ydata, wt, p0)
-julia> cov = estimate_covar(fit_WLS)
+julia> cov = vcov(fit_WLS)
 ```
 
 ## References

diff --git a/src/LsqFit.jl b/src/LsqFit.jl
@@ -2,29 +2,29 @@ module LsqFit
 
 export curve_fit,
     margin_error,
-    confidence_interval,
-    estimate_covar,
     make_hessian,
     Avv,
-    # StatsBase reexports
+    # StatsAPI reexports
     dof,
     coef,
+    confint,
     nobs,
     mse,
     rss,
     stderror,
     weights,
-    residuals
+    residuals,
+    vcov
 
 using Distributions
 using LinearAlgebra
 using ForwardDiff
 using Printf
+using StatsAPI
 
 import NLSolversBase:
     value, value!, jacobian, jacobian!, value_jacobian!!, OnceDifferentiable
-import StatsBase
-import StatsBase: coef, dof, nobs, rss, stderror, weights, residuals
+using StatsAPI: coef, confint, dof, nobs, rss, stderror, weights, residuals, vcov
 
 import Base.summary
 

diff --git a/src/curve_fit.jl b/src/curve_fit.jl
@@ -7,12 +7,12 @@ struct LsqFitResult{P,R,J,W <: AbstractArray,T}
     wt::W
 end
 
-StatsBase.coef(lfr::LsqFitResult) = lfr.param
-StatsBase.dof(lfr::LsqFitResult) = nobs(lfr) - length(coef(lfr))
-StatsBase.nobs(lfr::LsqFitResult) = length(lfr.resid)
-StatsBase.rss(lfr::LsqFitResult) = sum(abs2, lfr.resid)
-StatsBase.weights(lfr::LsqFitResult) = lfr.wt
-StatsBase.residuals(lfr::LsqFitResult) = lfr.resid
+StatsAPI.coef(lfr::LsqFitResult) = lfr.param
+StatsAPI.dof(lfr::LsqFitResult) = nobs(lfr) - length(coef(lfr))
+StatsAPI.nobs(lfr::LsqFitResult) = length(lfr.resid)
+StatsAPI.rss(lfr::LsqFitResult) = sum(abs2, lfr.resid)
+StatsAPI.weights(lfr::LsqFitResult) = lfr.wt
+StatsAPI.residuals(lfr::LsqFitResult) = lfr.resid
 mse(lfr::LsqFitResult) = rss(lfr) / dof(lfr)
 isconverged(lsr::LsqFitResult) = lsr.converged
 
@@ -253,7 +253,7 @@ function curve_fit(
     lmfit(f, g, p0, wt; kwargs...)
 end
 
-function estimate_covar(fit::LsqFitResult)
+function StatsAPI.vcov(fit::LsqFitResult)
     # computes covariance matrix of fit parameters
     J = fit.jacobian
 
@@ -271,12 +271,12 @@ function estimate_covar(fit::LsqFitResult)
     return covar
 end
 
-function StatsBase.stderror(fit::LsqFitResult; rtol::Real=NaN, atol::Real=0)
+function StatsAPI.stderror(fit::LsqFitResult; rtol::Real=NaN, atol::Real=0)
     # computes standard error of estimates from
     #   fit   : a LsqFitResult from a curve_fit()
     #   atol  : absolute tolerance for approximate comparisson to 0.0 in negativity check
     #   rtol  : relative tolerance for approximate comparisson to 0.0 in negativity check
-    covar = estimate_covar(fit)
+    covar = vcov(fit)
     # then the standard errors are given by the sqrt of the diagonal
     vars = diag(covar)
     vratio = minimum(vars) / maximum(vars)
@@ -304,24 +304,24 @@ function margin_error(fit::LsqFitResult, alpha=0.05; rtol::Real=NaN, atol::Real=
     return std_errors * critical_values
 end
 
-function confidence_interval(
-    fit::LsqFitResult,
-    alpha=0.05;
-    rtol::Real=NaN,
-    atol::Real=0,
-)
+function StatsAPI.confint(fit::LsqFitResult; level=0.95, rtol::Real=NaN, atol::Real=0)
     # computes confidence intervals at alpha significance level from
     #   fit   : a LsqFitResult from a curve_fit()
-    #   alpha : significance level, e.g. alpha=0.05 for 95% confidence
+    #   level : confidence level
     #   atol  : absolute tolerance for approximate comparisson to 0.0 in negativity check
     #   rtol  : relative tolerance for approximate comparisson to 0.0 in negativity check
     std_errors = stderror(fit; rtol=rtol, atol=atol)
-    margin_of_errors = margin_error(fit, alpha; rtol=rtol, atol=atol)
-    confidence_intervals =
-        collect(zip(coef(fit) - margin_of_errors, coef(fit) + margin_of_errors))
+    margin_of_errors = margin_error(fit, 1 - level; rtol=rtol, atol=atol)
+    return collect(zip(coef(fit) - margin_of_errors, coef(fit) + margin_of_errors))
 end
 
+@deprecate(confidence_interval(fit::LsqFitResult, alpha=0.05; rtol::Real=NaN, atol::Real=0),
+           confint(fit; level=(1 - alpha), rtol=rtol, atol=atol))
+
+@deprecate estimate_covar(fit::LsqFitResult) vcov(fit)
+
 @deprecate standard_errors(args...; kwargs...) stderror(args...; kwargs...)
+
 @deprecate estimate_errors(
     fit::LsqFitResult,
     confidence=0.95;