Skip to content

Commit

Permalink
Use StatsAPI instead of StatsBase, define confint and vcov (#231)
Browse files Browse the repository at this point in the history
StatsAPI provides a common namespace for packages providing statistical
functionality. It supersedes StatsBase's use for this purpose.

As part of the migration to using StatsAPI, which is itself
non-functional, I've made two functional changes in the form of
deprecations of custom functions in favor of their existing StatsAPI
equivalents. Namely, `estimate_covar` has been deprecated in favor of
`vcov` and `confidence_interval` has been deprecated in favor of
`confint`. While the former is a simple renaming, the latter also
involves specifying the confidence level rather than the Type I error
rate (1 - alpha instead of alpha).
  • Loading branch information
ararslan authored Oct 8, 2023
1 parent 220a63c commit 93736e2
Show file tree
Hide file tree
Showing 6 changed files with 43 additions and 43 deletions.
4 changes: 2 additions & 2 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -8,13 +8,13 @@ ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
NLSolversBase = "d41bc354-129a-5804-8e4c-c37616107c6c"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StatsAPI = "82ae8749-77ed-4fe6-ae5f-f523153014b0"

[compat]
Distributions = "0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25"
ForwardDiff = "0.10"
NLSolversBase = "7.5"
StatsBase = "0.32, 0.33, 0.34"
StatsAPI = "1"
julia = "1.6"

[extras]
Expand Down
10 changes: 5 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ fit_bounds = curve_fit(model, xdata, ydata, p0_bounds, lower=lb, upper=ub)
sigma = stderror(fit)
# to get margin of error and confidence interval of each parameter at 5% significance level:
margin_of_error = margin_error(fit, 0.05)
confidence_inter = confidence_interval(fit, 0.05)
confidence_inter = confint(fit; level=0.95)

# The finite difference method is used above to approximate the Jacobian.
# Alternatively, a function which calculates it exactly can be supplied instead.
Expand Down Expand Up @@ -206,18 +206,18 @@ If no weights are provided for the fits, the variance is estimated from the mean

This returns the product of standard error and critical value of each parameter at `alpha` significance level.

`confidence_interval = confidence_interval(fit, alpha=0.05; atol, rtol)`:
`confidence_interval = confint(fit; level=0.05, atol, rtol)`:

* `fit`: result of curve_fit (a `LsqFitResult` type)
* `alpha`: significance level
* `level`: confidence level
* `atol`: absolute tolerance for negativity check
* `rtol`: relative tolerance for negativity check

This returns confidence interval of each parameter at `alpha` significance level.
This returns confidence interval of each parameter at `level` significance level.

----

`covar = estimate_covar(fit)`:
`covar = vcov(fit)`:

* `fit`: result of curve_fit (a `LsqFitResult` type)
* `covar`: parameter covariance matrix calculated from the Jacobian of the model at the fit point, using the weights (if specified) as the inverse covariance of observations
Expand Down
8 changes: 4 additions & 4 deletions docs/src/getting_started.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,10 +46,10 @@ julia> param = fit.param
2.0735
```

`LsqFit.jl` also provides functions to examinep0 = [0.5, 0.5] the goodness of fit. `estimate_covar(fit)` computes the estimated covariance matrix.
`LsqFit.jl` also provides functions to examinep0 = [0.5, 0.5] the goodness of fit. `vcov(fit)` computes the estimated covariance matrix.

```Julia
julia> cov = estimate_covar(fit)
julia> cov = vcov(fit)
2×2 Array{Float64,2}:
0.000116545 0.000174633
0.000174633 0.00258261
Expand All @@ -64,10 +64,10 @@ julia> se = stderror(fit)
0.0508193
```

To get the confidence interval at 10% significance level, run `confidence_interval(fit, alpha)`, which essentially computes `the estimate parameter value` ± (`standard error` * `critical value from t-distribution`).
To get the confidence interval at 10% significance level, run `confint(fit; level=0.9)`, which essentially computes `the estimate parameter value` ± (`standard error` * `critical value from t-distribution`).

```Julia
julia> confidence_interval = confidence_interval(fit, 0.1)
julia> confidence_interval = confint(fit; level=0.9)
2-element Array{Tuple{Float64,Float64},1}:
(0.992333, 1.02977)
(1.98537, 2.16162)
Expand Down
14 changes: 7 additions & 7 deletions docs/src/tutorial.md
Original file line number Diff line number Diff line change
Expand Up @@ -186,10 +186,10 @@ In `LsqFit.jl`, the covariance matrix calculation uses QR decomposition to [be m
\mathbf{Cov}(\boldsymbol{\gamma}^*) = \hat{\sigma}^2 \mathrm{R}^{-1}(\mathrm{R}^{-1})'
```

`estimate_covar()` computes the covariance matrix of fit:
`vcov()` computes the covariance matrix of fit:

```Julia
julia> cov = estimate_covar(fit)
julia> cov = vcov(fit)
2×2 Array{Float64,2}:
0.000116545 0.000174633
0.000174633 0.00258261
Expand All @@ -213,10 +213,10 @@ julia> margin_of_error = margin_error(fit, 0.1)
0.0902435
```

`confidence_interval()` returns the confidence interval of each parameter at certain significance level, which is essentially the estimate value ± margin of error. To get the confidence interval at 10% significance level, run:
`confint()` returns the confidence interval of each parameter at certain significance level, which is essentially the estimate value ± margin of error. To get the confidence interval at 10% significance level, run:

```Julia
julia> confidence_intervals = confidence_interval(fit, 0.1)
julia> confidence_intervals = confint(fit; level=0.9)
2-element Array{Tuple{Float64,Float64},1}:
(0.976316, 1.01613)
(1.91047, 2.09096)
Expand Down Expand Up @@ -352,7 +352,7 @@ Pass the vector of `1 ./ var(ε)` or the matrix `inv(covar(ε))` as the weight p
```Julia
julia> wt = inv(cov_ε)
julia> fit = curve_fit(m, tdata, ydata, wt, p0)
julia> cov = estimate_covar(fit)
julia> cov = vcov(fit)
```

!!! note
Expand All @@ -372,7 +372,7 @@ Pass the matrix `inv(covar(ε))` as the weight parameter (`wt`) to the function
```Julia
julia> wt = 1 ./ yvar
julia> fit = curve_fit(m, tdata, ydata, wt, p0)
julia> cov = estimate_covar(fit)
julia> cov = vcov(fit)
```

## Estimate the Optimal Weight
Expand All @@ -388,7 +388,7 @@ Unweighted fitting (OLS) will return the residuals we need, since the estimator
julia> fit_OLS = curve_fit(m, tdata, ydata, p0)
julia> wt = 1 ./ fit_OLS.resid
julia> fit_WLS = curve_fit(m, tdata, ydata, wt, p0)
julia> cov = estimate_covar(fit_WLS)
julia> cov = vcov(fit_WLS)
```

## References
Expand Down
12 changes: 6 additions & 6 deletions src/LsqFit.jl
Original file line number Diff line number Diff line change
Expand Up @@ -2,29 +2,29 @@ module LsqFit

export curve_fit,
margin_error,
confidence_interval,
estimate_covar,
make_hessian,
Avv,
# StatsBase reexports
# StatsAPI reexports
dof,
coef,
confint,
nobs,
mse,
rss,
stderror,
weights,
residuals
residuals,
vcov

using Distributions
using LinearAlgebra
using ForwardDiff
using Printf
using StatsAPI

import NLSolversBase:
value, value!, jacobian, jacobian!, value_jacobian!!, OnceDifferentiable
import StatsBase
import StatsBase: coef, dof, nobs, rss, stderror, weights, residuals
using StatsAPI: coef, confint, dof, nobs, rss, stderror, weights, residuals, vcov

import Base.summary

Expand Down
38 changes: 19 additions & 19 deletions src/curve_fit.jl
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,12 @@ struct LsqFitResult{P,R,J,W <: AbstractArray,T}
wt::W
end

StatsBase.coef(lfr::LsqFitResult) = lfr.param
StatsBase.dof(lfr::LsqFitResult) = nobs(lfr) - length(coef(lfr))
StatsBase.nobs(lfr::LsqFitResult) = length(lfr.resid)
StatsBase.rss(lfr::LsqFitResult) = sum(abs2, lfr.resid)
StatsBase.weights(lfr::LsqFitResult) = lfr.wt
StatsBase.residuals(lfr::LsqFitResult) = lfr.resid
StatsAPI.coef(lfr::LsqFitResult) = lfr.param
StatsAPI.dof(lfr::LsqFitResult) = nobs(lfr) - length(coef(lfr))
StatsAPI.nobs(lfr::LsqFitResult) = length(lfr.resid)
StatsAPI.rss(lfr::LsqFitResult) = sum(abs2, lfr.resid)
StatsAPI.weights(lfr::LsqFitResult) = lfr.wt
StatsAPI.residuals(lfr::LsqFitResult) = lfr.resid
mse(lfr::LsqFitResult) = rss(lfr) / dof(lfr)
isconverged(lsr::LsqFitResult) = lsr.converged

Expand Down Expand Up @@ -253,7 +253,7 @@ function curve_fit(
lmfit(f, g, p0, wt; kwargs...)
end

function estimate_covar(fit::LsqFitResult)
function StatsAPI.vcov(fit::LsqFitResult)
# computes covariance matrix of fit parameters
J = fit.jacobian

Expand All @@ -271,12 +271,12 @@ function estimate_covar(fit::LsqFitResult)
return covar
end

function StatsBase.stderror(fit::LsqFitResult; rtol::Real=NaN, atol::Real=0)
function StatsAPI.stderror(fit::LsqFitResult; rtol::Real=NaN, atol::Real=0)
# computes standard error of estimates from
# fit : a LsqFitResult from a curve_fit()
# atol : absolute tolerance for approximate comparisson to 0.0 in negativity check
# rtol : relative tolerance for approximate comparisson to 0.0 in negativity check
covar = estimate_covar(fit)
covar = vcov(fit)
# then the standard errors are given by the sqrt of the diagonal
vars = diag(covar)
vratio = minimum(vars) / maximum(vars)
Expand Down Expand Up @@ -304,24 +304,24 @@ function margin_error(fit::LsqFitResult, alpha=0.05; rtol::Real=NaN, atol::Real=
return std_errors * critical_values
end

function confidence_interval(
fit::LsqFitResult,
alpha=0.05;
rtol::Real=NaN,
atol::Real=0,
)
function StatsAPI.confint(fit::LsqFitResult; level=0.95, rtol::Real=NaN, atol::Real=0)
# computes confidence intervals at alpha significance level from
# fit : a LsqFitResult from a curve_fit()
# alpha : significance level, e.g. alpha=0.05 for 95% confidence
# level : confidence level
# atol : absolute tolerance for approximate comparisson to 0.0 in negativity check
# rtol : relative tolerance for approximate comparisson to 0.0 in negativity check
std_errors = stderror(fit; rtol=rtol, atol=atol)
margin_of_errors = margin_error(fit, alpha; rtol=rtol, atol=atol)
confidence_intervals =
collect(zip(coef(fit) - margin_of_errors, coef(fit) + margin_of_errors))
margin_of_errors = margin_error(fit, 1 - level; rtol=rtol, atol=atol)
return collect(zip(coef(fit) - margin_of_errors, coef(fit) + margin_of_errors))
end

@deprecate(confidence_interval(fit::LsqFitResult, alpha=0.05; rtol::Real=NaN, atol::Real=0),
confint(fit; level=(1 - alpha), rtol=rtol, atol=atol))

@deprecate estimate_covar(fit::LsqFitResult) vcov(fit)

@deprecate standard_errors(args...; kwargs...) stderror(args...; kwargs...)

@deprecate estimate_errors(
fit::LsqFitResult,
confidence=0.95;
Expand Down

0 comments on commit 93736e2

Please sign in to comment.