This repository accompanies the paper "Calibration tests in multi-class classification: A unifying framework" by Widmann, Lindsten, and Zachariah, which was presented at NeurIPS 2019.
The repository is organized as follows:
- The folder paper contains the LaTeX source code of the paper.
- The folder experiments contains the source code and the results of our experiments.
- The folder src contains common implementations, such as the definition of the generative models, which are used for generating the figures in the paper and for some experiments.
You can rerun our experiments and recompile the paper; each folder contains instructions for building and running the files it contains.
We have published software packages that implement the proposed calibration errors and calibration tests:
- CalibrationErrors.jl and CalibrationErrorsDistributions.jl for estimating calibration errors from data sets of predictions and targets, with support for general probabilistic predictive models.
- CalibrationTests.jl for performing statistical hypothesis tests of calibration.
- pycalibration is a Python interface for CalibrationErrors.jl, CalibrationErrorsDistributions.jl, and CalibrationTests.jl.
- rcalibration is an R interface for CalibrationErrors.jl, CalibrationErrorsDistributions.jl, and CalibrationTests.jl.