
Test for inference correctness #236

Open
damonbayer opened this issue Dec 13, 2024 · 3 comments

Comments

@damonbayer
Collaborator

@SamuelBrand1 I made this issue based on our discussion in the 2024-12-13 developer meeting. Please add more details.

@SamuelBrand1
Collaborator

SamuelBrand1 commented Dec 18, 2024

The basic idea is that, in a forward pass, we can:

  1. Determine/set parameters of the model $\theta^*$.
  2. Generate underlying latent processes conditional on those parameters, $Z_t$.
  3. Generate observable data $y_t$.

In the back pass we run the inference to generate our MCMC chain(s) $\theta^{(i)} \sim p(\theta | y_t)$.

Then we need to decide if the ensemble $\left( \theta^{(i)} \right)_{i=1,\dots}$ of posterior draws is compatible with $\theta^*$, with incompatibility being evidence of a failure in inference.
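A minimal sketch of this forward/backward pattern, assuming a numpyro-style workflow with a toy model (the model structure and parameter names here are illustrative placeholders, not the project's actual model):

```python
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.handlers import condition, seed, trace
from numpyro.infer import MCMC, NUTS


def toy_model(y=None, n=50):
    # Scalar parameter of interest (e.g. the std of a latent random walk).
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    # Latent process Z_t conditional on sigma.
    z = numpyro.sample("z", dist.GaussianRandomWalk(scale=sigma, num_steps=n))
    # Observable data y_t.
    return numpyro.sample("y", dist.Normal(z, 0.1), obs=y)


# Forward pass: fix theta*, then generate Z_t and y_t from the model.
true_sigma = 0.5
forward = trace(seed(condition(toy_model, data={"sigma": true_sigma}), rng_seed=0))
y_obs = forward.get_trace()["y"]["value"]

# Back pass: MCMC draws theta^(i) ~ p(theta | y_t).
mcmc = MCMC(NUTS(toy_model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(1), y=y_obs)
sigma_draws = mcmc.get_samples()["sigma"]
```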

Context

In EpiAware we do this as part of testing the core LatentDelay concept (e.g. actuals are observed with noise later) across a matrix of ways of modelling the actual infections (directly/Renewal using AR/RW etc).

https://github.com/CDCgov/Rt-without-renewal/blob/ae5c564cba6c8ceee5b30a42e903bc72d4d0ac7d/EpiAware/test/EpiObsModels/modifiers/LatentDelay.jl#L165-L270

What we've gone for here is:

  1. $\theta^*$ is drawn from the priors.
  2. We only compare scalar parameters (e.g. the std of a random walk, but not the whole path of the random walk).
  3. The threshold for compatibility is that all the true parameters fall within the 99% credible interval of the inferred posterior.

Our view is that genuine failures of the model tend to produce wildly different inferred parameters compared to the actuals, and this level avoids false failures even when a fairly large number of scalar parameters is being tested in a model.
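A hedged sketch of that 99% interval check, reusing `true_sigma` and `sigma_draws` from the toy sketch above (the threshold and parameter names are illustrative, not the test suite's actual API):

```python
import numpy as np


def within_credible_interval(draws, true_value, level=0.99):
    """Return True when `true_value` lies inside the central `level` interval of `draws`."""
    lower, upper = np.quantile(draws, [(1.0 - level) / 2.0, 1.0 - (1.0 - level) / 2.0])
    return lower <= true_value <= upper


# One check per scalar parameter; the test fails if any true value escapes its interval.
true_values = {"sigma": true_sigma}
posterior = {"sigma": np.asarray(sigma_draws)}
for name, value in true_values.items():
    assert within_credible_interval(posterior[name], value), (
        f"{name} fell outside the 99% posterior interval: possible inference failure"
    )
```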

A more rigorous but compute-intensive approach is simulation-based calibration (SBC):

This follows the same idea as above but, instead of thresholding (which is usually somewhat arbitrary/heuristic), the "true" parameters are repeatedly drawn from the prior to create new datasets, inference is run on each, and the posterior draws from all of these runs are pooled into one ensemble. The consistency of Bayesian inference implies that if the posterior draws are correct in distribution, then this ensemble of posteriors should recover the priors, and you can check whether they do.
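A rough sketch of the SBC loop for a single scalar parameter; `prior_sample`, `simulate_data`, and `run_mcmc` are hypothetical stand-ins for whatever the test harness provides, not existing functions:

```python
import numpy as np


def sbc_ranks(prior_sample, simulate_data, run_mcmc, n_replicates=200, n_draws=100):
    """Collect SBC rank statistics for one scalar parameter."""
    ranks = []
    for _ in range(n_replicates):
        theta_star = prior_sample()            # "true" parameter drawn from the prior
        y = simulate_data(theta_star)          # fresh synthetic dataset
        draws = run_mcmc(y, n_draws)           # posterior draws for that dataset
        ranks.append(int(np.sum(draws < theta_star)))  # rank of theta* among the draws
    return np.asarray(ranks)


# If inference is calibrated, the ranks are approximately uniform on {0, ..., n_draws};
# a histogram or chi-squared test against uniformity flags miscalibration.
```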

@damonbayer
Collaborator Author

@SamuelBrand1 Just wondering what kind of issues (if any) you have caught in EpiAware because of this testing and what kind of fixes they required.

@SamuelBrand1
Collaborator

You can get identifiability issues between the negative binomial dispersion parameter and the std parameters of the latent processes, which ended up causing failures in practice (in theory you should eventually just end up with some horrible posterior).

For example, if you say the underlying infections are some process (not a Renewal) with a delay on it, then how much of the "wiggliness" in the data reflects "wiggliness" in the process versus observational noise? That was sufficiently annoying that for CI we ended up using a Poisson link.
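Illustrative only (the function name and flag below are made up, not EpiAware's or this project's API): the design choice is to drop the dispersion parameter from the observation model in the CI test, so the latent-process std stays identifiable.

```python
import numpyro
import numpyro.distributions as dist


def observe(latent_rate, y=None, poisson_for_ci=True):
    if poisson_for_ci:
        # Poisson: any over-dispersion in y must come from the latent process,
        # so the latent std stays identifiable in the CI test.
        return numpyro.sample("y", dist.Poisson(latent_rate), obs=y)
    # Negative binomial: the dispersion parameter phi can trade off against the
    # latent-process std, which is the identifiability issue described above.
    phi = numpyro.sample("phi", dist.HalfNormal(1.0))
    return numpyro.sample("y", dist.NegativeBinomial2(latent_rate, phi), obs=y)
```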
