
Test for inference correctness #236

Open
damonbayer opened this issue Dec 13, 2024 · 3 comments

Comments

@damonbayer
Collaborator

@SamuelBrand1 I made this issue based on our discussion in the 2024-12-13 developer meeting. Please add more details.

@SamuelBrand1
Collaborator

SamuelBrand1 commented Dec 18, 2024

The basic idea is that, in a forward pass, we can:

  1. Determine/set parameters of the model $\theta^*$.
  2. Generate underlying latent processes conditional on those parameters, $Z_t$.
  3. Generate observable data $y_t$.

In the back pass we run the inference to generate our MCMC chain(s) $\theta^{(i)} \sim p(\theta | y_t)$.

Then we need to decide if the ensemble $\left( \theta^{(i)} \right)_{i=1,\dots}$ of posterior draws is compatible with $\theta^*$, with incompatibility being evidence of a failure in inference.
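A minimal sketch of this forward/backward pattern, assuming a numpyro-style workflow with a toy model (the model structure and parameter names here are illustrative placeholders, not the project's actual model):

```python
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.handlers import condition, seed, trace
from numpyro.infer import MCMC, NUTS


def toy_model(y=None, n=50):
    # Scalar parameter of interest (e.g. the std of a latent random walk).
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    # Latent process Z_t conditional on sigma.
    z = numpyro.sample("z", dist.GaussianRandomWalk(scale=sigma, num_steps=n))
    # Observable data y_t.
    return numpyro.sample("y", dist.Normal(z, 0.1), obs=y)


# Forward pass: fix theta*, then generate Z_t and y_t from the model.
true_sigma = 0.5
forward = trace(seed(condition(toy_model, data={"sigma": true_sigma}), rng_seed=0))
y_obs = forward.get_trace()["y"]["value"]

# Back pass: MCMC draws theta^(i) ~ p(theta | y_t).
mcmc = MCMC(NUTS(toy_model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(1), y=y_obs)
sigma_draws = mcmc.get_samples()["sigma"]
```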

Context

In EpiAware we do this as part of testing the core LatentDelay concept (e.g. actuals are observed with noise later) across a matrix of ways of modelling the actual infections (directly/Renewal using AR/RW etc).

https://github.com/CDCgov/Rt-without-renewal/blob/ae5c564cba6c8ceee5b30a42e903bc72d4d0ac7d/EpiAware/test/EpiObsModels/modifiers/LatentDelay.jl#L165-L270

What we've gone for here is:

  1. $\theta^*$ is drawn from the priors.
  2. We only compare scalar parameters (e.g. the std of a random walk, but not the whole path of the random walk).
  3. The threshold for compatibility is that all the true parameters fall within the 99% credible interval of the inferred posterior.

Our view is that genuine failures of the model tend to produce wildly different inferred parameters compared to the actuals, and this level avoids false failures even when a fairly large number of scalar parameters is being tested in a model.
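A hedged sketch of that 99% interval check, reusing `true_sigma` and `sigma_draws` from the toy sketch above (the threshold and parameter names are illustrative, not the test suite's actual API):

```python
import numpy as np


def within_credible_interval(draws, true_value, level=0.99):
    """Return True when `true_value` lies inside the central `level` interval of `draws`."""
    lower, upper = np.quantile(draws, [(1.0 - level) / 2.0, 1.0 - (1.0 - level) / 2.0])
    return lower <= true_value <= upper


# One check per scalar parameter; the test fails if any true value escapes its interval.
true_values = {"sigma": true_sigma}
posterior = {"sigma": np.asarray(sigma_draws)}
for name, value in true_values.items():
    assert within_credible_interval(posterior[name], value), (
        f"{name} fell outside the 99% posterior interval: possible inference failure"
    )
```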

A more rigorous but compute-intensive approach is simulation-based calibration (SBC):

This follows the same idea as above but, instead of thresholding (which is usually somewhat arbitrary/heuristic), the "true" parameters are repeatedly drawn from the prior to create new datasets, inference is run on each, and the posterior draws from all of these runs are pooled into one ensemble. The consistency of Bayesian inference implies that if the posterior draws are correct in distribution, then this ensemble of posteriors should recover the priors, and you can check whether they do.
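A rough sketch of the SBC loop for a single scalar parameter; `prior_sample`, `simulate_data`, and `run_mcmc` are hypothetical stand-ins for whatever the test harness provides, not existing functions:

```python
import numpy as np


def sbc_ranks(prior_sample, simulate_data, run_mcmc, n_replicates=200, n_draws=100):
    """Collect SBC rank statistics for one scalar parameter."""
    ranks = []
    for _ in range(n_replicates):
        theta_star = prior_sample()            # "true" parameter drawn from the prior
        y = simulate_data(theta_star)          # fresh synthetic dataset
        draws = run_mcmc(y, n_draws)           # posterior draws for that dataset
        ranks.append(int(np.sum(draws < theta_star)))  # rank of theta* among the draws
    return np.asarray(ranks)


# If inference is calibrated, the ranks are approximately uniform on {0, ..., n_draws};
# a histogram or chi-squared test against uniformity flags miscalibration.
```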

@damonbayer
Collaborator Author

@SamuelBrand1 Just wondering what kind of issues (if any) you have caught in EpiAware because of this testing and what kind of fixes they required.

@SamuelBrand1
Collaborator

You can get identifiability issues between the negative binomial dispersion parameter and the std parameters of the latent processes, which ended up causing failures in practice (in theory you should eventually just end up with some horrible posterior).

For example, if you say the underlying infections are some process (not a Renewal) with a delay on it, then how much of the "wiggliness" in the data reflects "wiggliness" in the process versus observational noise? That was sufficiently annoying that for CI we ended up using a Poisson link.
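Illustrative only (the function name and flag below are made up, not EpiAware's or this project's API): the design choice is to drop the dispersion parameter from the observation model in the CI test, so the latent-process std stays identifiable.

```python
import numpyro
import numpyro.distributions as dist


def observe(latent_rate, y=None, poisson_for_ci=True):
    if poisson_for_ci:
        # Poisson: any over-dispersion in y must come from the latent process,
        # so the latent std stays identifiable in the CI test.
        return numpyro.sample("y", dist.Poisson(latent_rate), obs=y)
    # Negative binomial: the dispersion parameter phi can trade off against the
    # latent-process std, which is the identifiability issue described above.
    phi = numpyro.sample("phi", dist.HalfNormal(1.0))
    return numpyro.sample("y", dist.NegativeBinomial2(latent_rate, phi), obs=y)
```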
