
Parameters

Alex Zarebski edited this page Aug 18, 2022 · 4 revisions

Rate (and Probability) Parameters

Getting rate and probability parameters right in TimTam is non-trivial. Part of the difficulty is that most of BEAST2 operates in units of years, whereas we often think of epidemiological parameters in terms of days. For example, consider an infection that takes, on average, 7 days to resolve. This corresponds to an average of \(7/365 ≈ 0.01918\) years. Expressed as the rate of an exponential random variable, this is \(365/7 ≈ 52.14\) per year.
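The conversion above can be sketched in a few lines of Python. The helper names here are hypothetical and only illustrate the arithmetic; they are not part of TimTam or BEAST2, and a 365-day year is assumed, as in the text.

```python
DAYS_PER_YEAR = 365.0  # assumption: the same year length as used in the text


def days_to_years(days):
    """Convert a duration in days to a duration in years."""
    return days / DAYS_PER_YEAR


def mean_days_to_rate_per_year(mean_days):
    """Rate (per year) of an exponential wait whose mean is given in days."""
    return DAYS_PER_YEAR / mean_days


mean_duration_years = days_to_years(7)
rate_per_year = mean_days_to_rate_per_year(7)

print(round(mean_duration_years, 5))  # 0.01918
print(round(rate_per_year, 2))        # 52.14
```

The rate is simply the reciprocal of the mean duration once both are expressed in years.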

Rates and Probabilities

A main use-case for TimTam is when you have sequence data and a time series of confirmed cases. In this situation, the counts in the time series are modelled as scheduled data[fn:1], and the tree prior has a birth rate, \(λ\), a death rate, \(μ\), a sampling rate, \(ψ\), and a sampling probability, \(ν\). You can compute the effective reproduction number for this model, but the scheduled observations make the exact calculation painful. Luckily, there is a simple approximation that works well provided the time between successive counts is small.

As in the example above, consider an exponential wait time, \(X\), with rate \(α = 365/7\) (i.e. the rate per year when the average wait is seven days). Let’s approximate \(X\) with a discrete random variable \(Y\) where \(Y = δ × (N + 1/2)\) and \(N\) is a geometric random variable with probability parameter \(p\). Let \(δ\) be some small amount of time, for example, one day, in which case \(δ = 1/365\). So now we have a discrete random variable \(Y\), which is time in years at a resolution of days, and a continuous random variable \(X\), which is time in years.

How should we choose the parameter \(p\) so that the distributions of \(X\) and \(Y\) are similar? Suppose we want both distributions to have the same mean value. The mean of \(X\) is \(1/α\) and the mean of \(Y\) is \(δ × ((1-p)/p + 1/2)\), where \(N\) takes the values \(0, 1, 2, …\) and so has mean \((1-p)/p\). Setting these two means equal and solving for \(p\) in terms of \(α\), we get \(p = 2δα / (δα + 2)\). If we use this with the values of \(α\) and \(δ\) above, the distributions are very close, as shown in the figure below.
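As a rough numerical check of this mean-matching step, the sketch below computes \(p\) for the values of \(α\) and \(δ\) used above and confirms that the two means agree. The function names are hypothetical. The comparison of probabilities is also an assumption of this sketch: it compares \(P(N = n)\) with the probability that the exponential wait falls in the \(n\)-th interval of length \(δ\), which is not necessarily the comparison drawn in the figure.

```python
import math


def geometric_prob(alpha, delta):
    """Probability p that matches the mean of Y to the exponential mean 1/alpha."""
    return 2 * delta * alpha / (delta * alpha + 2)


def mean_of_y(p, delta):
    """Mean of Y = delta * (N + 1/2), with N geometric on 0, 1, 2, ..."""
    return delta * ((1 - p) / p + 0.5)


alpha = 365 / 7  # rate per year (mean wait of seven days)
delta = 1 / 365  # one day, in years
p = geometric_prob(alpha, delta)

# Compare P(N = n) with the exponential probability of landing in interval n.
max_abs_diff = 0.0
for n in range(60):
    geometric_mass = p * (1 - p) ** n
    exponential_mass = (math.exp(-alpha * n * delta)
                        - math.exp(-alpha * (n + 1) * delta))
    max_abs_diff = max(max_abs_diff, abs(geometric_mass - exponential_mass))

print(round(p, 4))  # 0.1333 (the exact value is 2/15)
print(mean_of_y(p, delta), 1 / alpha)  # the two means should agree
print(max_abs_diff)  # largest discrepancy over the first 60 days
```

Rerunning the loop with a larger `delta` (for example `7 / 365`) gives a feel for how the agreement degrades as the time step grows relative to \(1/α\).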

In this figure, the line indicates the PDF of the exponential distribution (on a logarithmic scale) and the points are the PMF of the geometric approximation to this distribution. If we consider different resolutions of time, for example letting \(δ\) be one, two, three, etc. days, then we get the distributions below.

This shows us that the distributions of an exponential delay and a geometric delay are similar, provided \(δ\) is small-ish relative to \(1/α\). In light of this we suggest the following approximation:

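Approximate the effective reproduction number with \(λ / (μ + ψ + \hat{ω})\) where \(\hat{ω} = 2p / (δ(2-p))\) and \(p\) is the estimated value of \(ν\). This expression for \(\hat{ω}\) is the inverse of the relationship \(p = 2δα / (δα + 2)\), solved for the rate.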

If the daily sampling probability is small, then \(\hat{ω}\) is close to \(p/δ\), which is the approximation used in the tutorials, where it seems to work reasonably well.
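The suggested approximation can be sketched as follows. The function names and the parameter values for \(λ\), \(μ\), \(ψ\) and \(ν\) are hypothetical, chosen only to illustrate the arithmetic; they are not estimates from any dataset and are not part of the TimTam API. All rates are per year and \(δ\) is one day.

```python
def omega_hat(p, delta):
    """Rate (per year) equivalent to a sampling probability p every delta years."""
    return 2 * p / (delta * (2 - p))


def approx_reproduction_number(birth_rate, death_rate, sampling_rate, p, delta):
    """Approximate effective reproduction number: lambda / (mu + psi + omega_hat)."""
    return birth_rate / (death_rate + sampling_rate + omega_hat(p, delta))


delta = 1 / 365  # one day, in years

# Hypothetical parameter values, for illustration only.
birth_rate = 70.0     # lambda
death_rate = 30.0     # mu
sampling_rate = 5.0   # psi
nu = 0.02             # estimated daily sampling probability

r_eff = approx_reproduction_number(birth_rate, death_rate, sampling_rate, nu, delta)
r_eff_simple = birth_rate / (death_rate + sampling_rate + nu / delta)

print(omega_hat(nu, delta), nu / delta)  # close when nu is small
print(r_eff, r_eff_simple)
```

As a consistency check, `omega_hat(2 / 15, 1 / 365)` recovers the rate \(365/7\) from the earlier example, since \(p = 2/15\) is the probability matched to that rate at a resolution of one day.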

[fn:1] Scheduled data is when every infected individual is sampled with some probability. See the glossary for further details.
