
Counter-intuitive result for power threshold with a single site #37

Open
alimanfoo opened this issue Jul 10, 2024 · 5 comments

@alimanfoo

Thanks so much for this amazing package!

I'm using DRpower to investigate power to detect new resistance variants during the early stages of emergence, when variants may not yet be widely spread and hence only detectable at a single site or small number of sites.

For interest I was looking at power to detect a variant with prevalence 10% above a threshold of 5% at a single site, given different sample sizes, and got a counter-intuitive result where increasing sample size seemed to decrease power:

[image: plot of power against sample size, showing power decreasing as sample size increases]

Is this expected or am I doing something wrong or inappropriate?

@bobverity (Collaborator)

Hi Alistair,

Glad you like the package :)

Yes, this is a known edge case in which the method produces counterintuitive results. It comes down to how the model behaves when it has little information to go on. It took me a while to get my head around it; here's my best effort at an explanation.

The default prior on prevalence is Uniform(0,1), which actually puts quite a lot of mass on large values; for example, the prior mean is 50%. This prior is overwhelmed by the data whenever we have a decent amount of information on prevalence. Now think about intra-cluster correlation (ICC). When ICC is high it effectively waters down the amount of information in the data, i.e. the effective sample size decreases. These two things work together: a high ICC up-weights the importance of the prior on prevalence, which pulls estimates up towards 50%.
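This pull toward the prior mean can be seen directly in the Beta-Binomial posterior mean formula. Here is a minimal Python sketch (not DRpower code; `posterior_mean` is a hypothetical helper) of what happens when ICC shrinks the effective sample size while the observed proportion stays at 1%:

```python
# A Uniform(0,1) prior is Beta(1, 1), whose mean is 0.5. After observing
# x positives out of n_eff effective samples, the posterior mean of
# prevalence under a Beta(a, b) prior is (a + x) / (a + b + n_eff).

def posterior_mean(x, n_eff, a=1.0, b=1.0):
    """Beta-Binomial posterior mean of prevalence."""
    return (a + x) / (a + b + n_eff)

# Observed proportion fixed at 1%, but ICC shrinks the effective sample
# size; the posterior mean climbs toward the prior mean of 0.5:
for n_eff in [100, 20, 5, 1]:
    x = 0.01 * n_eff  # keep the raw proportion at 1%
    print(n_eff, round(posterior_mean(x, n_eff), 3))
# prints: 100 0.02, 20 0.055, 5 0.15, 1 0.337
```

With all the information of 100 samples the estimate sits near the data (2%); with an effective sample size of 1 it is dragged a third of the way to 50%.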

The best way to see this is in the following analysis, where I've forced the priors on both prevalence and ICC to be completely flat to emphasize the effect:

DRpower::plot_joint(n = 1, N = 100,
                    prior_prev_shape1 = 1,
                    prior_prev_shape2 = 1,
                    prior_ICC_shape1 = 1,
                    prior_ICC_shape2 = 1,
                    n_bins = 20)

Here is the plot it produces:

[image: joint posterior plot of prevalence and ICC produced by plot_joint]

Notice that when ICC is close to zero we converge on the prevalence estimate of 1% from the data (1 in 100). But when ICC is high, the prevalence estimate bleeds into higher values.

This is not a problem when there is enough information in the data to be informative about the ICC (or if we have a strong constraining prior). This is why I only produce sample size tables for 2 or more clusters in the docs, and why I caution against using the method for a single cluster. For the target sizes for pfhrp2 deletion studies, i.e. aiming for 10 clusters of 30, there is plenty of evidence to pin down the ICC, so this isn't an issue, but I appreciate people will want to use the package beyond this initial use case.

Hope this helps?

PS, in your case it might also be good to check out the get_power_presence() and get_sample_size_presence() functions, where the aim is to detect at least one copy of a rare variant.

PPS, the other thing that can happen with this sort of simulation-based power analysis on discrete data is slightly ragged power curves, as the model steps from one discrete value to the next, but I don't think that's the main issue in your case.


alimanfoo commented Jul 10, 2024

Thanks so much Bob.

I think I see how the prior on prevalence will tend to pull the prevalence estimate towards 50% when there is little information to go on, which could be due to small sample sizes or high ICC. This will happen any time I try to estimate a prevalence from data, e.g., via the get_prevalence() function.

I think I also see how this applies to the get_power_threshold() function. This function uses the given parameter values for prevalence and ICC to simulate data, then uses get_prevalence() to infer prevalence and see if we would correctly detect the variant above the threshold (as nicely described here). If the get_prevalence() inference step uses the uniform prior on prevalence, its estimates of prevalence will be pulled upwards by the prior. And so if there is less information (e.g., smaller sample size), the inference step will detect the variant above the threshold more often - but this is entirely due to the effect of the prior. This gives the appearance of higher power for lower sample size.
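The simulate-then-infer loop described above can be sketched in a few lines of Python. This is a simplified, hypothetical version for a single site that ignores ICC entirely (so it will not reproduce the decreasing-power artifact, which comes from the ICC inference step); `estimated_power` is my own name, not a DRpower function:

```python
import random

def estimated_power(N, prevalence=0.1, prev_thresh=0.05,
                    rejection_threshold=0.95, a=1.0, b=1.0,
                    reps=2000, post_draws=2000, seed=1):
    """Simulation-based power for a single site, ignoring ICC.

    For each replicate: simulate counts at the true prevalence, form the
    Beta posterior on prevalence, and 'detect' the variant if the
    posterior probability of exceeding prev_thresh is above
    rejection_threshold.
    """
    rng = random.Random(seed)
    detections = 0
    for _ in range(reps):
        # simulate the data at the true prevalence
        x = sum(rng.random() < prevalence for _ in range(N))
        # posterior on prevalence is Beta(a + x, b + N - x);
        # estimate P(prevalence > prev_thresh) by Monte Carlo
        above = sum(rng.betavariate(a + x, b + N - x) > prev_thresh
                    for _ in range(post_draws))
        if above / post_draws > rejection_threshold:
            detections += 1
    return detections / reps
```

With ICC fixed at zero like this, power increases with N as expected; the inversion in the plots above only appears once the ICC has to be inferred alongside the prevalence.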

Does that sound right?

I wonder if it would be better to set a prior on prevalence more like the site frequency spectrum for neutral variants. This varies between species and populations, but in general rare variants are more abundant than common ones. Such a prior would place much more weight on lower prevalence, requiring more data to demonstrate that the prevalence of an important variant under selection, like HRP2/3 deletions or a resistance mutation, is above the threshold. I.e., the burden of proof is higher, which would also make the conclusion more robust if high prevalence is found.

Perhaps for HRP2/3 studies with something like the recommended number of sites and sample sizes this doesn't matter much, because the prior doesn't have much impact. But if you want to find new resistance variants as early as possible, you may not yet have data from many sites because the variant has not spread very far, so perhaps this is worth considering.

> PS, in your case it might also be good to check out the get_power_presence() and get_sample_size_presence() functions, in which the aim is to detect at least one copy of a rare variant.

Thanks, I've run these too, these are very helpful.

I was thinking to complement this with get_power_threshold() too, however, because if you are interested in discovering new resistance variants you may not know a priori what you are looking for. I.e., you may have power to detect the presence of a resistance variant, but it will be one of many thousands or even millions of variants that you detect, and you need additional evidence to narrow down to resistance candidates.

This is a more complex question to figure out power for, but resistance variants will be under selection and will increase in frequency, so I was thinking a simple approximation for the purpose of a power calculation could be to ask what power I would have to detect variants above, say, a 5% threshold.

Perhaps I am bending DRpower a little here though!

In any case this has all been extremely helpful, thanks again.

@alimanfoo

Just to add: if I try a different prior on prevalence that looks a bit more like a neutral site frequency spectrum, e.g., Beta(0.3, 0.7):

[image: density of the Beta(0.3, 0.7) prior, concentrated at low prevalence]

...then I get the result that increasing sample size increases power:

[image: power against sample size under the Beta(0.3, 0.7) prior, with power increasing with sample size]
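A quick Monte Carlo sanity check (Python, not DRpower) of what the Beta(0.3, 0.7) prior implies: its mean is 0.3/(0.3+0.7) = 0.3, and a large fraction of its mass sits below the 5% threshold, versus exactly 5% under a Uniform prior:

```python
import random

# draw a large sample from the Beta(0.3, 0.7) prior
rng = random.Random(42)
draws = [rng.betavariate(0.3, 0.7) for _ in range(100_000)]

mean = sum(draws) / len(draws)  # analytic mean: 0.3 / (0.3 + 0.7) = 0.3
frac_below_5pct = sum(d < 0.05 for d in draws) / len(draws)

# the prior mean is 0.3, and roughly a third of the prior mass lies
# below 5% prevalence (vs 5% of the mass under Uniform(0,1))
print(round(mean, 2), round(frac_below_5pct, 2))
```

So relative to the Uniform default, this prior demands much more evidence before concluding prevalence is above 5%, which is consistent with the power curve now rising with sample size.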

@bobverity (Collaborator)

Yes I agree with your description of what's happening with the get_power_threshold() function.

I did wonder about setting the prior on prevalence to be more informative and concentrated at smaller values. In the end I was a bit uncomfortable doing this because I didn't want to preclude (or strongly discourage) the possibility of high prevalence. There are some places in the world where pfhrp2/3 deletions have been found at very high prevalence, and the situation is changing, so prevalence might increase further in the future.

Instead, my workaround was to require quite strong evidence of prevalence > 5% when performing a hypothesis test. So in the power analysis method, you need >95% confidence that prevalence is > 5% in order to conclude that it is indeed above the threshold. One of the nice things about this choice is that, under the prior, 95% of the probability mass is above 5%, meaning under the prior we are right on the cusp of concluding that prevalence is above vs. below the threshold. It's as if, a priori, we sit exactly on the boundary between the two conclusions. As more evidence rolls in, it nudges the distribution one way or the other.
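The "cusp" property is easy to verify: for a Beta(1, b) prior the upper-tail mass above a threshold q has the closed form (1 - q)^b. A quick Python check, using the Beta(1, 9) prior that appears in the code below as a contrast:

```python
# For a Beta(1, b) prior, P(prevalence > q) = (1 - q) ** b.
def mass_above(q, b):
    return (1 - q) ** b

print(mass_above(0.05, 1))  # Uniform(0,1): 0.95 -- exactly on the cusp
print(mass_above(0.05, 9))  # Beta(1,9): ~0.63 -- the burden of proof is higher
```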

But anyone who has enough statistical expertise to understand the subtleties of the priors and who wants to set a more informative prior is free to do so! I hadn't thought about the case of searching for thousands of variants. I can see that the informative prior argument makes sense here, as you can apply something like a neutral frequency spectrum and then essentially look for outliers.

But maybe yours is more like a single-site design, i.e. do you care about ICC at all? Accounting for ICC only matters when you want to extrapolate beyond the site to a larger region, which is why you use multiple sites to capture variation across the region. But if you just want to know whether there are high-frequency mutations in that exact spot, you can run the analysis with ICC = 0. You lose some generalisability, but I think it's pretty valid as a pilot design to get the lay of the land. Sample sizes come back much smaller.

To do this, set both ICC and ICC_infer to zero in the simulation function: the first sets the ICC used when simulating data, the second forces the function to assume this known value when estimating prevalence.

DRpower::get_power_threshold(N = 169,
                             prior_prev_shape1 = 1,
                             prior_prev_shape2 = 9,
                             prevalence = 0.1,
                             ICC = 0,
                             ICC_infer = 0,
                             reps = 1e3)

But come to think of it, in this case the problem can be solved exactly. There is no longer an ICC prior to sum over, just the Beta prior on prevalence, so the posterior on prevalence is itself a Beta distribution and the power is given by:

# Exact power calculation for a single site with zero ICC
get_power_exact <- function(N, prevalence = 0.1, prev_thresh = 0.05, rejection_threshold = 0.95,
                            prior_prev_shape1 = 1, prior_prev_shape2 = 1) {
  x <- 0:N  # all possible observed counts
  # posterior P(prevalence > prev_thresh) for each count, from the Beta posterior
  prob_above <- pbeta(prev_thresh, shape1 = prior_prev_shape1 + x, shape2 = prior_prev_shape2 + N - x, lower.tail = FALSE)
  # power = probability of observing a count that clears the rejection threshold
  sum(dbinom(x, N, prob = prevalence) * (prob_above > rejection_threshold))
}

For example, running get_power_exact(169, prior_prev_shape1 = 1, prior_prev_shape2 = 9) gives 80% power, which roughly matches the simulations. 169 samples seems like a reasonable number for a pilot study in my view, and could potentially be rolled out over many spatial locations to focus efforts in a follow-up study.
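For readers outside R, the same calculation can be sketched in pure Python. Since the shape parameters here are integers, the Beta CDF reduces to a binomial tail sum, so no external stats library is needed (this is my own port, under the same zero-ICC assumptions as the R function above):

```python
from math import comb

def beta_upper_tail(q, a, b):
    """P(p > q) for p ~ Beta(a, b) with integer shapes, using the
    identity I(q; a, b) = P(Binomial(a+b-1, q) >= a), so the upper
    tail equals P(Binomial(a+b-1, q) <= a-1)."""
    n = a + b - 1
    return sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(a))

def get_power_exact(N, prevalence=0.1, prev_thresh=0.05,
                    rejection_threshold=0.95,
                    prior_prev_shape1=1, prior_prev_shape2=1):
    """Exact single-site, zero-ICC power: sum the binomial probability
    of every observed count whose posterior clears the rejection rule."""
    power = 0.0
    for x in range(N + 1):
        prob_above = beta_upper_tail(prev_thresh,
                                     prior_prev_shape1 + x,
                                     prior_prev_shape2 + N - x)
        if prob_above > rejection_threshold:
            # probability of observing exactly x positives out of N
            power += comb(N, x) * prevalence**x * (1 - prevalence)**(N - x)
    return power
```

Running `get_power_exact(169, prior_prev_shape2=9)` reproduces a power of roughly 0.8, in line with the figure quoted above.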

@alimanfoo

Thanks again Bob, super helpful and interesting 🙏
