Counter-intuitive result for power threshold with a single site #37
Comments
Thanks so much Bob. I think I see how the prior on prevalence will tend to pull the prevalence estimate towards 50% when there is little information to go on, which could be due to small sample sizes or high ICC. This will happen any time I try to estimate a prevalence from data, e.g., via the ... I think I also see how this applies to the ... Does that sound right?

I wonder if it would be better to set a prior on prevalence that is more like the site frequency spectrum for neutral variants, which varies between species and populations but in general shows that rare variants are more abundant than common variants. In this case the prior would place much more weight on lower prevalence, requiring more data to demonstrate that the prevalence of an important variant under selection, like HRP2/3 deletions or a resistance mutation, is above the threshold. I.e., the burden of proof is higher, which would also increase the robustness of the conclusion if high prevalence is found.

Perhaps for HRP2/3 studies where you have something like the recommended number of sites and sample sizes this doesn't matter much, because the prior doesn't have much impact. But if you are interested in finding new resistance variants as early as possible, and may not yet have data from many sites because the variant has not spread very far, I wonder whether this is worth considering.
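To make the "higher burden of proof" point concrete, here is a minimal sketch comparing the posterior probability that prevalence exceeds 5% under two priors, for the same single-site data and ignoring ICC. Everything in it is an assumption for illustration: the function names are hypothetical and not part of DRpower, and the Beta(1, 19) prior (mean 5%) is only a crude stand-in for a prior weighted towards rare variants, not an actual site frequency spectrum. It relies on the standard identity that, for integer shape parameters, the upper tail of a Beta distribution equals a binomial lower tail.

```python
from math import comb


def beta_upper_tail(t, a, b):
    """Pr(p > t) for p ~ Beta(a, b), valid for positive integer a and b.

    Uses the identity Pr(p > t) = Pr(Binomial(a + b - 1, t) <= a - 1).
    """
    n = a + b - 1
    return sum(comb(n, j) * t**j * (1 - t) ** (n - j) for j in range(a))


def posterior_prob_above(x, n, threshold, a=1, b=1):
    """Posterior Pr(prevalence > threshold) after x positives in n samples.

    A Beta(a, b) prior with a binomial likelihood gives a
    Beta(a + x, b + n - x) posterior (single site, no ICC).
    """
    return beta_upper_tail(threshold, a + x, b + n - x)


# Same hypothetical data under a flat prior and a rare-variant-weighted prior.
x, n = 5, 50
flat_prior = posterior_prob_above(x, n, 0.05, a=1, b=1)
rare_prior = posterior_prob_above(x, n, 0.05, a=1, b=19)
print(f"flat Beta(1, 1):  Pr(p > 5%) = {flat_prior:.3f}")
print(f"rare Beta(1, 19): Pr(p > 5%) = {rare_prior:.3f}")
```

With the same data, the prior concentrated at low values gives a lower posterior probability of exceeding the threshold, so more positives are needed before a 95% confidence requirement is met.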
Thanks, I've run these too; they are very helpful. I was thinking to complement this with ... This is a more complex question to figure out power for, but resistance variants will be under selection and will increase in frequency, so I was thinking that a simple approximation for the purpose of a power calculation could be to ask what power I would have to detect variants above, say, a 5% threshold. Perhaps I am bending DRpower a little here though! In any case this has all been extremely helpful, thanks again.
Yes, I agree with your description of what's happening with the ...

I did wonder about setting the prior on prevalence to be more informative and concentrated at smaller values. In the end I was a bit uncomfortable doing this because I didn't want to preclude (or strongly discourage) the possibility of high prevalence. There are some places in the world where pfhrp2/3 deletions have been found at very high prevalence, and this is also a changing situation, so prevalence might increase in the future.

Instead, my workaround was to require quite strong evidence of prevalence > 5% when performing a hypothesis test. So in the power analysis method, you need > 95% confidence that prevalence is > 5% in order to conclude that it is indeed above the threshold. One of the nice things about this argument is that, under the prior, 95% of the probability mass is above 5%, meaning that under the prior we are right on the cusp of concluding that prevalence is above vs. below the threshold. It's as if, a priori, we are balanced between the two conclusions. As more evidence rolls in, it will nudge the distribution one way or the other. But anyone who has enough statistical expertise to understand the subtleties of the priors and who wants to set a more informative prior is free to do so!

I hadn't thought about the case of searching for thousands of variants. I can see that the informative prior argument makes sense here, as you can apply something like a neutral frequency spectrum and then essentially look for outliers. But maybe yours is more like a single-site design, i.e. do you care about ICC? Accounting for ICC is only important when you are looking to extrapolate beyond the site to a larger region, and hence you use multiple sites to capture variation across the region. But if you just want to know whether there are high-frequency mutations in that exact spot, then you can run the analysis with ICC = 0.
You lose something in terms of generalisability, but I think it's pretty valid as a pilot design to get the lay of the land. Sample sizes come back much smaller. If you wanted to do this, you would have to set
But come to think of it, in this case it is pretty easily solved exactly. We no longer have an ICC prior to sum over, just the Beta prior on prevalence, so the posterior on prevalence is also a Beta distribution and the power is given by:
For example, if I run
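The exact ICC = 0 calculation described in this comment can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not DRpower's own code: the function names are hypothetical, the prior is taken to be uniform Beta(1, 1) (consistent with 95% of the prior mass lying above 5%), and the data are a single binomial count of positives out of n samples. Under those assumptions the posterior is Beta(x + 1, n - x + 1), and the power is the probability, under the true prevalence, of observing a count x for which the posterior Pr(prevalence > threshold) exceeds the required confidence.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_prob_above(x, n, threshold):
    """Posterior Pr(prevalence > threshold) under a uniform Beta(1, 1) prior.

    The posterior is Beta(x + 1, n - x + 1); for integer shape parameters
    its upper tail equals Pr(Binomial(n + 1, threshold) <= x).
    """
    return sum(binom_pmf(j, n + 1, threshold) for j in range(x + 1))


def power_single_site(n, prevalence, threshold=0.05, confidence=0.95):
    """Exact power for one site with ICC = 0.

    Sums the binomial probability of every count x that would lead to the
    conclusion 'prevalence is above the threshold'.
    """
    power = 0.0
    for x in range(n + 1):
        if posterior_prob_above(x, n, threshold) > confidence:
            power += binom_pmf(x, n, prevalence)
    return power


# Power to detect a true prevalence of 10% against a 5% threshold.
for n in (50, 100, 200, 400):
    print(n, round(power_single_site(n, 0.10), 3))
```

Because the number of positives is discrete, the count needed to clear the confidence requirement jumps at certain sample sizes, so power computed this way need not rise smoothly with n.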
Thanks again Bob, super helpful and interesting 🙏
Thanks so much for this amazing package!
I'm using DRpower to investigate power to detect new resistance variants during the early stages of emergence, when variants may not yet be widely spread and hence only detectable at a single site or small number of sites.
For interest I was looking at power to detect a variant with prevalence 10% above a threshold of 5% at a single site, given different sample sizes, and got a counter-intuitive result where increasing sample size seemed to decrease power:
Is this expected or am I doing something wrong or inappropriate?