INFO SCORE across multiple samples #24

Open
SABiagini opened this issue Mar 9, 2023 · 2 comments
Labels
good first issue Good for newcomers

Comments

@SABiagini

Hi,

I have imputed a batch of 5000 individuals. As far as I understand, each of them has been independently imputed by QUILT.

However, I have noticed that there is only a single INFO SCORE per variant, rather than one per sample. Therefore, I would like to know more about how the INFO SCORE is calculated when multiple samples are imputed together.

Also, in case it is a consensus INFO SCORE, I would like to know what its reliability would be for filtering variants across multiple individuals.

Thank you.

@rwdavies
Owner

Hey,

So the INFO score being used is defined here
https://www.well.ox.ac.uk/~gav/snptest/#info_measures
Informally, the INFO score captures the non-uniformity of the genotype posteriors. If they are flat (non-informative), it is low, while if they are mostly concentrated on one genotype, it gets closer to 1. It is indeed always a consensus score; it doesn't really make sense for a single sample. Since it is derived entirely from the genotype posteriors, for one sample you could just use those directly somehow, e.g. the genotype posterior of the argmax genotype.
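
To make the description above concrete, here is a minimal sketch in Python of an IMPUTE-style info measure of the kind described on the SNPTEST page linked above. It is an illustration only, not QUILT's or STITCH's actual code. The function names `info_score` and `max_posterior` are hypothetical, the input layout (one row of three genotype posteriors per sample, for a single variant) is an assumption, and returning 1 for a monomorphic site is an assumed convention.

```python
import numpy as np

def info_score(gp):
    """IMPUTE-style info for ONE variant.

    gp: array of shape (n_samples, 3) holding P(g=0), P(g=1), P(g=2)
        for each sample (hypothetical layout, for illustration only).
    """
    gp = np.asarray(gp, dtype=float)
    n = gp.shape[0]
    e = gp[:, 1] + 2.0 * gp[:, 2]   # expected dosage per sample
    f = gp[:, 1] + 4.0 * gp[:, 2]   # expected squared dosage per sample
    theta = e.sum() / (2.0 * n)     # estimated allele frequency
    if theta <= 0.0 or theta >= 1.0:
        return 1.0                  # assumed convention for monomorphic sites
    # 1 minus (posterior dosage variance) / (variance expected under HWE)
    return 1.0 - (f - e ** 2).sum() / (2.0 * n * theta * (1.0 - theta))

def max_posterior(gp):
    """Per-sample alternative: posterior probability of the argmax genotype."""
    return np.asarray(gp, dtype=float).max(axis=1)

# Confident posteriors: every sample has all its mass on one genotype.
confident = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Uninformative posteriors: every sample just reflects HWE at frequency 0.5.
uninformative = [[0.25, 0.5, 0.25]] * 4

print(info_score(confident))
print(info_score(uninformative))
print(max_posterior(uninformative))
```

The sum over samples inside `info_score` is why there is a single value per variant: the score compares the uncertainty left in the whole cohort's posteriors with the uncertainty expected from the allele frequency alone. In this sketch the confident cohort scores 1 and the uninformative cohort scores 0.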

It's normally highly reliable for filtering across multiple individuals. See this for some general comments re: INFO score (though note, with QUILT, there could be calibration issues, so I would say it's more reliable for STITCH than for QUILT)
rwdavies/STITCH#75

Let me know if you have more questions,
Robbie

@SABiagini
Author

Hi Robbie,

Thanks for clarifying about the INFO SCORE being a consensus when imputing multiple samples together.

Your insight on filtering using INFO SCORE being more reliable for STITCH than QUILT helped me to better understand my observations.

I have 3 imputed samples that I use as proxies for testing filtering strategies, and I also have a high-coverage version of each of them.

In my analyses, I noticed that filtering on INFO SCORE alone, removing variants with INFO SCORE < 0.4, is not effective in improving data quality. Specifically, for all the statistics I tested (sensitivity, accuracy, precision, specificity, among others), the filtered data remained similar to the unfiltered data. Increasing the threshold to 0.8 did improve the statistics, but it would be too aggressive.
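
A comparison of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions, not the exact procedure used here: the function name `filter_metrics` is hypothetical, genotypes are assumed to be coded as 0/1/2 alternate allele counts, a "positive" is assumed to mean any non-reference genotype, and the arrays at the bottom are made-up toy values, not real results.

```python
import numpy as np

def filter_metrics(imputed, truth, info, min_info):
    """Compare imputed calls with high-coverage calls for one sample,
    keeping only variants with INFO >= min_info (hypothetical helper)."""
    imputed = np.asarray(imputed)
    truth = np.asarray(truth)
    info = np.asarray(info, dtype=float)

    keep = info >= min_info
    imp, tru = imputed[keep], truth[keep]

    pred_pos = imp > 0   # imputed genotype carries the alternate allele
    true_pos = tru > 0   # high-coverage genotype carries the alternate allele
    tp = int(np.sum(pred_pos & true_pos))
    fp = int(np.sum(pred_pos & ~true_pos))
    fn = int(np.sum(~pred_pos & true_pos))
    tn = int(np.sum(~pred_pos & ~true_pos))

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "n_kept": int(keep.sum()),
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "concordance": ratio(int(np.sum(imp == tru)), int(imp.size)),
    }

# Toy values for illustration only.
imputed = [0, 1, 2, 0, 1, 0]
truth = [0, 1, 2, 1, 0, 0]
info = [0.9, 0.95, 0.85, 0.3, 0.5, 0.99]

for threshold in (0.0, 0.4, 0.8):
    print(threshold, filter_metrics(imputed, truth, info, threshold))
```

Note that these metrics are computed over the retained variants only, so `n_kept` is worth reporting next to them: it shows how many variants a stricter threshold costs.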

Nevertheless, I discovered that combining INFO SCORE with other filters can be useful. At least, this is what I observed in my tests on these 3 separate samples. Dealing with multiple samples, of course, presents a challenge, but that's a different story!

Thanks again.

Best,

S.

@Zilong-Li Zilong-Li added the good first issue Good for newcomers label Aug 26, 2024
3 participants