whether to use deduplicated bam #29
Comments
What is the data coverage? Is it WGS, WES, or targeted sequencing?
When I ran MSIsensor, I found that the results differ substantially between the deduplicated BAM and the non-deduplicated BAM. Which BAM should be used: the deduplicated one or the non-deduplicated one?
not_dedeplicated.bam
Total_Number_of_Sites Number_of_Somatic_Sites %
9739 1501 15.41
dedeplicated.bam
Total_Number_of_Sites Number_of_Somatic_Sites %
8798 122 1.39
It is WES; the mean coverage is 180x and the duplication ratio is 54.88%.
How did you remove duplicates? The duplication ratio looks very high.
I have noticed the same behavior and now routinely run msisensor on dedupped BAMs (obtained using
You should use the deduplicated BAMs. In the end, you can get the correct results only by using the data that you think is the cleanest.
I am wondering whether a BAM with marked duplicates is sufficient, or whether I have to export the deduplicated reads to a separate BAM. Thanks!
Marking duplicates is not sufficient. There is often a notable difference between using a BAM file with duplicates marked and with duplicates removed.
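For readers who want to see what "removed" rather than "marked" means in practice: duplicate-marking tools set bit 0x400 (decimal 1024) in the SAM FLAG field, and removal means dropping every record with that bit set. A common route is `samtools view -b -F 1024` on the marked BAM. The Python sketch below shows the same filter on SAM text lines; the function name and the example records are hypothetical and only illustrate the flag test.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit meaning "PCR or optical duplicate"


def drop_marked_duplicates(sam_lines):
    """Yield header lines and alignment records whose duplicate bit is not set."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no FLAG field; pass them through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second mandatory SAM column
        if (flag & DUPLICATE_FLAG) == 0:
            yield line


# Hypothetical records: read2 has FLAG 1123 = 99 + 1024, so it is a marked duplicate.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "read2\t1123\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
]
kept = list(drop_marked_duplicates(example))
```

Here `kept` holds the header and `read1` only; `read2` is dropped because its duplicate bit is set.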
Thank you very much for the quick response! In addition, there is a closed issue where people suggested using coverage normalization. I find that the score changes only slightly, but the change is enough to classify samples with scores near the 3.5% cutoff differently. Do you have any suggestions? Many thanks!
We suggest: MSI_H: msiscore >= 10%; MSI_L: 3.5% <= msiscore < 10%; MSS: msiscore < 3.5%
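As an illustration, the suggested cutoffs can be applied to the counts from the first post, where the score is the number of somatic sites divided by the total number of sites, as a percentage. This is a minimal sketch; the function names `msi_score` and `classify_msi` are made up for this example and are not part of MSIsensor.

```python
def msi_score(somatic_sites, total_sites):
    """Percentage of microsatellite sites called somatic."""
    return 100.0 * somatic_sites / total_sites


def classify_msi(score_percent):
    """Apply the cutoffs suggested above: >= 10% MSI_H, >= 3.5% MSI_L, else MSS."""
    if score_percent >= 10.0:
        return "MSI_H"
    if score_percent >= 3.5:
        return "MSI_L"
    return "MSS"


# Counts taken from the two result tables in the first post.
for label, total, somatic in [
    ("not deduplicated", 9739, 1501),
    ("deduplicated", 8798, 122),
]:
    score = msi_score(somatic, total)
    print(f"{label}: {score:.2f}% -> {classify_msi(score)}")
# prints:
# not deduplicated: 15.41% -> MSI_H
# deduplicated: 1.39% -> MSS
```

With these cutoffs the same sample moves from MSI_H to MSS depending on whether duplicates were removed, which is why the choice of BAM matters.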
Thank you very much for the great information! Do you suggest coverage normalization for normal and tumor samples? Thanks!
We didn't normalize the TCGA UCEC data (msiscore cutoff: 3.5%) in the original version of MSIsensor. You can test with or without the normalization option. We suggest that you choose this option when the normal and tumor coverages are very different.
Thank you very much! Can you please specify how you implement coverage normalization and/or how normalization affects the length distribution and MSI calling? This is very important to me because my samples are classified as MSI_H with normalization and MSI_L without it. Thanks!
The difference in sequencing depth between tumor tissue and normal tissue affects the judgment of whether a site is stable. Therefore, we normalize the read distributions so that the areas under them are of the same magnitude. The specific practice is as follows: compare the sequencing depths of the normal and tumor tissues and correct the sequencing data with the smaller depth, that is,
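The reply above is cut off before the exact correction is given, so the following is only one possible reading of "make the areas comparable", not MSIsensor's actual implementation. It assumes each site has a histogram mapping repeat length to read count, and it scales the shallower histogram so that both totals match. The function name and the example counts are hypothetical.

```python
def normalize_to_deeper(normal_hist, tumor_hist):
    """Scale the shallower histogram so both have the same total read count.

    Assumption for illustration only: the correction is a single
    multiplicative factor applied to the lower-depth sample.
    """
    normal_total = sum(normal_hist.values())
    tumor_total = sum(tumor_hist.values())
    if normal_total == 0 or tumor_total == 0:
        # Nothing sensible to scale; return copies unchanged.
        return dict(normal_hist), dict(tumor_hist)
    if normal_total < tumor_total:
        factor = tumor_total / normal_total
        scaled = {length: count * factor for length, count in normal_hist.items()}
        return scaled, dict(tumor_hist)
    factor = normal_total / tumor_total
    scaled = {length: count * factor for length, count in tumor_hist.items()}
    return dict(normal_hist), scaled


# Hypothetical site: repeat length -> number of supporting reads.
normal = {10: 5, 11: 30, 12: 5}            # 40 reads in total
tumor = {9: 20, 10: 40, 11: 100, 12: 40}   # 200 reads in total
normal_scaled, tumor_scaled = normalize_to_deeper(normal, tumor)
```

After scaling, both histograms sum to 200 reads, so a comparison of their shapes is no longer dominated by the depth difference.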
Thank you very much, this is very clear! I plan to extract the coverages of the tumor and normal samples at all MS loci that qualify for MSI calling, and then see whether I need to adopt "coverage normalization". Do you have a suggestion about what range of coverage difference between normal and tumor warrants using "coverage normalization"? Thanks!