-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
output interpretation #66
Comments
I believe that the first column represents the start read of that window, then the second column would be the end of the window, and the last one would be the average coverage of it. Regarding the posterior probability, I think that they mention what you say... a 95% is a good cutoff, but I am not sure if the number of reads should be filtered as well... I was having the exact same question regarding that. I also believe the hap0...hap'n' are correlated across the window files, meaning that hap0 in the first window would be the same hap0 in your 7th window, for example... In the end you are doing haplotype reconstruction. Regarding everything else I am not sure... I was having the same question about the reads and estimating the frequency. |
Hello dear users,
Yup, I confirm it : it's window position (should also match the name of the window) and number of reads inside that window.
@ozagordi , one of the devs of ShoRAH had back then started working on some tool to post process the output of local haplotypes. We're also featuring similar filtering feature in a visualization tool I'm working on.
Well, I would say two things:
Depending on the above, three possible interpretations:
(Also, it could be a bug, if your results seem completely bonkers, even more likely if they are differently broken across successive runs. I was thinking I had weeded out the memory corruption bugs, but I might have missed something. In that case, I would be interested in your data if possible)
Nope. No, no, no. Each window is processed independently by a different The only ways to match haplotypes between windows is to compare the sequences in the overlapping parts, and/or look which haplotype in different windows share the same sequences as their contributing reads. I am working currently on a visualisation tool for whichi I explore leveraging the above.
Note that recent version only do local haplotype reconstruction. As an example we're currently relying on a different 3rd party engine - Savage - by default in our our V-pipe, and offer additional alternative engines, such as our Haploclique and we have PhD students working on further solutions.
We're happy you find our software useful. (Some of the original developers have moved onto other jobs but still lurk around this git, I'll definitely transmit it to them) |
Hello, Thank you for such detail response! It is much clearer to me now! Yes, indeed, I am working with a high variable gene-marker regions used for haplotyping and keep track of these haplotypes across different regions and patients, so it does make sense I am getting up to 8 haplotypes in a region. I've noticed some of my regions agree with high number of haplotypes, whereas the ends of the marker are more 'conserved', detecting up to 1 or 2. We were also dealing with some controls were we have only one strain per cultured sample, but up to 3, 4 or 5 haplotypes show up in the support/ folder. I am guessing even though I have one strain, I can end up with polyclonality in my cultured samples, and that could explain it. Thanks for clearing up the hap0, hap1, hap2, I was thinking the same as @dcastaneda5, but it's good to know it is not the same haplotype ID across each window, so I don't have to keep track of it. I was not aware of the V-Pipe that you mention! I was running shoRAH as a stand alone program for haplotyping, but in the future I'll use V-Pipe! and yes, I noticed that global analysis has been discontinued for this software and I realize how hard it is to reconstruct it from scratch, especially if I am dealing with length-polymorphic markers. Thanks for what you guys are doing, and for clearing this up for me! Much appreciated! Kabita Baral |
Dear Shorah developers,
First, I would like to say that you have a great program that is very useful for local haplotype reconstruction, especially if we try to genotype across a larger region while using illumina, and multiple reads are spanning over a gene marker. Thanks for sharing to the community! Nevertheless, I was wondering if you could help me interpret my output. After running Shorah with a malaria sample, I got this in my coverage.txt file:
Does this mean each numeric column represents the coverage of each haplotype that was inferred by shotgun local approach after looking at each window file? If so, would this mean I have 3 haplotypes? I was also looking at the reconstruction haplotype of each set of window reads, and I also noticed how in some cases the posterior probability was 0.99 with as little as 2 reads. I read in one paper that it is recommended to apply a 95% cutoff on shorah posterior? Should I apply something like this on read number as well? I am asking because in the 949-1139 reads haplotype reconstruction there were 8 haplotypes over 95%, but this was not the case in the rest of the windows. Am I missing something important here?
Thanks in advance for such an useful program.
The text was updated successfully, but these errors were encountered: