Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

posterior fraction > 1? #14

Closed
bramvrancken opened this issue Feb 13, 2015 · 5 comments
Closed

posterior fraction > 1? #14

bramvrancken opened this issue Feb 13, 2015 · 5 comments

Comments

@bramvrancken
Copy link

Dear Dr. Zagordi,

I've been trying to apply Shorah to a pretty homogenous population of rabies virus (sequence length ~11kb). The data are from the Illumina Myseq patfom. it seems the analysis starts well. however, I get numerous warnings the frequency of haplotypes exceeds 1, and afterwards lots of error messages.
Could you please point me to what error I'm making?
Thanks a lot!

Bram

Frequency error message:

/Applications/_evolutionarySoftware/shorah-master/snv.py:127: UserWarning: posterior = 1.364 > 1
warnings.warn('posterior = %4.3f > 1' % post)

....

next error message:

Traceback (most recent call last):
File "/Applications/_evolutionarySoftware/shorah-master/shorah.py", line 175, in
mm.main('%s_cor.rest' % in_stem, maxhaplo=200)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 713, in main
totalPaths = countPaths(graph, source, sink)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 630, in countPaths
getPaths(source)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 627, in getPaths
numPaths[node] = sum(map(getPaths, graph[node]))
....

and finally the crash message:

RuntimeError: maximum recursion depth exceeded in cmp

@ozagordi
Copy link
Collaborator

Hi, sorry for the late reply.
The first error is not exactly a bug, rather a problem in probabilistic
clustering. It could be ignored for posteriors only slightly higher than
one, but in general it signals a poor mixing. The crash message at first
sight seems saying that the program did not find a solution in the maximum
number of iterations. You would need to tell me more of your problem and
maybe provide me with a toy example. First of
all, what window size did you choose? Do you really need global
reconstruction or could you do with local only?

@bramvrancken
Copy link
Author

Hello Osvaldo,

I'm sorry in turn for the late reply.
Some details on the data that are hopefully of diagnostic use: it are paired end MySeq reads, ~145nt long. Through this link you can access the dropbox folder I created, and find the sequence data (Vphaser_input_sorted.bam) and the reference sequence (S3_ntCons.fasta).
The various folders have the name of the options I tested.
link: https://www.dropbox.com/sh/ao3q0iv3neq8vyr/AAD9p2Ii1ZTyT7sN0BTzdeQza?dl=0

Do you really need global reconstruction or could you do with local only
This is a sample with little variation in the population. To illustrate I also included the overview per AA position (*.xls). Despite the risk for in silico recombinants, we prefer global reconstruction -after all, the last few variants are close to each other (within 1 read length).

Again thanks for looking into this.

Sincerely,
Bram

On 17 Feb 2015, at 23:30, Osvaldo Zagordi wrote:

Hi, sorry for the late reply.
The first error is not exactly a bug, rather a problem in probabilistic
clustering. It could be ignored for posteriors only slightly higher than
one, but in general it signals a poor mixing. The crash message at first
sight seems saying that the program did not find a solution in the maximum
number of iterations. You would need to tell me more of your problem and
maybe provide me with a toy example. First of
all, what window size did you choose? Do you really need global
reconstruction or could you do with local only?

On 13 February 2015 at 22:37, bramvrancken <[email protected]mailto:[email protected]> wrote:

Dear Dr. Zagordi,

I've been trying to apply Shorah to a pretty homogenous population of
rabies virus (sequence length ~11kb). The data are from the Illumina Myseq
patfom. it seems the analysis starts well. however, I get numerous warnings
the frequency of haplotypes exceeds 1, and afterwards lots of error
messages.
Could you please point me to what error I'm making?
Thanks a lot!
Bram Frequency error message:

/Applications/_evolutionarySoftware/shorah-master/snv.py:127: UserWarning:
posterior = 1.364 > 1
warnings.warn('posterior = %4.3f > 1' % post)
.... next error message:

Traceback (most recent call last):
File "/Applications/_evolutionarySoftware/shorah-master/shorah.py", line
175, in
mm.main('%s_cor.rest' % in_stem, maxhaplo=200)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 713,
in main
totalPaths = countPaths(graph, source, sink)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 630,
in countPaths
getPaths(source)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 627,
in getPaths
numPaths[node] = sum(map(getPaths, graph[node]))
....
and finally the crash message:

RuntimeError: maximum recursion depth exceeded in cmp


Reply to this email directly or view it on GitHub
#14.

Ciao.
Osvaldo


Reply to this email directly or view it on GitHubhttps://github.com//issues/14#issuecomment-74769131.

@ozagordi
Copy link
Collaborator

Bram,
I looked at your alignment with samtools tview Vphaser_input_sorted.bam S3_ntCons.fasta There is hardly any variation.

Then I checked the coverage with samtools depth Vphaser_input_sorted.bam | less, and I realised that you have coverage only little above 100. Shorah is designed to work in heterogeneous samples at high coverage, so I'm really not sure it's the right tool for you. I don't think you can get more than a consensus sequence here.

@bramvrancken
Copy link
Author

Hello Osvaldo,

I understand that the reconstruction is difficult when little variation is present, but was under the impression this has more to do with correctly linking variations separated by more than 1 read length than correctly inferring there is no variation. Is this view of the haplotype reconstruction problem wrong?

Thanks again for having a look at the data.

Best wishes,
Bram

On 23 Feb 2015, at 10:00, Osvaldo Zagordi wrote:

Bram,
I looked at your alignment with samtools tview Vphaser_input_sorted.bam S3_ntCons.fasta There is hardly any variation.

Then I checked the coverage with samtools depth Vphaser_input_sorted.bam | less, and I realised that you have coverage only little above 100. Shorah is designed to work in heterogeneous samples at high coverage, so I'm really not sure it's the right tool for you. I don't think you can get more than a consensus sequence here.


Reply to this email directly or view it on GitHubhttps://github.com//issues/14#issuecomment-75508167.

@ozagordi
Copy link
Collaborator

You refer to having variants far apart and not knowing how to phase them.
In this case in theory each local reconstruction should return one
haplotype only. I saw this in one of your outputs, but you might check more
extensively. Then the global reconstruction should phase them altogether in
a single haplotype, but I don't think I ever tested the behaviour in this
extreme case since this is more correctly done by standard denovo
assemblers and it is not in the scope of the software.

Update: I checked and actually I had already opened an issue to fix the single haplotype/no reads problem issue #6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants