posterior fraction > 1? #14

bramvrancken · 2015-02-13T21:37:02Z

Dear Dr. Zagordi,

I've been trying to apply Shorah to a pretty homogenous population of rabies virus (sequence length ~11kb). The data are from the Illumina Myseq patfom. it seems the analysis starts well. however, I get numerous warnings the frequency of haplotypes exceeds 1, and afterwards lots of error messages.
Could you please point me to what error I'm making?
Thanks a lot!

Bram

Frequency error message:

/Applications/_evolutionarySoftware/shorah-master/snv.py:127: UserWarning: posterior = 1.364 > 1
warnings.warn('posterior = %4.3f > 1' % post)

....

next error message:

Traceback (most recent call last):
File "/Applications/_evolutionarySoftware/shorah-master/shorah.py", line 175, in
mm.main('%s_cor.rest' % in_stem, maxhaplo=200)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 713, in main
totalPaths = countPaths(graph, source, sink)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 630, in countPaths
getPaths(source)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 627, in getPaths
numPaths[node] = sum(map(getPaths, graph[node]))
....

and finally the crash message:

RuntimeError: maximum recursion depth exceeded in cmp

ozagordi · 2015-02-17T22:30:26Z

Hi, sorry for the late reply.
The first error is not exactly a bug, rather a problem in probabilistic
clustering. It could be ignored for posteriors only slightly higher than
one, but in general it signals a poor mixing. The crash message at first
sight seems saying that the program did not find a solution in the maximum
number of iterations. You would need to tell me more of your problem and
maybe provide me with a toy example. First of
all, what window size did you choose? Do you really need global
reconstruction or could you do with local only?

bramvrancken · 2015-02-23T08:33:51Z

Hello Osvaldo,

I'm sorry in turn for the late reply.
Some details on the data that are hopefully of diagnostic use: it are paired end MySeq reads, ~145nt long. Through this link you can access the dropbox folder I created, and find the sequence data (Vphaser_input_sorted.bam) and the reference sequence (S3_ntCons.fasta).
The various folders have the name of the options I tested.
link: https://www.dropbox.com/sh/ao3q0iv3neq8vyr/AAD9p2Ii1ZTyT7sN0BTzdeQza?dl=0

Do you really need global reconstruction or could you do with local only
This is a sample with little variation in the population. To illustrate I also included the overview per AA position (*.xls). Despite the risk for in silico recombinants, we prefer global reconstruction -after all, the last few variants are close to each other (within 1 read length).

Again thanks for looking into this.

Sincerely,
Bram

On 17 Feb 2015, at 23:30, Osvaldo Zagordi wrote:

Hi, sorry for the late reply.
The first error is not exactly a bug, rather a problem in probabilistic
clustering. It could be ignored for posteriors only slightly higher than
one, but in general it signals a poor mixing. The crash message at first
sight seems saying that the program did not find a solution in the maximum
number of iterations. You would need to tell me more of your problem and
maybe provide me with a toy example. First of
all, what window size did you choose? Do you really need global
reconstruction or could you do with local only?

On 13 February 2015 at 22:37, bramvrancken <[email protected]mailto:[email protected]> wrote:

Dear Dr. Zagordi,

I've been trying to apply Shorah to a pretty homogenous population of
rabies virus (sequence length ~11kb). The data are from the Illumina Myseq
patfom. it seems the analysis starts well. however, I get numerous warnings
the frequency of haplotypes exceeds 1, and afterwards lots of error
messages.
Could you please point me to what error I'm making?
Thanks a lot!
Bram Frequency error message:

/Applications/_evolutionarySoftware/shorah-master/snv.py:127: UserWarning:
posterior = 1.364 > 1
warnings.warn('posterior = %4.3f > 1' % post)
.... next error message:

Traceback (most recent call last):
File "/Applications/_evolutionarySoftware/shorah-master/shorah.py", line
175, in
mm.main('%s_cor.rest' % in_stem, maxhaplo=200)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 713,
in main
totalPaths = countPaths(graph, source, sink)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 630,
in countPaths
getPaths(source)
File "/Applications/_evolutionarySoftware/shorah-master/mm.py", line 627,
in getPaths
numPaths[node] = sum(map(getPaths, graph[node]))
....
and finally the crash message:

RuntimeError: maximum recursion depth exceeded in cmp

—
Reply to this email directly or view it on GitHub
#14.

Ciao.
Osvaldo

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/14#issuecomment-74769131.

ozagordi · 2015-02-23T09:00:03Z

Bram,
I looked at your alignment with samtools tview Vphaser_input_sorted.bam S3_ntCons.fasta There is hardly any variation.

Then I checked the coverage with samtools depth Vphaser_input_sorted.bam | less, and I realised that you have coverage only little above 100. Shorah is designed to work in heterogeneous samples at high coverage, so I'm really not sure it's the right tool for you. I don't think you can get more than a consensus sequence here.

bramvrancken · 2015-02-23T09:24:57Z

Hello Osvaldo,

I understand that the reconstruction is difficult when little variation is present, but was under the impression this has more to do with correctly linking variations separated by more than 1 read length than correctly inferring there is no variation. Is this view of the haplotype reconstruction problem wrong?

Thanks again for having a look at the data.

Best wishes,
Bram

On 23 Feb 2015, at 10:00, Osvaldo Zagordi wrote:

Bram,
I looked at your alignment with samtools tview Vphaser_input_sorted.bam S3_ntCons.fasta There is hardly any variation.

Then I checked the coverage with samtools depth Vphaser_input_sorted.bam | less, and I realised that you have coverage only little above 100. Shorah is designed to work in heterogeneous samples at high coverage, so I'm really not sure it's the right tool for you. I don't think you can get more than a consensus sequence here.

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/14#issuecomment-75508167.

ozagordi · 2015-02-23T09:39:48Z

You refer to having variants far apart and not knowing how to phase them.
In this case in theory each local reconstruction should return one
haplotype only. I saw this in one of your outputs, but you might check more
extensively. Then the global reconstruction should phase them altogether in
a single haplotype, but I don't think I ever tested the behaviour in this
extreme case since this is more correctly done by standard denovo
assemblers and it is not in the scope of the software.

Update: I checked and actually I had already opened an issue to fix the single haplotype/no reads problem issue #6

ozagordi closed this as completed Apr 17, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

posterior fraction > 1? #14

posterior fraction > 1? #14

bramvrancken commented Feb 13, 2015

ozagordi commented Feb 17, 2015

bramvrancken commented Feb 23, 2015

ozagordi commented Feb 23, 2015

bramvrancken commented Feb 23, 2015

ozagordi commented Feb 23, 2015

posterior fraction > 1? #14

posterior fraction > 1? #14

Comments

bramvrancken commented Feb 13, 2015

Bram

Frequency error message:

....

next error message:

and finally the crash message:

ozagordi commented Feb 17, 2015

bramvrancken commented Feb 23, 2015

ozagordi commented Feb 23, 2015

bramvrancken commented Feb 23, 2015

ozagordi commented Feb 23, 2015