Reads dropped during ASE calling? #17

rmagoglia · 2016-07-20T22:56:25Z

Reads can fall into four categories during GetGeneASEbyReads.py: ref, alt, no snp, or ambiguous. The counts in these four categories do not sum to the number of reads in my input BAM file. What's causing this?

petercombs · 2016-11-08T22:36:55Z

Just noticed this issue. I assume you have either figured this out or given up on it, but I would guess what's going on is one or both of the following:

- If multiple genes map to the same coordinates, then a given read could in principle count towards both of them. So if you're just taking the sum of each of the four columns, you would get more reads than you started with.
- If a read does not map to any gene in the GFF file, then it won't get counted. So if you have a lot of reads not mapping to annotated genes (species-specific exons, unannotated lncRNAs, eRNAs, or DNA contamination are a few possibilities that come to mind), then you could end up with fewer reads than you started with.
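
To make the two effects concrete, here is a toy sketch in plain Python. It is not the logic of GetGeneASEbyReads.py; the gene intervals, read intervals, and the `tally` helper are all made up for illustration, and it only assumes that a read is counted once for every gene it overlaps.

```python
from collections import Counter

# Hypothetical annotation: (gene_id, start, end), half-open coordinates.
# geneA and geneB overlap between 150 and 200.
genes = [("geneA", 100, 200), ("geneB", 150, 250)]

# Hypothetical alignments: (read_id, start, end).
reads = [
    ("r1", 110, 140),  # overlaps geneA only
    ("r2", 160, 190),  # overlaps both genes, so it is counted twice
    ("r3", 210, 240),  # overlaps geneB only
    ("r4", 300, 330),  # overlaps no annotated gene, so it is never counted
    ("r5", 400, 430),  # overlaps no annotated gene, so it is never counted
]


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def tally(genes, reads):
    """Count reads per gene and track how many genes each read hit."""
    per_gene = Counter()
    hits_per_read = Counter()
    for read_id, r_start, r_end in reads:
        for gene_id, g_start, g_end in genes:
            if overlaps(r_start, r_end, g_start, g_end):
                per_gene[gene_id] += 1
                hits_per_read[read_id] += 1
    return per_gene, hits_per_read


per_gene, hits_per_read = tally(genes, reads)

summed = sum(per_gene.values())
# Extra counts contributed by reads that hit more than one gene.
double_counted = sum(n - 1 for n in hits_per_read.values())
# Reads that hit no gene at all.
unassigned = sum(1 for read_id, _, _ in reads if hits_per_read[read_id] == 0)

print("reads in input:", len(reads))
print("sum of per-gene counts:", summed)
print("extra counts from overlapping genes:", double_counted)
print("reads outside any gene:", unassigned)
```

In this toy case the per-gene counts sum to 4 while the input holds 5 reads: one extra count comes from the read in the overlap, and two reads are lost because they fall outside both genes. The two effects pull in opposite directions, so the summed total can end up above, below, or coincidentally equal to the input read count.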
