Unequal number of reference reads #73

Open
grasshoffm opened this issue May 12, 2022 · 4 comments

Comments

@grasshoffm

Hi. I am running VarTrix on the mitochondrial genome with all possible variants.

When I check the ref_matrix_coverage.mtx results, the reference read counts differ between variants at the same position.

For example:
The three variants MT:16043:A:C, MT:16043:A:G and MT:16043:A:T all have the same reference base.
But for one cell, the variants MT:16043:A:C and MT:16043:A:G have 1 reference read, while MT:16043:A:T has 0.

Since the reference allele is the same and I am analysing the exact same reads (those covering this position), I would expect these values to be identical.

@pmarks
Contributor

pmarks commented May 19, 2022

@grasshoffm I'm not sure you should necessarily expect the ref read count to be identical in all cases here.

Consider the case of one read that contains a C at that position. For A:C you'd expect 0 ref reads. For A:G, it's a toss-up whether that read gets counted as ref or alt, and it looks like the code will default to counting it as a ref base. That might explain what you're seeing. If you want to dig into the details, it might be useful to post an IGV screenshot of the locus along with the results you're seeing.

@ifiddes might also have some thoughts.

@ifiddes

ifiddes commented May 19, 2022

I agree, I would need to see some screenshots. In the case of an alignment tie, VarTrix calls the read as reference. It only calls a read as alt if the alignment score to the alt allele outscores the reference alignment.
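To make the tie-break concrete, here is a minimal Python sketch of the rule described above, built on a plain Smith-Waterman local alignment. The scoring values (match +1, mismatch -1, linear gap -2), the flanking sequence, and the helper names `smith_waterman` and `call_read` are hypothetical and not taken from VarTrix; the real tool may apply further filters not shown here. It replays the scenario from the previous comment: a read carrying a C at the site is called alt for A:C, but for A:G the read mismatches both alleles equally, the scores tie, and it falls back to ref.

```python
def smith_waterman(read: str, hap: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of `read` against `hap` (linear gap penalty, assumed scores)."""
    prev = [0] * (len(hap) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(hap) + 1)
        for j in range(1, len(hap) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == hap[j - 1] else mismatch)
            # prev[j] + gap: read base left unaligned; curr[j - 1] + gap: haplotype base skipped
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def call_read(read: str, ref_hap: str, alt_hap: str) -> str:
    """Hypothetical tie-break: alt only if the alt alignment strictly outscores ref."""
    ref_score = smith_waterman(read, ref_hap)
    alt_score = smith_waterman(read, alt_hap)
    return "alt" if alt_score > ref_score else "ref"


LEFT, RIGHT = "GATCCTGA", "TTCAGGCT"  # hypothetical flanking sequence around the site
read = LEFT + "C" + RIGHT             # the read carries a C at the variant position
ref_hap = LEFT + "A" + RIGHT          # reference allele is A

print(call_read(read, ref_hap, LEFT + "C" + RIGHT))  # A:C -> alt
print(call_read(read, ref_hap, LEFT + "G" + RIGHT))  # A:G -> ref (ref and alt scores tie)
```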

@grasshoffm
Author

I looked at the position 16043 and cell AAACCCAAGGAACTAT-1.

At this position, I have a read with a T insertion right after the position of interest.
This read gets counted as reference for the A>C and A>G variants, but not for A>T.
It is also not counted as an alternative read.
[Screenshot: Position_16043_Cell_AAACCCAAGGAACTAT-1]

I then checked a different position (8965).
Here I find the scenario you mentioned: a mutated read is counted as reference, even though it does not support the reference allele.
[Screenshot: Position_8965_Cell_AAACCCATCAATCTTC-1]

But this isn't the case for cell AAACCCATCCTGCTAC-1.
Here the mutated read is counted towards the alternative allele, but the number of reference reads is the same for all variants.
Please see the attached Excel file.
[Screenshot: Position_8965_Cell_AAACCCATCCTGCTAC-1]

Here is an Excel file with the read counts I get from VarTrix and from IGV.
example_reads.xlsx

@ifiddes

ifiddes commented May 23, 2022

I haven't had time to look closely, but one thing to remember about VarTrix is that it is performing local realignment of each read using Smith-Waterman. As a result, it is possible that what VarTrix is counting is not exactly what you are seeing in IGV.
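A minimal Python sketch of how local realignment can score the insertion read from the 16043 example differently for each variant. It aligns a hypothetical read (reference A followed by an inserted T, with made-up flanks) against the reference haplotype and each alt haplotype using a plain Smith-Waterman with assumed scores (match +1, mismatch -1, linear gap -2), not VarTrix's actual parameters. Under these assumptions the A>C and A>G haplotypes score below the reference, while for A>T the inserted T lines up just as well with the alt allele, so the ref and alt scores tie. This does not show what VarTrix does with such a tie; it only suggests that A>T is the one comparison where this read is ambiguous.

```python
def smith_waterman(read: str, hap: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of `read` against `hap` (linear gap penalty, assumed scores)."""
    prev = [0] * (len(hap) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(hap) + 1)
        for j in range(1, len(hap) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == hap[j - 1] else mismatch)
            # prev[j] + gap: read base left unaligned (insertion); curr[j - 1] + gap: deletion
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


LEFT, RIGHT = "GATCCTGA", "TTCAGGCT"  # hypothetical flanking sequence around the site
read = LEFT + "A" + "T" + RIGHT       # reference A followed by an inserted T
ref_score = smith_waterman(read, LEFT + "A" + RIGHT)

for alt in "CGT":
    alt_score = smith_waterman(read, LEFT + alt + RIGHT)
    print(f"A>{alt}: ref={ref_score} alt={alt_score}")
# A>C: ref=15 alt=13
# A>G: ref=15 alt=13
# A>T: ref=15 alt=15
```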


3 participants