demultiplexing based on primer #31

FabianRoger · 2021-11-05T15:37:07Z

Hi,

I just discovered the tool and used it for de multiplexing of an Illumina MiSeq run, it seemed to work well! (only 6% of reads with unclear barcodes for 91 pooled samples).

On the same run however, I also pooled to differnt primer pairs. The Primer sequences are part of the biological read.

I was wondering if pheniqs could be used for that? I use cutadapt but the PHRED error model would be a nicer way than just allowing for a certain number of mismatches.

edit: for primer search, IUPAC wildcard characters would ideally need to be supported

moonwatcher · 2021-11-10T00:40:49Z

Hi Fabian

I am not sure I understand what you are trying to achieve. Are you trying to remove the primer from the biological read or classify by it?

Maybe you can provide an example?

FabianRoger · 2021-11-10T15:17:55Z

classify by primer (removing would be the next step although it could be done in one).
It is quite common that we pool amplicons generated with different primer pairs onto one run. We usually do not index these separately as the primer sequences serve as indicies.

So after demultiplexing, my files look like this

file_index1_R1:

PRIMER_F_1_READ
PRIMER_F_2_READ
PRIMER_F_1_READ
PRIMER_F_1_READ
PRIMER_F_2_READ

file_index1_R2:

PRIMER_R_1_READ
PRIMER_R_2_READ
PRIMER_R_1_READ
PRIMER_R_1_READ
PRIMER_R_2_READ

what I want is file_index1_PRIMER1_R1, file_index1_PRIMER1_R2 and file_index1_PRIMER2_R1 file_index1_PRIMER2_R2

Right now, we (and most people I assume), use cutadapt for this and it works well. However it doesn't take into account the quality scores but just allows for a certain number of miss-matches (10 % be default). I think that the PAMLD decoder could be a brilliant alternative.

moonwatcher · 2021-11-10T15:30:23Z

So, if I understand this correctly: you have two possible primers and 91 sample barcodes. The "libraries" are characterized by the dot product, so 2x91 libraries.

Pheniqs only allows one sample barcode decoder (since sample barcodes represent read groups and those have their own rules and life cycle) BUT every decoder in pheniqs (sample, cellular or molecular) can have an unlimited number of segments. That means you can treat your sample barcode as a barcode with two segments and encode the full dot product (so you will declare 91x2 barcodes).

for instance

"primer_1_library_1": {
    "barcode": [
        "<your barcode sequence for library 1>",
        "<your first primer sequence>"
    ]
},
"primer_2_library_1": {
    "barcode": [
        "<your barcode sequence for library 1>",
        "<your second primer sequence>"
    ]
},

You will, of course, need to define tokens for locating the primer sequence. If this sounds confusing just upload a sample config file you made and an example read with the location of the primers and the two sequences and I will help you with the config but it sounds doable.

P.S. Since primers are usually longer than barcodes, you can expect posterior decoding to have significantly less false negatives (loosing reads) without risking more false positives (misclassifying reads). From a probabilistic approach telling two long sequences apart with high confidence is possible even with a lot of mismatches. would be interesting though to see what the noise filter thinks about it, but I suspect that won't be an issue either.

P.S.2
Of course pheniqs can easily also remove it from the output, you have absolute control of what comes out in the output.

moonwatcher · 2021-11-10T17:05:36Z

Fabian

Could you please also post your two primers? do they have ambiguous nucleotides? the statistical core that computes the posterior will have to be extended to support those.

FabianRoger · 2021-11-12T11:26:23Z

in my case I use the following primer pairs:

PRIMER 1
Fwrd: GGWTGAACWGTWTAYCCYCC
Rev: TAAACTTCAGGGTGACCAAAAAATCA

PRIMER 2
Fwrd: CGAGAAGACCCTATGGAGCTTA
Rev: AATCGTTGAACAAACGAACC

Note that the first forward primer does indeed contain ambiguous nucleotides (but in my case no Iosine - which could be teh case for other primers)

moonwatcher · 2021-11-12T16:42:14Z

So those primers show up on a fixed offset from some edge of the read segments? Is the only thing you are missing is support for ambiguity codes or do they also shift in position? Do they have indels? in which case this becomes more of an alignment issue...

I think adding support for ambiguity codes in the barcode declaration is possible, the statistical model can definitely apply to it. Basically a W means that both A and T should be considered a positive match.

Have you tried what I suggested? does it work for you other than the ambiguity code? (you can try and do a run with just an A or just a T to see that it otherwise works).

What are you r thoughts?

FabianRoger · 2021-11-15T09:25:38Z

sorry I just saw you suggestion, I had only seen your question about the barcode sequence. I will try it out later this week and come back to you. I am not sure there can be indels in the primer binding site but primer slippage might lead to that (see this paper ).

I'll get back to you!

moonwatcher self-assigned this Dec 17, 2021

moonwatcher added the enhancement label Dec 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

demultiplexing based on primer #31

demultiplexing based on primer #31

FabianRoger commented Nov 5, 2021 •

edited

Loading

moonwatcher commented Nov 10, 2021

FabianRoger commented Nov 10, 2021

moonwatcher commented Nov 10, 2021 •

edited

Loading

moonwatcher commented Nov 10, 2021

FabianRoger commented Nov 12, 2021

moonwatcher commented Nov 12, 2021 •

edited

Loading

FabianRoger commented Nov 15, 2021

demultiplexing based on primer #31

demultiplexing based on primer #31

Comments

FabianRoger commented Nov 5, 2021 • edited Loading

moonwatcher commented Nov 10, 2021

FabianRoger commented Nov 10, 2021

moonwatcher commented Nov 10, 2021 • edited Loading

moonwatcher commented Nov 10, 2021

FabianRoger commented Nov 12, 2021

moonwatcher commented Nov 12, 2021 • edited Loading

FabianRoger commented Nov 15, 2021

FabianRoger commented Nov 5, 2021 •

edited

Loading

moonwatcher commented Nov 10, 2021 •

edited

Loading

moonwatcher commented Nov 12, 2021 •

edited

Loading