ptmismatch

The goal of ptmismatch (primer-template mismatch) is to find and summarize primer binding sites within a given set of sequences to estimate priming efficiency during a polymerase chain reaction (PCR).

Installation

You can install the development version of ptmismatch from GitHub with:

# install.packages("devtools")
devtools::install_github("medvir/ptmismatch")

Example

ptmismatch provides the function summarize_matches() to find and list all primer-template matches. Load the package and open the documentation as follows:

library(ptmismatch)

?summarize_matches()

Both a primer (or pattern) sequence and a template (or subject) sequence are required. The primer sequence is provided as a string. IUPAC ambiguity codes are supported (e.g. “R” matches “A” and “G”). “I” is not supported; replace it with “N” instead.
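To make the ambiguity-code rule concrete, here is a minimal base R sketch that expands a primer into a regular expression and swaps "I" for "N". The helpers `replace_i_with_n()` and `primer_to_regex()` and the lookup table `iupac_map` are hypothetical names introduced for illustration; they are not part of ptmismatch, and the sketch only does exact matching (no mismatches or indels).

```r
# Hypothetical helpers for illustration only; not part of ptmismatch.

# Standard IUPAC nucleotide codes mapped to regex character classes.
iupac_map <- c(
  A = "A", C = "C", G = "G", T = "T",
  R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
  K = "[GT]", M = "[AC]",
  B = "[CGT]", D = "[AGT]", H = "[ACT]", V = "[ACG]",
  N = "[ACGT]"
)

# Replace the unsupported code "I" with "N" before searching.
replace_i_with_n <- function(primer) {
  gsub("I", "N", toupper(primer), fixed = TRUE)
}

# Expand a primer containing IUPAC codes into a regular expression.
primer_to_regex <- function(primer) {
  bases <- strsplit(toupper(primer), "", fixed = TRUE)[[1]]
  unknown <- setdiff(bases, names(iupac_map))
  if (length(unknown) > 0) {
    stop("Unsupported code(s): ", paste(unknown, collapse = ", "))
  }
  paste(iupac_map[bases], collapse = "")
}

EV_fwd <- "GCTGCGYTGGCGGCC"
EV_fwd_regex <- primer_to_regex(EV_fwd)

# The Y at position 7 accepts T or C, but not A.
y_matches <- grepl(EV_fwd_regex,
                   c("GCTGCGTTGGCGGCC", "GCTGCGCTGGCGGCC", "GCTGCGATGGCGGCC"))
y_matches
```

A primer written with "I" would first go through `replace_i_with_n()` and could then be passed on as the pattern.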

The template sequence(s) can be provided as a FASTA or FASTQ file. Ns in the template count as mismatches, but all other ambiguity codes do not.

EV_fwd <- "GCTGCGYTGGCGGCC"

EV_sequences_fasta <- system.file("extdata", "Enterovirus_12059.fasta",
                                  package = "ptmismatch")

Search for the pattern:

matches_summary <-
  summarize_matches(pattern = EV_fwd, subject_filepath = EV_sequences_fasta,
                    subject_format = "fasta", max.mismatch = 1, with.indels = TRUE)
#> using Gonnet

matches_summary
#> # A tibble: 20 × 7
#>    seqID       pattern         strand start   end matched         matched_alig…¹
#>    <chr>       <chr>           <chr>  <dbl> <dbl> <chr>           <chr>         
#>  1 GQ865517.1  GCTGCGYTGGCGGCC +        358   371 CTGCGTTGGCGGCC  -CTGCGTTGGCGG…
#>  2 JN542510.1  GCTGCGYTGGCGGCC +        360   374 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#>  3 JX514942.1  GCTGCGYTGGCGGCC +        339   353 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#>  4 JX393302.1  GCTGCGYTGGCGGCC +        331   344 CTGCGTTGGCGGCC  -CTGCGTTGGCGG…
#>  5 JX961708.1  GCTGCGYTGGCGGCC +        359   372 CTGCGTTGGCGGCC  -CTGCGTTGGCGG…
#>  6 KC344833.1  GCTGCGYTGGCGGCC +        307   321 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#>  7 KC785528.1  GCTGCGYTGGCGGCC +        315   329 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#>  8 KC785530.1  GCTGCGYTGGCGGCC +        296   310 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#>  9 KF990476.1  GCTGCGYTGGCGGCC +        361   375 GCTGCGCTGGCGGCC GCTGCGCTGGCGG…
#> 10 KF312882.1  GCTGCGYTGGCGGCC +        361   375 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#> 11 KJ420749.1  GCTGCGYTGGCGGCC +        361   375 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#> 12 NC_024073.1 GCTGCGYTGGCGGCC +        361   375 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#> 13 KU587555.1  GCTGCGYTGGCGGCC +        377   390 CTGCGTTGGCGGCC  -CTGCGTTGGCGG…
#> 14 NC_029905.1 GCTGCGYTGGCGGCC +        377   390 CTGCGTTGGCGGCC  -CTGCGTTGGCGG…
#> 15 KU355876.1  GCTGCGYTGGCGGCC +        366   380 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#> 16 KU355877.1  GCTGCGYTGGCGGCC +        363   377 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#> 17 NC_030454.1 GCTGCGYTGGCGGCC +        366   380 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#> 18 NC_038306.1 GCTGCGYTGGCGGCC +        361   375 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#> 19 NC_038307.1 GCTGCGYTGGCGGCC +        361   375 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#> 20 NC_038308.1 GCTGCGYTGGCGGCC +        359   373 GCTGCGTTGGCGGCC GCTGCGTTGGCGG…
#> # … with abbreviated variable name ¹​matched_aligned

The matches found can then be visualised, for example as a sequence logo plot using the ggseqlogo package:

ggseqlogo::ggseqlogo(matches_summary$matched_aligned, method = "bits",
                     seq_type = "dna")

In this example, the logo shows that the Y at position 7 of the primer is justified by the occurrence of both T and C at that position.
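The same observation can be checked numerically by tabulating the base found at primer position 7. The sketch below is base R only and uses a small data frame, `example_matches` (a hypothetical name), holding three full-length rows copied from the output above rather than the complete result.

```r
# Rows 2, 9 and 10 of the output above (full-length, 15-base matches only).
example_matches <- data.frame(
  seqID   = c("JN542510.1", "KF990476.1", "KF312882.1"),
  matched = c("GCTGCGTTGGCGGCC", "GCTGCGCTGGCGGCC", "GCTGCGTTGGCGGCC"),
  stringsAsFactors = FALSE
)

# Base observed at primer position 7 (the degenerate Y).
base_at_7 <- substr(example_matches$matched, 7, 7)

# Count how often each base occurs at that position.
position_7_counts <- table(base_at_7)
position_7_counts
```

For the complete result, the same call could be applied to `matches_summary$matched_aligned`, assuming the aligned strings share the primer's coordinates (the leading "-" in the shorter matches suggests this, but it is an assumption here).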

Limitations

The columns mismatches_index and mismatches_n are only included if with.indels = FALSE.
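As a sketch of how to obtain these columns, the search from the example can be repeated with with.indels = FALSE. The call reuses only the arguments shown earlier; the variable names `mismatch_cols` and `matches_no_indels` are introduced here for illustration, and the block is guarded so that it does nothing if ptmismatch is not installed.

```r
# Columns that, according to the note above, are only present
# when with.indels = FALSE.
mismatch_cols <- c("mismatches_index", "mismatches_n")

# Run the search only if ptmismatch is installed.
if (requireNamespace("ptmismatch", quietly = TRUE)) {
  EV_fwd <- "GCTGCGYTGGCGGCC"
  EV_sequences_fasta <- system.file("extdata", "Enterovirus_12059.fasta",
                                    package = "ptmismatch")

  matches_no_indels <- ptmismatch::summarize_matches(
    pattern = EV_fwd, subject_filepath = EV_sequences_fasta,
    subject_format = "fasta", max.mismatch = 1, with.indels = FALSE
  )

  # Check which of the mismatch columns are available in the result.
  mismatch_cols %in% names(matches_no_indels)
}
```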