Skip to content

Latest commit

 

History

History
277 lines (183 loc) · 11.4 KB

README.md

File metadata and controls

277 lines (183 loc) · 11.4 KB

COJAC - CoOccurrence adJusted Analysis and Calling

Bioconda package Docker container

Description

The cojac package comprises a set of command-line tools to analyse co-occurrence of mutations on amplicons. It is useful, for example, for finding viral variants of concern in environmental samples, and has been designed to scan for the SARS-CoV-2 variants B.1.1.7 and 501.V2 in wastewater samples, as analyzed jointly by ETH Zurich, EPFL and Eawag.

The analysis requires the whole amplicon to be covered by sequencing read pairs. It currently works at the level of aligned reads, but we plan to be able to adjust confidence scores based on local (window) haplotypes (as generated, e.g., by ShoRAH, doi:10.1186/1471-2105-12-119).

Usage

Here are the available command-line tools:

command purpose
cooc-mutbamscan scan an alignment BAM/CRAM/SAM file for mutation co-occurrences and output a JSON or YAML file
cooc-colourmut display a JSON or YAML file as a coloured output on the terminal
cooc-pubmut render a JSON or YAML file to a table as in the publication

Use option -h / --help to see available command-line options:

cooc-mutbamscan --help
usage: cooc-mutbamscan [-h] (-s TSV | -a BAM/CRAM [BAM/CRAM ...]) [-p PATH] [-j JSON] [-y YAML] [-t TSV] [-d]

scan amplicon (covered by long read pairs) for mutation cooccurrence

optional arguments:
-h, --help show this help message and exit
-s TSV, --samples TSV
V-pipe samples list tsv
-a BAM/CRAM [BAM/CRAM ...], --alignments BAM/CRAM [BAM/CRAM ...]
alignment files
-p PATH, --prefix PATH
V-pipe work directory prefix for where to look at align files when using TSV samples list
-j JSON, --json JSON output results to as JSON file
-y YAML, --yaml YAML output results to as yaml file
-t TSV, --tsv TSV output results to as (raw) tsv file
-d, --dump dump the python object to the terminal
cooc-colourmut --help
usage: cooc-colourmut [-h] (-j JSON | -y YAML)

print coloured pretty table on terminal

optional arguments:
-h, --help show this help message and exit
-j JSON, --json JSON results generated by mutbamscan
-y YAML, --yaml YAML results generated by mutbamscan

see cooc-pubmut for a CSV file that can be imported into an article
cooc-pubmut --help
usage: cooc-pubmut [-h] (-j JSON | -y YAML) [-o CSV] [-e | -x] [-q]

make a pretty table

optional arguments:
-h, --help show this help message and exit
-j JSON, --json JSON results generated by mutbamscan
-y YAML, --yaml YAML results generated by mutbamscan
-o CSV, --output CSV name of (pretty) csv file to save the table into
-e, --escape use escape characters for newlines
-x, --excel use a semi-colon ';' instead of a comma ',' in the comma-separated-files as required by Microsoft Excel
-q, --quiet Run quietly: do not print the table

you need to open the CSV in a spreadsheet that understands linebreaks

Howto

Input data requirements

Analysis needs to be performed on SARS-CoV-2 samples sequenced using ARTIC V3 protocol (which produces ~400bp long amplicons), and sequenced with read settings that covers the totality of an amplicon (e.g.: paired end sequencing with read length 250). NOTE: this analysis method cannot work on read length much shorter than the amplicons (e.g.: it will not give reliable results for a read-length of 50).

Collect the co-occurrence data

There are currently two modes to collect the data about co-occurring mutations in reads: analysing stand-alone BAM/CRAM/SAM alignment files, or analysing the output of a cohort analysed with V-pipe (doi:10.1093/bioinformatics/btab015).

Standalone files

Provide a list of BAM files using the -a / --alignment option. Run:

cooc-mutbamscan -a sam1.bam sam2.bam -j cooc-test.json

Note: you can also use the -y / --yaml option to write to a YAML file instead of a JSON.

Analyzing a cohort with V-pipe

You can learn how to analyse fastq.gz files with V-pipe with this tutorial:

Run:

cooc-mutbamscan -t work/samples.tsv -p work/samples/ -j cooc-test.json

Display data on terminal

The default -d / --dump option of cooc-mutbamscan is not a very user-friendly experience to display the data. You can instead pass a JSON or YAML file to the display script. Run:

cooc-colourmut -j cooc-test.json

terminal screen shot

Render table for publication

And now, let’s go beyond our terminal and produce a table that can be included in a publication (see bibliography below for concrete example). Run:

cooc-pubmut -j cooc-test.json -o cooc-output.tsv

Note:

  • you can also output to comma-separated table (-o cooc-output.csv)
  • Microsoft Excel requires using option -x/--excel (using semi-colon instead of comma in comma-separated-value files). Some versions can also open TSV (but not the Office 365 web app).

You need to open the table with a spread-sheet that can understand line breaks, such as LibreOffice Calc, Google Docs Spreadsheet or, using special options (see above), Microsoft Excel.

72_UK 78_UK 92_UK 93_UK 76_SA 77_EU
sam1.bam 158 / 809
19.53%
2 / 452
0.44%
89 / 400
22.25%
344 / 758
45.38%
0 / 1090
0.00%
371 / 371
100.00%
sam2.bam 0 / 1121
0.00%
0 / 255
0.00%
58 / 432
13.43%
142 / 958
14.82%
0 / 1005
0.00%
1615 / 1615
100.00%

It is also possible to use the software pandoc to further convert the CSV to other formats. Run:

cooc-pubmut -j cooc-test.json -o cooc-output.csv
pandoc cooc-output.csv -o cooc-output.pdf
pandoc cooc-output.csv -o cooc-output.html
pandoc cooc-output.csv -o cooc-output.md

Installation

We recommend using bioconda software repositories for easy installation. You can find instruction to setup your bioconda environment at the following address:

In those instructions, please follow carefully the section 2. Set up channels.

If you use V-pipe’s quick_install.sh, it will set up an environment that you can activate, e.g.:

bash quick_install.sh -b sars-cov2 -p testing -w work
. ./testing/miniconda3/bin/activate

Prebuilt package

cojac and its dependencies are all available in the bioconda repository. We strongly advise you to install this pre-built package for a hassle-free experience.

You can install cojac in its own environment and activate it:

conda create -n cojac cojac
conda activate cojac
# test it
cooc-mutbamscan --help

And to update it to the latest version, run:

# activate the environment if not already active:
conda activate cojac
conda update cojac

Or you can add it to the current environment (e.g.: in environment base):

conda install cojac

Dependencies

If you want to install the software yourself, you can see the list of dependencies in conda_cojac_env.yaml.

We recommend using conda to install them:

conda env create -f conda_cojac_env.yaml
conda activate cojac
# now run from the cojac directory
./cooc-mutbamscan --help

cojac itself doesn't have a specific installer but you can copy its executables in your PATH (so you can call them without specifying their location), e.g.: into the conda environment:

# activate the environment if not already active:
conda activate cojac
cp cooc-* ${CONDA_PREFIX}/bin/
cooc-mutbamscan --help

Remove conda environment

You can remove the conda environment if you don't need it any more:

# exit the cojac environment first:
conda deactivate
conda env remove -n cojac

Additional notebooks

The subdirectory notebooks/ contains Jupyter and Rstudio notebooks used in the publication.

Upcoming features

  • bioconda package
  • further jupyter and rstudio code from the publication
  • Move hard-coded amplicons to BED input file
  • Move hard-coded mutations to YAML configuration

Long term goal:

  • Integration with ShoRAH amplicons

Contributions

Package developers:

Additional notebooks:

Corresponding author:

Citation

If you use this software in your research, please cite:

  • Katharina Jahn, David Dreifuss, Ivan Topolsky, Anina Kull, Pravin Ganesanandamoorthy, Xavier Fernandez-Cassi, Carola Bänziger, Elyse Stachler, Lara Fuhrmann, Kim Philipp Jablonski, Chaoran Chen, Catharine Aquino, Tanja Stadler, Christoph Ort, Tamar Kohn, Timothy R. Julian, Niko Beerenwinkel

    "Detection of SARS-CoV-2 variants in Switzerland by genomic analysis of wastewater samples."

    medRxiv 2021.01.08.21249379; doi:10.1101/2021.01.08.21249379

Contacts

If you experience problems running the software: