What is ShoRAH?

ShoRAH is an open-source project for the analysis of next-generation sequencing data. It is designed to analyse genetically heterogeneous samples. Its tools are written in different programming languages and provide error correction, haplotype reconstruction, and estimation of the frequencies of the genetic variants present in a mixed sample.

More information here.

The software suite ShoRAH (Short Reads Assembly into Haplotypes) consists of several programs, the most important of which are:

Tool	What it does
`amplian.py`	amplicon based analysis
`dec.py`	local error correction based on diri_sampler
`diri_sampler`	Gibbs sampling for error correction via Dirichlet process mixture
`contain`	removal of redundant reads
`mm.py`	maximum matching haplotype construction
`freqEst`	EM algorithm for haplotype frequency
`snv.py`	detects single nucleotide variants, taking strand bias into account
`shorah.py`	wrapper for everything

Citation

If you use shorah, please cite the application note by Zagordi et al. in BMC Bioinformatics.

General usage

Dependencies

shorah requires the following pieces of software:

Python 2 or Python 3. Backward compatibility with Python 2 is provided because some current Linux distributions and OS X systems still use 2.x as the default. The required Python packages are:

a) Biopython, and b) NumPy. These packages can be installed using pip or Anaconda
Perl, for some scripts
zlib, which is used by the bundled samtools for compressing bam files
pkg-config, for discovering dependencies, which most Unix-like systems include
GNU scientific library, for random number generation

In addition, if you want to bootstrap the git version of shorah instead of using the provided tarballs, you will need the GNU Autotools:

Autoconf 2.69
Automake 1.15
m4, which most Unix-like systems include

Installation

We strongly recommend you use one of the versioned tarballs from the releases page. ShoRAH uses Autoconf and Automake, and these tarballs include all necessary scripts and files required for installation, whereas the git tree only contains the bare minimum of files required for bootstrapping.

Further, we strongly recommend you use a virtualenv for the Python installation, rooted in the same directory where you would like to install shorah. Without a virtualenv, the Python dependencies will not be located in the installation root, which will likely require you to set PYTHONPATH and makes the installation more brittle.

Say for instance, you would like to install shorah to /usr/local/shorah. The first step consists of installing the required python dependencies. Create a virtualenv:

/opt/local/bin/virtualenv-3.6 /usr/local/shorah

where /opt/local/bin/virtualenv-3.6 is the virtualenv command for python 3.6 on MacPorts. Now install the python dependencies:

/usr/local/shorah/bin/pip install Biopython numpy

Now call the configure script from the shorah tarball, taking care to specify the absolute path of the python interpreter (or the relative one if it is in your PATH), as this gets inserted into the shebang line of all python scripts:

./configure --prefix=/usr/local/shorah PYTHON=/usr/local/shorah/bin/python3.6

The configure script finds the dependencies using pkg-config. Once it completes, run:

make -j4

where 4 specifies the number of parallel compilation jobs to run. Finally, after compilation, install using:

make install

All the programs should now be located in /usr/local/shorah/bin.
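As a quick sanity check after `make install`, the following minimal sketch lists which of the programs named in the table above are missing from the `bin` directory of a given prefix. It assumes the installed file names match the table exactly, which this README does not guarantee; `missing_tools` and `TOOLS` are hypothetical names used only for illustration.

```python
import os

# Program names copied from the tool table; assumed to match the installed file names.
TOOLS = [
    "amplian.py", "dec.py", "diri_sampler", "contain",
    "mm.py", "freqEst", "snv.py", "shorah.py",
]


def missing_tools(prefix, tools=TOOLS):
    """Return the tools that are not executable files in <prefix>/bin."""
    bin_dir = os.path.join(prefix, "bin")
    missing = []
    for name in tools:
        path = os.path.join(bin_dir, name)
        # A tool counts as present only if it is a regular file we may execute.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing
```

Calling `missing_tools("/usr/local/shorah")` returns an empty list when every program from the table is present and executable under that prefix.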

Bootstrapping from git

If you opted to clone the git repository instead of downloading a prepared tarball, you will need to bootstrap the configure script:

autoreconf -vif

After this, you can run the configure script as described previously.

Windows users

You can install and run shorah with Cygwin. Please see the relevant paragraph on the documentation page.

Run

The input is a sorted bam file. Analysis can be performed in local or global mode.
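Before starting an analysis, it can be useful to see what sort order the bam file declares. The sketch below reads the BAM header with the standard library only and returns the `SO` field of the `@HD` line. It assumes that "sorted" here means coordinate-sorted, and it only reports what the header declares; it does not verify the actual record order. `bam_sort_order` is a hypothetical helper, not part of shorah.

```python
import gzip
import struct


def bam_sort_order(path):
    """Return the SO value declared in the @HD header line, or None if absent."""
    # BAM files are BGZF-compressed, which the gzip module can decompress.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file: bad magic bytes")
        # Little-endian 32-bit length of the plain-text SAM header.
        (text_length,) = struct.unpack("<i", handle.read(4))
        raw_text = handle.read(text_length)
    header_text = raw_text.rstrip(b"\x00").decode("ascii", errors="replace")
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None
```

A return value of `coordinate` indicates that the file claims to be sorted by position; `unsorted`, `queryname`, or `None` suggest sorting it first with your usual tool.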

Local analysis

The local analysis alone can be run by invoking dec.py or amplian.py (the program for amplicon mode). They work by cutting windows from the multiple sequence alignment, invoking diri_sampler on the windows, and calling snv.py for the SNV calling. See the README file in the directory amplicon_test.
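To make the idea of cutting windows from an alignment concrete, here is a minimal sketch that computes overlapping window boundaries over a region. The overlap scheme and the parameters `window_length` and `shifts` are assumptions chosen for illustration; this README does not specify how dec.py lays out its windows.

```python
def window_bounds(region_start, region_end, window_length, shifts=3):
    """Return half-open (start, end) windows covering [region_start, region_end).

    Consecutive windows are offset by window_length // shifts positions,
    so each position is covered by up to `shifts` windows (an assumption).
    """
    if window_length <= 0 or shifts <= 0:
        raise ValueError("window_length and shifts must be positive")
    step = max(1, window_length // shifts)
    windows = []
    start = region_start
    while start < region_end:
        windows.append((start, min(start + window_length, region_end)))
        # Stop once a window reaches the end of the region.
        if start + window_length >= region_end:
            break
        start += step
    return windows
```

For example, `window_bounds(0, 10, 6, shifts=3)` yields `[(0, 6), (2, 8), (4, 10)]`. Each window would then be handed to the local error correction step separately.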

Global analysis

The whole global reconstruction consists of the following steps:

error correction (i.e. local haplotype reconstruction);
SNV calling;
removal of redundant reads;
global haplotype reconstruction;
frequency estimation.

These can be run one after the other, or one can invoke shorah.py, which runs the whole process from bam file to frequency estimation and SNV calling.
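The last step, frequency estimation, uses an EM algorithm. The sketch below shows the general technique on a deliberately simplified model: each read is described only by the set of haplotypes it is compatible with, and the EM iterations estimate the haplotype frequencies. This is not the implementation in freqEst; the compatibility model and the name `em_frequencies` are assumptions made for illustration.

```python
def em_frequencies(compatible, n_haplotypes, iterations=200):
    """Estimate haplotype frequencies by EM.

    compatible: one set of haplotype indices per read, listing the
    haplotypes that read could have come from (simplified model).
    """
    freqs = [1.0 / n_haplotypes] * n_haplotypes
    for _ in range(iterations):
        expected = [0.0] * n_haplotypes
        # E-step: share each read among its compatible haplotypes
        # in proportion to the current frequency estimates.
        for haps in compatible:
            total = sum(freqs[h] for h in haps)
            if total == 0.0:
                continue
            for h in haps:
                expected[h] += freqs[h] / total
        norm = sum(expected)
        if norm == 0.0:
            return freqs
        # M-step: new frequencies are the normalised expected read counts.
        freqs = [count / norm for count in expected]
    return freqs
```

With six reads compatible only with haplotype 0, two only with haplotype 1, and two compatible with both, the estimate converges to frequencies of 0.75 and 0.25: the ambiguous reads are split according to the evidence from the unambiguous ones.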

Coding style

All changes to the C++ code in src/cpp should always be formatted according to the included .clang-format style by doing

clang-format -style=file -i src/cpp/*.[ch]pp

in the root of the repository.

All changes to the Python code in src/shorah should always be formatted to conform to the PEP 8 style guide. To this end, we advise using autopep8.
