
Documentation current for HybPiper version 2.3.1


Running HybPiper On a Cluster

Using a High Performance Computing Cluster (HPCC) will greatly reduce the time required to run HybPiper.

Before you run HybPiper

Ensure the necessary dependencies have been installed. Whether they were installed locally or by a systems administrator, make sure the executables are accessible in your $PATH and the required Python packages are importable. One way to check is by running:

hybpiper --check_dependencies

It is a good idea to test this both interactively (on the Head Node) and via a job submission script.
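A minimal Torque/PBS job script for the non-interactive check might look like the following sketch (the queue name, output filename, and module/environment names are assumptions; adjust for your system):

#!/bin/bash

#PBS -l nodes=1:ppn=1
#PBS -j oe
#PBS -q default
#PBS -o check_dependencies.out

# Load whatever module or conda environment provides HybPiper on your
# system (the names below are placeholders, not real module names)
# module load hybpiper
# conda activate hybpiper

hybpiper --check_dependencies

If the job's output file reports missing dependencies while the interactive check succeeds, your job environment is not inheriting the same $PATH as your login shell.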

If you intend to run HybPiper on multiple samples, generate a text file containing the names of all of your samples, one per line. Rename your read files if necessary so they share a common naming pattern.
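For example, if your paired reads follow the pattern samplename_R1.fastq / samplename_R2.fastq, the list can be generated directly from the filenames (a sketch; the reads directory path is an assumption):

cd /home/username/reads

# Strip the _R1.fastq suffix from each forward-read file to recover the sample name
for f in *_R1.fastq
do
    basename "$f" _R1.fastq
done > /home/username/namelist.txt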

Local Scratch Space

Several users have reported strange behavior of HybPiper on HPCC systems. HybPiper can cause problems on some HPCC systems because it creates a large number of intermediate files (roughly 130 files per gene per sample). Typically, these files are generated on one of the compute nodes but are stored somewhere else, such as a networked storage drive. If HybPiper or one of its dependencies attempts to read a file that has not yet transferred to the storage drive, there will be errors.

One solution is to use Local Scratch Space, a portion of the disk that is local to the compute node. Files can be generated there temporarily, then copied to the user's storage space when HybPiper completes. How to access local scratch space will depend on your HPCC setup. Some systems provide an automatically managed $TMPDIR that is deleted when the job finishes; on other systems, the user must manually delete files before exiting the job.
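On systems where the user must clean up manually, one common shell pattern is to create a per-job scratch directory and register its deletion with a trap, so the cleanup runs even if the script exits early. A minimal sketch, assuming a site-specific /scratch filesystem and a Torque/PBS job:

# Per-job scratch directory (the /scratch path is system-specific)
SCRATCH="/scratch/$USER/$PBS_JOBID"
mkdir -p "$SCRATCH"

# Delete the scratch directory on exit, even after an error
trap 'rm -rf "$SCRATCH"' EXIT

cd "$SCRATCH"
# ... run HybPiper here and copy results to permanent storage ...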

Check with your HPCC System Administrator to see how best to utilize local scratch space. Some systems may not support it: for example, on some systems a job may be migrated to a different node mid-run, in which case HybPiper would lose access to previously generated files.

Job Submission Scripts

Torque/PBS

This script is written for the Torque/PBS queuing system. It uses $TMPDIR to store files temporarily. HybPiper is run sequentially on the sample names listed in namelist.txt; the output for each sample is copied to user storage (/home/username/hybpiper_results) and then deleted from scratch before the next sample begins. The script requests a node with 12 free processors.

#!/bin/bash

#PBS -l nodes=1:ppn=12
#PBS -j oe
#PBS -q default
#PBS -o hybpiper.out

cd "$TMPDIR"

while read -r samplename
do
    # Assemble this sample; paired reads are expected at
    # /home/username/reads/<samplename>_R1.fastq and _R2.fastq
    hybpiper assemble \
        -r "/home/username/reads/${samplename}_R1.fastq" \
           "/home/username/reads/${samplename}_R2.fastq" \
        -t_dna /home/username/targets.fasta \
        --bwa \
        --prefix "$samplename" \
        --cpu 12
    python /path/to/HybPiper/cleanup.py "$samplename"
    # Copy the finished sample to permanent storage, then free the scratch space
    cp -r "$samplename" "/home/username/hybpiper_results/$samplename"
    rm -r "$samplename"
done < /home/username/namelist.txt
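
Slurm

On clusters running Slurm rather than Torque/PBS, the same workflow can be submitted with an #SBATCH header. The sketch below makes some assumptions: the partition name is a placeholder, and not all Slurm sites set $TMPDIR (some provide a variable such as $SLURM_TMPDIR or a site-specific scratch path instead). The per-sample loop is identical to the Torque/PBS script above.

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --partition=default
#SBATCH --output=hybpiper.out

cd "$TMPDIR"

# ... same per-sample while loop as in the Torque/PBS script above ...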

Compressing sample folders

See Wiki section here.
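Independently of any HybPiper option, a generic shell approach inside the per-sample loop is to archive each sample folder with tar before copying it off the node; transferring one archive is much faster than transferring thousands of small intermediate files. A sketch, reusing the paths from the script above:

# Replace the cp/rm lines of the per-sample loop with:
tar -czf "${samplename}.tar.gz" "$samplename"
cp "${samplename}.tar.gz" /home/username/hybpiper_results/
rm -r "$samplename" "${samplename}.tar.gz"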