Documentation current for HybPiper version 2.3.1
Using a High Performance Computing Cluster (HPCC) will greatly reduce the time required to run HybPiper.
Ensure the necessary dependencies have been installed. Whether the dependencies were installed locally or by a systems administrator, make sure the executables and Python packages are accessible in your $PATH. One way to check is by running:
hybpiper check_dependencies
It is a good idea to test this both interactively (on the Head Node) and via a job submission script.
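For example, a minimal Torque/PBS submission script for testing on a compute node might look like the sketch below (the queue name, resource request, and output file name are placeholders; adjust them for your cluster):

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -j oe
#PBS -q default
#PBS -o check_dependencies.out

# Confirm that HybPiper and its dependencies are visible from a compute node
hybpiper check_dependencies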
If you intend to run HybPiper on multiple samples, generate a text file that contains the names of all of your samples, one per line. Rename your read files if necessary so that they follow a common naming pattern.
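For example (the sample names below are hypothetical), a namelist.txt containing:

sampleA
sampleB
sampleC

would correspond to read files named sampleA_R1.fastq and sampleA_R2.fastq, sampleB_R1.fastq and sampleB_R2.fastq, and so on.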
Several users have reported strange behavior of HybPiper on HPCC systems. HybPiper may cause problems on some HPCC systems because it creates a large number of intermediate files (roughly 130 files per gene per sample). Typically, these files are generated on one of the compute nodes but are stored elsewhere, such as on a networked storage drive. If HybPiper or one of its dependencies attempts to read a file that has not yet transferred to the storage drive, there will be errors.
One solution is to use local scratch space, a portion of the disk that is local to the compute node. Files can be generated there temporarily, then copied to the user's storage space when HybPiper completes. How to access local scratch space will depend on your HPCC setup. Some systems provide an automatic $TMPDIR that is deleted when the job finishes; on other systems, the user must manually delete files before exiting the job.
Check with your HPCC system administrator to see how best to use local scratch space. Some systems may not support it; for example, on some systems a job may be moved to a new node mid-run, and HybPiper would lose access to previously generated files.
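If your system does not provide an automatic $TMPDIR, one common pattern (sketched below; the path /scratch/$USER is a placeholder for your cluster's node-local scratch location) is to create a per-job directory and remove it when the job exits:

#!/bin/bash
# Placeholder path; replace with your cluster's node-local scratch location
SCRATCH_DIR=/scratch/$USER/hybpiper_job_$$
mkdir -p "$SCRATCH_DIR"

# Remove the scratch directory on exit, even if the job fails partway through
trap 'rm -rf "$SCRATCH_DIR"' EXIT

cd "$SCRATCH_DIR"
# ... run HybPiper here and copy results back to permanent storage ...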
The script below is written for the Torque/PBS queuing system. It uses $TMPDIR to temporarily store the files. HybPiper is run sequentially on the sample names listed in namelist.txt. The output from one sample is copied to user storage (/home/username/hybpiper_results) and deleted before the next sample begins. The script requests a node with 12 free processors.
#!/bin/bash
#PBS -l nodes=1:ppn=12
#PBS -j oe
#PBS -q default
#PBS -o hybpiper.out
# Work from the node-local scratch directory provided by the scheduler
cd $TMPDIR
while read -r samplename
do
    # Assemble this sample from its paired-end reads
    hybpiper assemble \
        -r "/home/username/reads/${samplename}_R1.fastq" \
           "/home/username/reads/${samplename}_R2.fastq" \
        -t_dna /home/username/targets.fasta \
        --bwa \
        --prefix "$samplename" \
        --cpu 12
    # Remove intermediate files for this sample
    python /path/to/HybPiper/cleanup.py "$samplename"
    # Copy the finished sample to permanent storage, then free the scratch space
    cp -r "$samplename" /home/username/hybpiper_results/"$samplename"
    rm -r "$samplename"
done < /home/username/namelist.txt
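Save the script (for example as run_hybpiper.pbs; the filename is arbitrary) and submit it from the head node with the standard Torque/PBS command:

qsub run_hybpiper.pbs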
See Wiki section here.