Components that speed up the original ALLHiC and reduce its resource usage.
- pysam
- numpy
- matplotlib
- jcvi
- h5py
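Before installing, it can help to confirm that the Python dependencies listed above are importable. The following is a minimal sketch, not part of this repository; the helper name `missing_modules` is hypothetical, and it assumes each package's import name matches the name in the list.

```python
from importlib.util import find_spec

# Dependencies named in the list above; import names are assumed to match.
REQUIRED = ["pysam", "numpy", "matplotlib", "jcvi", "h5py"]


def missing_modules(names=REQUIRED):
    """Return the modules from `names` that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

Any module reported as missing needs to be installed before the scripts are run.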
git clone https://github.com/sc-zhang/ALLHiC_components.git
cd ALLHiC_components
chmod +x bin/*.*
# install ALLHiC_prune
cd src/
make && make install
ALLHiC_prune is used for pruning signals between allelic chromosomes. It was rewritten to run faster and use less memory.
************************************************************************
Usage: ./ALLHiC_prune -i Allele.ctg.table -b sorted.bam
-h : help and usage.
-i : Allele.ctg.table
-b : sorted.bam
************************************************************************
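When ALLHiC_prune is called from a Python pipeline, the command can be assembled as an argument list. This is a sketch under assumptions: the helper `build_prune_command` is hypothetical, and only the `-i` and `-b` flags from the usage text above are used.

```python
import shlex


def build_prune_command(allele_table, bam, binary="./ALLHiC_prune"):
    """Return the argument list for one ALLHiC_prune run.

    Only the -i and -b flags shown in the usage text are used.
    """
    return [str(binary), "-i", str(allele_table), "-b", str(bam)]


if __name__ == "__main__":
    cmd = build_prune_command("Allele.ctg.table", "sorted.bam")
    # Print the command as it would be typed in a shell.
    print(shlex.join(cmd))
    # To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when file names contain spaces.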
partition_gmap.py is used for splitting the bam file and the contig-level fasta by chromosome, using the allele table.
usage: partition_gmap.py [-h] -r REF -g ALLELETABLE [-b BAM] [-d WORKDIR]
[-t THREAD]
optional arguments:
-h, --help show this help message and exit
-r REF, --ref REF reference contig level assembly
-g ALLELETABLE, --alleletable ALLELETABLE
Allele.gene.table
-b BAM, --bam BAM bam file, default: prunning.bam
-d WORKDIR, --workdir WORKDIR
work directory, default: wrk_dir
-t THREAD, --thread THREAD
threads, default: 10
ALLHiC_partition.py is an experimental script for clustering contigs into haplotypes.
usage: ALLHiC_partition.py [-h] -r REF -b BAM -d BED -a ANCHORS -p POLY
[-e EXCLUDE] [-o OUT]
optional arguments:
-h, --help show this help message and exit
-r REF, --ref REF Contig level assembly fasta
-b BAM, --bam BAM Prunned bam file
-d BED, --bed BED dup.bed
-a ANCHORS, --anchors ANCHORS
anchors file with dup.mono.anchors
-p POLY, --poly POLY Ploid count of polyploid
-e EXCLUDE, --exclude EXCLUDE
A list file contains exclude contigs for partition,
default=""
-o OUT, --out OUT Output directory, default=workdir
ALLHiC_rescue.py is a new version of rescue that uses jcvi to prevent collinear contigs from being rescued into the same group.
usage: ALLHiC_rescue.py [-h] -r REF -b BAM -c CLUSTER -n COUNTS -g GFF3 -j
JCVI [-e EXCLUDE] [-w WORKDIR]
optional arguments:
-h, --help show this help message and exit
-r REF, --ref REF Contig level assembly fasta
-b BAM, --bam BAM Unprunned bam
-c CLUSTER, --cluster CLUSTER
Cluster file of contigs
-n COUNTS, --counts COUNTS
count REs file
-g GFF3, --gff3 GFF3 Gff3 file generated by gmap cds to contigs
-j JCVI, --jcvi JCVI CDS file for jcvi, bed file with same prefix must
exist in the same position
-e EXCLUDE, --exclude EXCLUDE
cluster which need no rescue, default="", split by
comma
-w WORKDIR, --workdir WORKDIR
Work directory, default=wrkdir
ALLHiC_plot.py is used to plot heatmaps of Hi-C signal. Compared with the original version, it uses less memory and makes it easier to plot heatmaps at other resolutions.
# Notice: the bam file must be indexed
usage: ALLHiC_plot.py [-h] -b BAM -l LIST [-a AGP] [-5 H5] [-m MIN_SIZE] [-s SIZE] [-c CMAP] [-o OUTDIR] [--line | --block] [--linecolor LINECOLOR] [-t THREAD]
options:
-h, --help show this help message and exit
-b BAM, --bam BAM Input bam file
-l LIST, --list LIST Chromosome list, contain: ID Length
-a AGP, --agp AGP Input AGP file, if bam file is a contig-level mapping, agp file is required
-5 H5, --h5 H5 h5 file of hic signal, optional, if not exist, it will be generate after reading hic signals, or it will be loaded for drawing other resolution of heatmap
-m MIN_SIZE, --min_size MIN_SIZE
Minium bin size of heatmap, default=50k
-s SIZE, --size SIZE Bin size of heatmap, can be a list separated by comma, default=500k, notice: it must be n times of min_size (n is integer) or we will adjust it to nearest one
-c CMAP, --cmap CMAP CMAP for drawing heatmap, default="YlOrRd"
-o OUTDIR, --outdir OUTDIR
Output directory, default=workdir
--line Draw dash line for each chromosome
--block Draw dash block for each chromosome
--linecolor LINECOLOR
Color of dash line or dash block, default="grey"
-t THREAD, --thread THREAD
Threads for reading bam, default=1
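The help text states that each `-s` value must be an integer multiple of `-m`, and is otherwise adjusted to the nearest one. The sketch below illustrates that rule only; it is not the script's actual code. The function names, the handling of the `k`/`m`/`g` suffixes, and rounding ties upward are all assumptions.

```python
def parse_size(text):
    """Convert strings such as '500k' or '1m' to a number of base pairs.

    The suffix handling is an assumption, not taken from ALLHiC_plot.py.
    """
    text = text.strip().lower()
    units = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def snap_to_multiple(size, min_size):
    """Round `size` to the nearest multiple of `min_size` (at least one bin).

    Ties are rounded upward here; the real script may behave differently.
    """
    n = max(1, (size + min_size // 2) // min_size)
    return n * min_size


def resolve_bin_sizes(size_arg, min_size_arg="50k"):
    """Parse a comma-separated size list and snap each entry to min_size."""
    min_size = parse_size(min_size_arg)
    return [snap_to_multiple(parse_size(s), min_size) for s in size_arg.split(",")]


if __name__ == "__main__":
    print(resolve_bin_sizes("500k,120k,1m"))
```

With the default minimum bin size of 50k, a requested size of 120k would be drawn at 100k under these assumptions, while 500k and 1m are already valid multiples.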
Other scripts are under development and are not recommended for use.