

Toney823/ALLHiC_components

 
 


Introduction

Components that speed up the original ALLHiC and reduce its resource cost.

Dependencies

  • pysam
  • numpy
  • matplotlib
  • jcvi
  • h5py
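
Before running the scripts, it can help to confirm that the Python packages above are importable. The following is a minimal sketch, assuming each package's import name matches the name in the list; `missing_modules` is a hypothetical helper and not part of this repository.

```python
import importlib.util

# Assumption: import names match the dependency names listed above.
DEPENDENCIES = ["pysam", "numpy", "matplotlib", "jcvi", "h5py"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DEPENDENCIES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```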

Installation

git clone https://github.com/sc-zhang/ALLHiC_components.git
cd ALLHiC_components
chmod +x bin/*.*

# install ALLHiC_prune
cd src/
make && make install

Usage

ALLHiC_prune prunes Hi-C signals between allelic chromosomes. It was rewritten to run faster and use less memory.

************************************************************************
    Usage: ./ALLHiC_prune -i Allele.ctg.table -b sorted.bam
      -h : help and usage.
      -i : Allele.ctg.table
      -b : sorted.bam
************************************************************************
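
If the pruning step is driven from a Python pipeline, the call can be wrapped as below. This is only a sketch: `prune_command` is a hypothetical helper, it uses just the `-i` and `-b` flags shown in the usage text, and it assumes the `ALLHiC_prune` binary is on `PATH` after installation.

```python
import shutil
import subprocess


def prune_command(table, bam, exe="ALLHiC_prune"):
    """Build the argument list using only the -i and -b flags shown above."""
    return [exe, "-i", table, "-b", bam]


if __name__ == "__main__":
    # File names follow the usage text; replace them with real paths.
    cmd = prune_command("Allele.ctg.table", "sorted.bam")
    if shutil.which(cmd[0]) is None:
        print("ALLHiC_prune not found on PATH; would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
```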

partition_gmap.py splits the BAM file and the contig-level FASTA by chromosome, using the allele table.

usage: partition_gmap.py [-h] -r REF -g ALLELETABLE [-b BAM] [-d WORKDIR]
                         [-t THREAD]

optional arguments:
  -h, --help            show this help message and exit
  -r REF, --ref REF     reference contig level assembly
  -g ALLELETABLE, --alleletable ALLELETABLE
                        Allele.gene.table
  -b BAM, --bam BAM     bam file, default: prunning.bam
  -d WORKDIR, --workdir WORKDIR
                        work directory, default: wrk_dir
  -t THREAD, --thread THREAD
                        threads, default: 10

ALLHiC_partition.py is an experimental script for clustering contigs into haplotypes.

usage: ALLHiC_partition.py [-h] -r REF -b BAM -d BED -a ANCHORS -p POLY
                           [-e EXCLUDE] [-o OUT]

optional arguments:
  -h, --help            show this help message and exit
  -r REF, --ref REF     Contig level assembly fasta
  -b BAM, --bam BAM     Prunned bam file
  -d BED, --bed BED     dup.bed
  -a ANCHORS, --anchors ANCHORS
                        anchors file with dup.mono.anchors
  -p POLY, --poly POLY  Ploid count of polyploid
  -e EXCLUDE, --exclude EXCLUDE
                        A list file contains exclude contigs for partition,
                        default=""
  -o OUT, --out OUT     Output directory, default=workdir

ALLHiC_rescue.py is a new version of the rescue step that uses jcvi to prevent collinear contigs from being rescued into the same group.

usage: ALLHiC_rescue.py [-h] -r REF -b BAM -c CLUSTER -n COUNTS -g GFF3 -j
                        JCVI [-e EXCLUDE] [-w WORKDIR]

optional arguments:
  -h, --help            show this help message and exit
  -r REF, --ref REF     Contig level assembly fasta
  -b BAM, --bam BAM     Unprunned bam
  -c CLUSTER, --cluster CLUSTER
                        Cluster file of contigs
  -n COUNTS, --counts COUNTS
                        count REs file
  -g GFF3, --gff3 GFF3  Gff3 file generated by gmap cds to contigs
  -j JCVI, --jcvi JCVI  CDS file for jcvi, bed file with same prefix must
                        exist in the same position
  -e EXCLUDE, --exclude EXCLUDE
                        cluster which need no rescue, default="", split by
                        comma
  -w WORKDIR, --workdir WORKDIR
                        Work directory, default=wrkdir
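
The `-j` option expects a bed file with the same prefix next to the CDS file. A minimal pre-flight check might look like the sketch below. It assumes "prefix" means the file name without its final extension (for example `sample.cds` and `sample.bed`); `companion_bed` and `check_jcvi_inputs` are hypothetical helpers, not functions from ALLHiC_rescue.py.

```python
from pathlib import Path


def companion_bed(cds_path):
    """Return the .bed path that shares the CDS file's directory and prefix."""
    # Assumption: the prefix is the file name minus its last extension.
    return Path(cds_path).with_suffix(".bed")


def check_jcvi_inputs(cds_path):
    """Raise FileNotFoundError if the CDS file or its companion bed is missing."""
    cds = Path(cds_path)
    bed = companion_bed(cds)
    for path in (cds, bed):
        if not path.is_file():
            raise FileNotFoundError(f"required jcvi input not found: {path}")
    return cds, bed
```

Calling `check_jcvi_inputs` before launching the rescue step surfaces a missing bed file early rather than partway through a run.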

ALLHiC_plot.py plots heatmaps of Hi-C signal. Compared with the original version, it uses less memory and makes it easier to plot heatmaps at other resolutions.

# Notice: bam file must be indexed
usage: ALLHiC_plot.py [-h] -b BAM -l LIST [-a AGP] [-5 H5] [-m MIN_SIZE] [-s SIZE] [-c CMAP] [-o OUTDIR] [--line | --block] [--linecolor LINECOLOR] [-t THREAD]

options:
  -h, --help            show this help message and exit
  -b BAM, --bam BAM     Input bam file
  -l LIST, --list LIST  Chromosome list, contain: ID Length
  -a AGP, --agp AGP     Input AGP file, if bam file is a contig-level mapping, agp file is required
  -5 H5, --h5 H5        h5 file of hic signal, optional, if not exist, it will be generate after reading hic signals, or it will be loaded for drawing other resolution of heatmap
  -m MIN_SIZE, --min_size MIN_SIZE
                        Minium bin size of heatmap, default=50k
  -s SIZE, --size SIZE  Bin size of heatmap, can be a list separated by comma, default=500k, notice: it must be n times of min_size (n is integer) or we will adjust it to nearest one
  -c CMAP, --cmap CMAP  CMAP for drawing heatmap, default="YlOrRd"
  -o OUTDIR, --outdir OUTDIR
                        Output directory, default=workdir
  --line                Draw dash line for each chromosome
  --block               Draw dash block for each chromosome
  --linecolor LINECOLOR
                        Color of dash line or dash block, default="grey"
  -t THREAD, --thread THREAD
                        Threads for reading bam, default=1
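
The help text says each `-s` value must be an integer multiple of `-m`, and is otherwise adjusted to the nearest one. The sketch below illustrates one way such an adjustment could work. It is not the script's actual code: the tie-breaking rule (round half up), the minimum of one bin, and support for `m` and `g` suffixes are all assumptions, and the function names are hypothetical.

```python
def parse_size(text):
    """Convert a size such as '500k' to an integer number of bases."""
    # Assumption: only 'k' appears in the help text; 'm' and 'g' are added here.
    units = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def adjust_bin_size(size, min_size):
    """Snap size to the nearest positive multiple of min_size (ties round up)."""
    multiple = max(1, (size + min_size // 2) // min_size)
    return multiple * min_size


def parse_bin_sizes(spec, min_size="50k"):
    """Parse a comma-separated -s value and adjust each entry to min_size."""
    base = parse_size(min_size)
    return [adjust_bin_size(parse_size(item), base) for item in spec.split(",")]
```

Under these assumptions, `parse_bin_sizes("500k,120k")` gives `[500000, 100000]`: 500k is already a multiple of 50k, while 120k is snapped to 100k.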

Other scripts are under development and are not recommended for use.
