
Available utilities

Manish Goel edited this page Dec 14, 2022 · 1 revision
usage: Collection of command-line functions for common pre-processing and analysis tasks.
       [-h]
       {getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,bamcov,pbamrc,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow}
       ...

positional arguments:
  {getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,bamcov,pbamrc,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow}
    getchr              FASTA: Get specific chromosomes from the fasta file
    sampfa              FASTA: Sample random sequences from a fasta file
    exseq               FASTA: Extract sequence from a fasta file
    getscaf             FASTA: Generate scaffolds from a given genome
    seqsize             FASTA: Get the size of DNA sequences in a fasta file
    filsize             FASTA: Filter out smaller molecules
    subnuc              FASTA: Change a character (in all sequences) in the fasta file
    basrat              FASTA: Calculate the ratio of every base in the genome
    genome_ranges       FASTA: Get a list of genomic ranges of a given size
    get_homopoly        FASTA: Find homopolymeric regions in a given fasta file
    asstat              FASTA: Get N50 values for the given list of chromosomes
    shannon             FASTA: Get Shannon entropy along the length of the chromosomes using sliding windows
    fachrid             FASTA: Change chromosome IDs
    faline              FASTA: Convert fasta file from single line to multi line or vice-versa
    bamcov              BAM: Get mean read-depth for chromosomes from a BAM file
    pbamrc              BAM: Run bam-readcount in a parallel manner by dividing the input bed file.
    splitbam            BAM: Split a BAM file based on TAG value. The BAM file must be sorted by the TAG.
    mapbp               BAM: For a given reference coordinate, get the corresponding base and position in the reads/segments
                        mapping to that reference position
    bam2coords          BAM: Convert BAM/SAM file to alignment coords
    ppileup             BAM: Run samtools mpileup in parallel, by dividing the input bed file, when pileup is required for
                        specific positions. Currently slower than running mpileup on a single CPU; may be optimized later.
    runsyri             syri: Parser to align and run syri on two genomes
    syriidx             syri: Generate an index for syri.out: filter non-SR annotations, compress with bgzip, then index with tabix
    plthist             Plot: Takes frequency output (like from uniq -c) and generates a histogram plot
    plotal              Plot: Visualise pairwise whole-genome alignments between multiple genomes
    pltbar              Plot: Generate barplot. Input: a two column file with first column as features and second column as values
    asmreads            GFA: For a given genomic region, get reads that constitute the corresponding assembly graph
    gfatofa             GFA: Convert a gfa file to a fasta file
    gfftrans            GFF: Get transcriptome (gene sequence) for all genes in a gff file. WARNING: THIS FUNCTION MIGHT HAVE BUGS.
    gffsort             GFF: Sort a GFF file based on the gene start positions
    vcfdp               VCF: Get DP and DP4 values from a VCF file.
    getcol              Table: Select columns from a TSV or CSV file using column names
    smprow              Table: Select random rows from a text file

optional arguments:
  -h, --help            show this help message and exit
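The help text for `basrat` does not say how base ratios are computed. As an illustration only, here is a minimal Python sketch of the idea, assuming every character in every sequence is counted (case-insensitively, including `N`) and reported as a fraction of the total. `read_fasta` and `base_ratios` are hypothetical helpers, not functions exported by this tool.

```python
from collections import Counter


def read_fasta(lines):
    """Parse FASTA lines into a {sequence_id: sequence} dict.

    The ID is the first whitespace-separated token of the header line.
    Lines before the first header are ignored.
    """
    seqs = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def base_ratios(seqs):
    """Return {base: fraction of all characters} across all sequences (case-insensitive)."""
    counts = Counter()
    for seq in seqs.values():
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequence data")
    return {base: n / total for base, n in sorted(counts.items())}
```

Usage with a hypothetical file name: `with open("genome.fa") as fh: print(base_ratios(read_fasta(fh)))`.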
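`genome_ranges` is described only as returning "genomic ranges of a given size". A minimal sketch of tiling chromosomes into fixed-size windows follows, assuming chromosome lengths are already known and that ranges use BED-style 0-based, end-exclusive coordinates with a shorter final window; the tool's actual coordinate convention is not stated here. `tile_genome` is a hypothetical name.

```python
def tile_genome(chrom_sizes, size):
    """Yield (chrom, start, end) windows of `size` bp for each chromosome.

    Coordinates are 0-based and end-exclusive (BED-style); the last
    window of a chromosome is truncated at the chromosome end.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for chrom, length in chrom_sizes.items():
        for start in range(0, length, size):
            yield chrom, start, min(start + size, length)
```

For example, `list(tile_genome({"chr1": 10}, 4))` gives three windows, the last one 2 bp long.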
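For `get_homopoly`, one common way to find homopolymeric stretches is a regular expression with a backreference. The sketch below is an illustration of that technique, not the tool's implementation; it assumes a minimum run length parameter (`min_len`, hypothetical), only considers A, C, G and T, ignores case, and reports 0-based, end-exclusive coordinates.

```python
import re


def find_homopolymers(seq, min_len=5):
    """Yield (start, end, base) for each run of a single base of length >= min_len.

    Coordinates are 0-based and end-exclusive. Matching is case-insensitive
    because the sequence is upper-cased first (this keeps positions unchanged).
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One base followed by at least (min_len - 1) copies of the same base
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    for m in pattern.finditer(seq.upper()):
        yield m.start(), m.end(), m.group(1)
```

Because the quantifier is greedy, each run is reported once at its full length rather than as overlapping sub-runs.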
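`asstat` reports N50 values. For reference, a minimal sketch of the standard N50 definition is shown below, assuming the input is simply a list of sequence lengths; `n50` is a hypothetical helper, and the tool may compute additional statistics.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of the total length.
    """
    lengths = sorted(lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Accumulate from the longest sequence down until half the total is reached
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
```

With lengths `[2, 3, 4, 5, 6]` (total 20), the cumulative sum passes 10 at the 5 bp sequence, so the N50 is 5.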
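`shannon` computes Shannon entropy in sliding windows. A minimal sketch of that calculation follows, assuming entropy in bits over the character composition of each window and that only complete windows are reported. The `window` and `step` parameters and both function names are hypothetical; the tool's own options and output format may differ.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy, in bits, of the character composition of seq (case-insensitive)."""
    counts = Counter(seq.upper())
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def sliding_entropy(seq, window, step):
    """Yield (start, entropy) for every complete window; starts are 0-based."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        yield start, shannon_entropy(seq[start:start + window])
```

A single-base window has entropy 0, while a window with A, C, G and T in equal proportions reaches the maximum of 2 bits.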
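`smprow` selects random rows from a text file. One way to do this in a single pass without loading the whole file is reservoir sampling (Algorithm R), sketched below. This is a swapped-in illustration of the technique, not necessarily how the tool works; `sample_rows` and its `seed` parameter are hypothetical.

```python
import random


def sample_rows(lines, k, seed=None):
    """Return k lines chosen uniformly at random without replacement.

    Uses reservoir sampling (Algorithm R), reading the input only once.
    If the input has k or fewer lines, all of them are returned in order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir
```

Passing an open file handle works directly, e.g. `with open(path) as fh: rows = sample_rows(fh, 100)`; returned lines keep their trailing newlines.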