Skip to content

collection of command-line functions used to perform multiple small frequently required analysis

License

Notifications You must be signed in to change notification settings

mnshgl0110/hometools

Repository files navigation

usage: Collections of command-line functions to perform common pre-processing and analysis functions. [-h]
                                                                                                      {getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,revcompseq,splitfa,countkmer,bamcov,pbamrc,bamrc2af,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,syri2bed,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow,xls2csv,gz2bgz,topmsa}
                                                                                                      ...

positional arguments:
  {getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,revcompseq,splitfa,countkmer,bamcov,pbamrc,bamrc2af,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,syri2bed,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow,xls2csv,gz2bgz,topmsa}
    getchr              FASTA: Get specific chromosomes from the fasta file
    sampfa              FASTA: Sample random sequences from a fasta file
    exseq               FASTA: extract sequence from fasta
    getscaf             FASTA: generate scaffolds from a given genome
    seqsize             FASTA: get size of dna sequences in a fasta file
    filsize             FASTA: filter out smaller molecules
    subnuc              FASTA: Change character (in all sequences) in the fasta file
    basrat              FASTA: Calculate the ratio of every base in the genome
    genome_ranges       FASTA: Get a list of genomic ranges of a given size
    get_homopoly        FASTA: Find homopolymeric regions in a given fasta file
    asstat              FASTA: Get N50 values for the given list of chromosomes
    shannon             FASTA: Get Shanon entropy across the length of the chromosomes using sliding windows
    fachrid             FASTA: Change chromosome IDs
    faline              FASTA: Convert fasta file from single line to multi line or vice-versa
    revcompseq          FASTA: Reverse complement specific chromosomes in a fasta file
    splitfa             FASTA: Split fasta files to individual sequences. Each sequence is saved in a separate file.
    countkmer           FASTA: Count the number of occurence of a given kmer in a fasta file.
    bamcov              BAM: Get mean read-depth for chromosomes from a BAM file
    pbamrc              BAM: Run bam-readcount in a parallel manner by dividing the input bed file.
    bamrc2af            BAM: Reads the output of pbamrc and a corresponding VCF file and saves the allele frequencies of the ref/alt alleles.
    splitbam            BAM: Split a BAM files based on TAG value. BAM file must be sorted using the TAG.
    mapbp               BAM: For a given reference coordinate get the corresponding base and position in the reads/segments mapping the reference position
    bam2coords          BAM: Convert BAM/SAM file to alignment coords
    ppileup             BAM: Currently it is slower than just running mpileup on 1 CPU. Might be possible to optimize later. Run samtools mpileup in parallel when pileup
                        is required for specific positions by dividing the input bed file.
    runsyri             syri: Parser to align and run syri on two genomes
    syriidx             syri: Generates index for syri.out. Filters non-SR annotations, then bgzip, then tabix index
    syri2bed            syri: Converts syri output to bedpe format
    plthist             Plot: Takes frequency output (like from uniq -c) and generates a histogram plot
    plotal              Plot: Visualise pairwise-whole genome alignments between multiple genomes
    pltbar              Plot: Generate barplot. Input: a two column file with first column as features and second column as values
    asmreads            GFA: For a given genomic region, get reads that constitute the corresponding assembly graph
    gfatofa             GFA: Convert a gfa file to a fasta file
    gfftrans            GFF: Get transcriptome (gene sequence) for all genes in a gff file. WARNING: THIS FUNCTION MIGHT HAVE BUGS.
    gffsort             GFF: Sort a GFF file based on the gene start positions
    vcfdp               VCF: Get DP and DP4 values from a VCF file.
    getcol              Table:Select columns from a TSV or CSV file using column names
    smprow              Table:Select random rows from a text file
    xls2csv             Table:Convert excel tables to .tsv/.csv
    gz2bgz              Misc:converts a gz to bgzip
    topmsa              Misc:Create topological MSA for non-duplicated nodes (genomic regions).


optional arguments:
  -h, --help            show this help message and exit

About

collection of command-line functions used to perform multiple small frequently required analysis

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages