GitHub - mnshgl0110/hometools: collection of command-line functions used to perform multiple small frequently required analysis

mnshgl0110 / hometools Public

Notifications You must be signed in to change notification settings
Fork 2
Star 6

collection of command-line functions used to perform multiple small frequently required analysis

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 141 Commits
bin		bin
docs/source		docs/source
hometools		hometools
test		test
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README		README
documentation.md		documentation.md
environment.yml		environment.yml
make.bat		make.bat
requirements.txt		requirements.txt
setup.py		setup.py

Repository files navigation

usage: Collections of command-line functions to perform common pre-processing and analysis functions. [-h]
                                                                                                      {getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,revcompseq,splitfa,countkmer,bamcov,pbamrc,bamrc2af,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,syri2bed,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow,xls2csv,gz2bgz,topmsa}
                                                                                                      ...

positional arguments:
  {getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,revcompseq,splitfa,countkmer,bamcov,pbamrc,bamrc2af,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,syri2bed,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow,xls2csv,gz2bgz,topmsa}
    getchr              FASTA: Get specific chromosomes from the fasta file
    sampfa              FASTA: Sample random sequences from a fasta file
    exseq               FASTA: extract sequence from fasta
    getscaf             FASTA: generate scaffolds from a given genome
    seqsize             FASTA: get size of dna sequences in a fasta file
    filsize             FASTA: filter out smaller molecules
    subnuc              FASTA: Change character (in all sequences) in the fasta file
    basrat              FASTA: Calculate the ratio of every base in the genome
    genome_ranges       FASTA: Get a list of genomic ranges of a given size
    get_homopoly        FASTA: Find homopolymeric regions in a given fasta file
    asstat              FASTA: Get N50 values for the given list of chromosomes
    shannon             FASTA: Get Shanon entropy across the length of the chromosomes using sliding windows
    fachrid             FASTA: Change chromosome IDs
    faline              FASTA: Convert fasta file from single line to multi line or vice-versa
    revcompseq          FASTA: Reverse complement specific chromosomes in a fasta file
    splitfa             FASTA: Split fasta files to individual sequences. Each sequence is saved in a separate file.
    countkmer           FASTA: Count the number of occurence of a given kmer in a fasta file.
    bamcov              BAM: Get mean read-depth for chromosomes from a BAM file
    pbamrc              BAM: Run bam-readcount in a parallel manner by dividing the input bed file.
    bamrc2af            BAM: Reads the output of pbamrc and a corresponding VCF file and saves the allele frequencies of the ref/alt alleles.
    splitbam            BAM: Split a BAM files based on TAG value. BAM file must be sorted using the TAG.
    mapbp               BAM: For a given reference coordinate get the corresponding base and position in the reads/segments mapping the reference position
    bam2coords          BAM: Convert BAM/SAM file to alignment coords
    ppileup             BAM: Currently it is slower than just running mpileup on 1 CPU. Might be possible to optimize later. Run samtools mpileup in parallel when pileup
                        is required for specific positions by dividing the input bed file.
    runsyri             syri: Parser to align and run syri on two genomes
    syriidx             syri: Generates index for syri.out. Filters non-SR annotations, then bgzip, then tabix index
    syri2bed            syri: Converts syri output to bedpe format
    plthist             Plot: Takes frequency output (like from uniq -c) and generates a histogram plot
    plotal              Plot: Visualise pairwise-whole genome alignments between multiple genomes
    pltbar              Plot: Generate barplot. Input: a two column file with first column as features and second column as values
    asmreads            GFA: For a given genomic region, get reads that constitute the corresponding assembly graph
    gfatofa             GFA: Convert a gfa file to a fasta file
    gfftrans            GFF: Get transcriptome (gene sequence) for all genes in a gff file. WARNING: THIS FUNCTION MIGHT HAVE BUGS.
    gffsort             GFF: Sort a GFF file based on the gene start positions
    vcfdp               VCF: Get DP and DP4 values from a VCF file.
    getcol              Table:Select columns from a TSV or CSV file using column names
    smprow              Table:Select random rows from a text file
    xls2csv             Table:Convert excel tables to .tsv/.csv
    gz2bgz              Misc:converts a gz to bgzip
    topmsa              Misc:Create topological MSA for non-duplicated nodes (genomic regions).


optional arguments:
  -h, --help            show this help message and exit