-
Notifications
You must be signed in to change notification settings - Fork 2
collection of command-line functions used to perform multiple small frequently required analysis
License
mnshgl0110/hometools
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
usage: Collections of command-line functions to perform common pre-processing and analysis functions. [-h] {getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,revcompseq,splitfa,countkmer,bamcov,pbamrc,bamrc2af,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,syri2bed,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow,xls2csv,gz2bgz,topmsa} ... positional arguments: {getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,revcompseq,splitfa,countkmer,bamcov,pbamrc,bamrc2af,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,syri2bed,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow,xls2csv,gz2bgz,topmsa} getchr FASTA: Get specific chromosomes from the fasta file sampfa FASTA: Sample random sequences from a fasta file exseq FASTA: extract sequence from fasta getscaf FASTA: generate scaffolds from a given genome seqsize FASTA: get size of dna sequences in a fasta file filsize FASTA: filter out smaller molecules subnuc FASTA: Change character (in all sequences) in the fasta file basrat FASTA: Calculate the ratio of every base in the genome genome_ranges FASTA: Get a list of genomic ranges of a given size get_homopoly FASTA: Find homopolymeric regions in a given fasta file asstat FASTA: Get N50 values for the given list of chromosomes shannon FASTA: Get Shanon entropy across the length of the chromosomes using sliding windows fachrid FASTA: Change chromosome IDs faline FASTA: Convert fasta file from single line to multi line or vice-versa revcompseq FASTA: Reverse complement specific chromosomes in a fasta file splitfa FASTA: Split fasta files to individual sequences. Each sequence is saved in a separate file. countkmer FASTA: Count the number of occurence of a given kmer in a fasta file. bamcov BAM: Get mean read-depth for chromosomes from a BAM file pbamrc BAM: Run bam-readcount in a parallel manner by dividing the input bed file. bamrc2af BAM: Reads the output of pbamrc and a corresponding VCF file and saves the allele frequencies of the ref/alt alleles. splitbam BAM: Split a BAM files based on TAG value. BAM file must be sorted using the TAG. mapbp BAM: For a given reference coordinate get the corresponding base and position in the reads/segments mapping the reference position bam2coords BAM: Convert BAM/SAM file to alignment coords ppileup BAM: Currently it is slower than just running mpileup on 1 CPU. Might be possible to optimize later. Run samtools mpileup in parallel when pileup is required for specific positions by dividing the input bed file. runsyri syri: Parser to align and run syri on two genomes syriidx syri: Generates index for syri.out. Filters non-SR annotations, then bgzip, then tabix index syri2bed syri: Converts syri output to bedpe format plthist Plot: Takes frequency output (like from uniq -c) and generates a histogram plot plotal Plot: Visualise pairwise-whole genome alignments between multiple genomes pltbar Plot: Generate barplot. Input: a two column file with first column as features and second column as values asmreads GFA: For a given genomic region, get reads that constitute the corresponding assembly graph gfatofa GFA: Convert a gfa file to a fasta file gfftrans GFF: Get transcriptome (gene sequence) for all genes in a gff file. WARNING: THIS FUNCTION MIGHT HAVE BUGS. gffsort GFF: Sort a GFF file based on the gene start positions vcfdp VCF: Get DP and DP4 values from a VCF file. getcol Table:Select columns from a TSV or CSV file using column names smprow Table:Select random rows from a text file xls2csv Table:Convert excel tables to .tsv/.csv gz2bgz Misc:converts a gz to bgzip topmsa Misc:Create topological MSA for non-duplicated nodes (genomic regions). optional arguments: -h, --help show this help message and exit
About
collection of command-line functions used to perform multiple small frequently required analysis
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published