-
Notifications
You must be signed in to change notification settings - Fork 2
/
README
51 lines (48 loc) · 4.65 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
usage: Collections of command-line functions to perform common pre-processing and analysis functions. [-h]
{getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,revcompseq,splitfa,countkmer,bamcov,pbamrc,bamrc2af,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,syri2bed,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow,xls2csv,gz2bgz,topmsa}
...
positional arguments:
{getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,revcompseq,splitfa,countkmer,bamcov,pbamrc,bamrc2af,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,syri2bed,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow,xls2csv,gz2bgz,topmsa}
getchr FASTA: Get specific chromosomes from the fasta file
sampfa FASTA: Sample random sequences from a fasta file
exseq FASTA: extract sequence from fasta
getscaf FASTA: generate scaffolds from a given genome
seqsize FASTA: get size of dna sequences in a fasta file
filsize FASTA: filter out smaller molecules
subnuc FASTA: Change character (in all sequences) in the fasta file
basrat FASTA: Calculate the ratio of every base in the genome
genome_ranges FASTA: Get a list of genomic ranges of a given size
get_homopoly FASTA: Find homopolymeric regions in a given fasta file
asstat FASTA: Get N50 values for the given list of chromosomes
shannon FASTA: Get Shanon entropy across the length of the chromosomes using sliding windows
fachrid FASTA: Change chromosome IDs
faline FASTA: Convert fasta file from single line to multi line or vice-versa
revcompseq FASTA: Reverse complement specific chromosomes in a fasta file
splitfa FASTA: Split fasta files to individual sequences. Each sequence is saved in a separate file.
countkmer FASTA: Count the number of occurence of a given kmer in a fasta file.
bamcov BAM: Get mean read-depth for chromosomes from a BAM file
pbamrc BAM: Run bam-readcount in a parallel manner by dividing the input bed file.
bamrc2af BAM: Reads the output of pbamrc and a corresponding VCF file and saves the allele frequencies of the ref/alt alleles.
splitbam BAM: Split a BAM files based on TAG value. BAM file must be sorted using the TAG.
mapbp BAM: For a given reference coordinate get the corresponding base and position in the reads/segments mapping the reference position
bam2coords BAM: Convert BAM/SAM file to alignment coords
ppileup BAM: Currently it is slower than just running mpileup on 1 CPU. Might be possible to optimize later. Run samtools mpileup in parallel when pileup
is required for specific positions by dividing the input bed file.
runsyri syri: Parser to align and run syri on two genomes
syriidx syri: Generates index for syri.out. Filters non-SR annotations, then bgzip, then tabix index
syri2bed syri: Converts syri output to bedpe format
plthist Plot: Takes frequency output (like from uniq -c) and generates a histogram plot
plotal Plot: Visualise pairwise-whole genome alignments between multiple genomes
pltbar Plot: Generate barplot. Input: a two column file with first column as features and second column as values
asmreads GFA: For a given genomic region, get reads that constitute the corresponding assembly graph
gfatofa GFA: Convert a gfa file to a fasta file
gfftrans GFF: Get transcriptome (gene sequence) for all genes in a gff file. WARNING: THIS FUNCTION MIGHT HAVE BUGS.
gffsort GFF: Sort a GFF file based on the gene start positions
vcfdp VCF: Get DP and DP4 values from a VCF file.
getcol Table:Select columns from a TSV or CSV file using column names
smprow Table:Select random rows from a text file
xls2csv Table:Convert excel tables to .tsv/.csv
gz2bgz Misc:converts a gz to bgzip
topmsa Misc:Create topological MSA for non-duplicated nodes (genomic regions).
optional arguments:
-h, --help show this help message and exit