Contents
Miscellaneous bioinformatics scripts that don't merit their own repo. All are released under the MIT License unless otherwise specified.
Abnormal nucleotide frequencies tend to throw off the usual procedures for estimating evolutionary models. A practical case is calculating Ks values for grass genes, a significant portion of which are high-GC genes (see details here). In high-GC genes, most substitutions are to G or C, so the Jukes-Cantor correction under-estimates the Ks values. The codon models in PAML, by contrast, tend to over-estimate Ks values. The Ks calculator we want to implement here skips model inference, which is difficult anyway because there are very few sites from which to estimate the model parameters. Instead, we ask: given the biased substitutions and the sequence lengths, can we run simulations and fit an evolutionary model based on those simulations?
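The idea can be illustrated with a minimal sketch. It is not the calculator itself, and every name in it (`base_freqs`, `simulate_p`, `calibrate`, `estimate_d`) is hypothetical. It assumes a simple per-site model in which each substitution event redraws the base from a GC-biased composition (an F81-like model), rather than a codon-level model of synonymous sites. It simulates pairs of sequences at known distances, records the observed proportion of differing sites, and then inverts that simulated relationship by interpolation instead of applying the Jukes-Cantor formula.

```python
import math
import random

BASES = "ACGT"

def base_freqs(gc):
    """Base frequencies for A, C, G, T given GC content (assumes A=T and G=C)."""
    return [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]

def jukes_cantor(p):
    """Jukes-Cantor distance from the observed proportion of differing sites."""
    if p >= 0.75:
        return float("inf")
    return -0.75 * math.log(1 - 4 * p / 3)

def simulate_p(true_d, length, gc, rng):
    """Simulate `length` sites at `true_d` substitutions per site (assumed
    F81-like model) and return the observed proportion of differing sites."""
    freqs = base_freqs(gc)
    het = 1 - sum(f * f for f in freqs)    # chance that a redraw changes the base
    p_event = 1 - math.exp(-true_d / het)  # chance of at least one redraw at a site
    diffs = 0
    for _ in range(length):
        if rng.random() < p_event:
            # After any redraw, the two bases are independent draws.
            a, b = rng.choices(BASES, weights=freqs, k=2)
            diffs += a != b
    return diffs / length

def calibrate(gc, length, grid, rng):
    """Simulate each distance in `grid`; return (observed_p, true_d) sorted by p."""
    return sorted((simulate_p(d, length, gc, rng), d) for d in grid)

def estimate_d(p_obs, table):
    """Estimate distance by linear interpolation in the simulated table."""
    for (p0, d0), (p1, d1) in zip(table, table[1:]):
        if p0 <= p_obs <= p1:
            if p1 == p0:
                return d0
            return d0 + (d1 - d0) * (p_obs - p0) / (p1 - p0)
    return None  # p_obs lies outside the simulated range

if __name__ == "__main__":
    rng = random.Random(0)
    grid = [i / 10 for i in range(16)]  # true distances 0.0 .. 1.5
    table = calibrate(gc=0.8, length=20000, grid=grid, rng=rng)
    p_obs = simulate_p(1.0, 20000, 0.8, rng)
    print("Jukes-Cantor estimate:    ", jukes_cantor(p_obs))
    print("Simulation-based estimate:", estimate_d(p_obs, table))
```

Under these assumptions, with 80% GC and a true distance of 1.0, the Jukes-Cantor estimate falls below the true value, while the simulation-based lookup recovers it more closely. With short sequences the simulated table becomes noisy, so a real implementation would likely average over many replicates per grid point.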