This is a prototype of a statistical library for Ruby. To start, the library aims to be readable (for people studying statistics), well-tested (against R and Python statistical functions), and useful for Small Data. Big Data can come later, if I have enough fun.
With stats, I aim to create an API that makes statistics intuitive and harder to mess up. For example, I'd like to take a stab at an assumption framework that tags specific functions with their assumptions and emits warnings when those assumptions are not met.
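To make the assumption idea concrete, here is a minimal sketch of what such tagging could look like. The framework does not exist yet, so every name here (`Assumptions`, `CHECKS`, `check`, `violations`, `sample_variance`) is hypothetical and chosen only for illustration. It also assumes Ruby 2.4 or later for `Array#sum`.

```ruby
# Hypothetical sketch: none of these names come from the stats library itself.
module Assumptions
  # Registry of named checks; each takes the data and returns true or false.
  CHECKS = {
    min_sample_size: ->(data) { data.size >= 2 },
    non_negative: ->(data) { data.all? { |x| x >= 0 } }
  }.freeze

  # Returns the names of the assumptions that the data violates.
  def self.violations(data, *names)
    names.reject { |name| CHECKS.fetch(name).call(data) }
  end

  # Warns on stderr for each violated assumption, then runs the block anyway.
  def self.check(data, *names)
    violations(data, *names).each do |name|
      warn "assumption not met: #{name}"
    end
    yield data
  end
end

# A function "tagged" with the assumption that it needs at least two observations.
def sample_variance(data)
  Assumptions.check(data, :min_sample_size) do |xs|
    mean = xs.sum.to_f / xs.size
    xs.sum { |x| (x - mean)**2 } / (xs.size - 1)
  end
end

sample_variance([1, 2, 3, 4]) # no warning
sample_variance([5])          # warns "assumption not met: min_sample_size"
```

Warning instead of raising keeps exploratory work flowing while still flagging a result that should not be trusted; a stricter mode that raises would be a small change to `check`.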
Once this is stable and fully tested (it is so far for all the functions listed below), I'll consider publishing it as a gem. Until then, you can play around with master:
brew install gsl
git clone https://github.com/davejacobs/stats.git
cd stats
bundle
I've started integrating R into my tests to make testing as easy and repeatable as possible. I'm also planning to incorporate something like Rantly to expand the range of values that I test.
To run tests:
brew install homebrew/science/r
rspec
- Get Ruby GSL bindings (gem install gsl) to work on Ruby 2.0/OS X
- Implement gemspec so this is installable via git URL
I've added a wrapper around GSL distribution functions, for more intuitive access and testing.
- Normal distribution - PDF & CDF
- Chi square distribution - PDF & CDF
- T distribution - PDF & CDF
- F distribution - PDF & CDF
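As a rough illustration of the kind of interface a distribution wrapper can expose, the sketch below implements the normal PDF and CDF in pure Ruby using `Math.erf`. It is not the GSL-backed code in this repository; the class name `NormalDistribution` and its `mean:` and `sd:` keywords are assumptions made for the example.

```ruby
# Hypothetical wrapper shape; pure Ruby, not the GSL-backed implementation.
class NormalDistribution
  attr_reader :mean, :sd

  def initialize(mean: 0.0, sd: 1.0)
    raise ArgumentError, "sd must be positive" unless sd > 0
    @mean = mean.to_f
    @sd = sd.to_f
  end

  # Probability density at x.
  def pdf(x)
    z = (x - mean) / sd
    Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math::PI))
  end

  # Cumulative probability P(X <= x), computed from the error function.
  def cdf(x)
    z = (x - mean) / sd
    0.5 * (1 + Math.erf(z / Math.sqrt(2)))
  end
end

standard = NormalDistribution.new
standard.pdf(0.0)  # density at the mean
standard.cdf(1.96) # close to 0.975
```

Values from a sketch like this can be checked against R's `dnorm` and `pnorm`, which is the style of cross-checking the tests aim for.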
- Mean, arithmetic
- Mean, geometric
- Median
- Mode
- Variance
- Standard deviation
- Standard error of the mean (for samples only)
- Relative standard error of the mean (for samples only)
- Coefficient of variation
- Chi square
- T-test, single sample
- T-test, two-sample
- T-test, repeated measures
- Wilcoxon rank sum test
- Wilcoxon signed rank test
- Median test
- Kruskal-Wallis H test
- Friedman test
- ANOVA, one-way
- Factorial ANOVA, two-way
- Factorial ANOVA, three-way
- ANOVA, repeated measures
- MANOVA
- ANCOVA
- Welch's ANOVA
- Fisher's least significant difference
- Linear regression
- Multiple linear regression
- Pearson's correlation
- Spearman correlation
- Basic assumption framework
- Confidence intervals (general idea)
- Basic data structures
- Significance methods on data structures
- Test using R integration and something like Rantly
- How to choose the right statistical test
- Wilkinson's Statistics Quiz (RTF)
- Assessing the reliability of statistical software