minerva

R package for Maximal Information-Based Nonparametric Exploration computation

Install

  • Latest CRAN release
install.packages("minerva")
  • Development version
devtools::install_github('rsamantha/minerva')

Usage

  • Basic usage with helper function mine.
library(minerva)

x <- 0:200 / 200
y <- sin(10 * pi * x) + x
mine(x,y, n.cores=1)
  • Compute a single measure from the MINE suite using mine_stat.
    • Available measures are: mic, mas, mev, mcn, tic, gmic
x <- 0:200 / 200
y <- sin(10 * pi * x) + x
mine_stat(x, y, measure="mic")
  • To compute the mic-r2 measure use the cor R function:
x <- 0:200 / 200
y <- sin(10 * pi * x) + x

r2 <- cor(x, y)
mm <- mine_stat(x, y, measure="mic")
mm - r2^2

## Same value as the MIC-R2 element returned by mine():
## mine(x, y, n.cores=1)[[5]]
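
Because mine returns a list, the individual statistics can also be picked out by name rather than by position. The sketch below assumes the element names MIC and MIC-R2; run names(res) on your installed version to confirm them before relying on this.

```r
library(minerva)

x <- 0:200 / 200
y <- sin(10 * pi * x) + x

res <- mine(x, y, n.cores=1)

## Inspect which statistics are available
names(res)

## Assumed element names; "MIC-R2" needs [[ ]] because of the hyphen
mic <- res[["MIC"]]
nonlinearity <- res[["MIC-R2"]]

## Manual computation for comparison
mic - cor(x, y)^2
```

Selecting by name is less fragile than [[5]] if the order of the returned statistics ever changes.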

Compute statistic on matrices

  • All features in a single matrix (mine_compute_pstat).
  • All possible combination of features between two matrices (mine_compute_cstat).
    • When comparing two matrices, the function checks that both have the same number of rows. If the numbers of rows differ, an error is thrown.
## Both matrices must have the same number of rows (samples)
x <- matrix(rnorm(1000), nrow=100, ncol=10)
y <- matrix(rnorm(2000), nrow=100, ncol=20)

## Compare feature of the same matrix
pstats(x)

## Compare features of matrix x with feature in matrix y
cstats(x, y)
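
The number of comparisons grows quickly with the number of features. As a rough base-R sketch (not the package's code), assuming pstats scores every unordered pair of columns and cstats scores every column of x against every column of y, the pairs can be enumerated like this:

```r
set.seed(1)
x <- matrix(rnorm(1000), nrow=100, ncol=10)
y <- matrix(rnorm(2000), nrow=100, ncol=20)

## Row check analogous to the one described above
stopifnot(nrow(x) == nrow(y))

## Unordered column pairs within x: choose(10, 2) of them
within_pairs <- t(combn(ncol(x), 2))
nrow(within_pairs)

## Every column of x against every column of y: 10 * 20 of them
between_pairs <- expand.grid(x_col=seq_len(ncol(x)), y_col=seq_len(ncol(y)))
nrow(between_pairs)
```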

Mictools pipeline

This is inspired by the original implementation by Albanese et al., available in Python here: https://github.com/minepy/mictools.

Reading the data from the mictools repository

datasaurus <- read.table("https://raw.githubusercontent.com/minepy/mictools/master/examples/datasaurus.txt", 
header=TRUE, row.names=1, as.is=TRUE, stringsAsFactors=FALSE)
datasaurus.m <- t(datasaurus)
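
The call to t() matters: judging from the transposition, the file stores one variable per row, while the pairwise functions compare columns. A toy illustration with made-up values (the names toy, var_a, s1 and so on are hypothetical):

```r
## Hypothetical table laid out like the file: variables in rows, samples in columns
toy <- data.frame(s1=c(1.0, 4.0), s2=c(2.0, 5.0), s3=c(3.0, 6.0),
                  row.names=c("var_a", "var_b"))

## t() on a data.frame returns a matrix, here with samples in rows
toy.m <- t(toy)

dim(toy.m)
colnames(toy.m)   # variable names are now column names
is.numeric(toy.m)
```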

Compute null distribution for tic_e

Automatically compute:

  • tic_e null distribution based on permutations.
  • histogram of the distribution with cumulative distribution.
  • Observed values of tic_e for each pair of variables in datasaurus.
  • Observed distribution of tic_e.
  • P-value for each variable pair association.
ticnull <- mictools(datasaurus.m, nperm=10000, seed=1234)

## Get the names of the named list
names(ticnull)
##[1]  "tic"      "nulldist" "obstic"   "obsdist"  "pval"
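
To make the idea of a permutation null concrete, here is a minimal base-R sketch. It swaps in the absolute Pearson correlation as a stand-in statistic (it is not tic_e), and both the helper perm_pvalue and the add-one p-value convention are assumptions for illustration, not minerva's implementation.

```r
## Hypothetical helper: empirical p-value of a statistic under permutation of y
perm_pvalue <- function(x, y, stat, nperm=1000) {
  obs <- stat(x, y)
  ## Shuffling y breaks any association with x
  null <- replicate(nperm, stat(x, sample(y)))
  ## Add-one convention keeps the p-value strictly positive
  list(obs=obs, null=null, pval=(1 + sum(null >= obs)) / (1 + nperm))
}

set.seed(1234)
x <- 0:200 / 200
y <- sin(10 * pi * x) + x

## Stand-in statistic, NOT tic_e
abs_cor <- function(a, b) abs(cor(a, b))

res <- perm_pvalue(x, y, abs_cor, nperm=1000)
res$pval
```

The more permutations are drawn, the finer the resolution of the smallest p-value that can be reported.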

Null Distribution

ticnull$nulldist

BinStart  BinEnd  NullCount  NullCumSum
0e+00     1e-04   0          1e+05
1e-04     2e-04   0          1e+05
2e-04     3e-04   0          1e+05
3e-04     4e-04   0          1e+05
4e-04     5e-04   0          1e+05
5e-04     6e-04   0          1e+05
...       ...     ...        ...

Observed distribution

ticnull$obsdist

BinStart  BinEnd  Count  CountCum
0e+00     1e-04   0      325
1e-04     2e-04   0      325
2e-04     3e-04   0      325
3e-04     4e-04   0      325
4e-04     5e-04   0      325
5e-04     6e-04   0      325
...       ...     ...    ...
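
The count of 325 in the observed table matches the number of variable pairs: the variable indices in the result tables run up to 26, and choose(26, 2) is 325. The cumulative columns appear to count values at or above each bin start, since the first rows already hold the total. A base-R sketch of that kind of binning on made-up values (the vector vals and the bin width are illustrative assumptions, not the package's settings):

```r
## Number of unordered pairs among 26 variables
choose(26, 2)

## Made-up statistic values between 0 and 1
set.seed(1)
vals <- runif(325)

## Bin with a fixed width (0.1 here for readability)
breaks <- seq(0, 1, length.out=11)
counts <- as.vector(table(cut(vals, breaks=breaks, right=FALSE)))

## Reverse cumulative sum: number of values at or above each bin start
cum_from_top <- rev(cumsum(rev(counts)))

data.frame(BinStart=head(breaks, -1), BinEnd=tail(breaks, -1),
           Count=counts, CountCum=cum_from_top)
```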

Plot tic_e and pvalue distribution.

hist(ticnull$tic)

hist(ticnull$pval$pval, breaks=50, freq=FALSE)

Set p.adjust.method to choose a different p-value correction method, or use the qvalue package to compute Storey's q-values.

## Correct pvalues using qvalue
library(qvalue)
qobj <- qvalue(ticnull$pval$pval)

## Add column in the pval data.frame
ticnull$pval$qvalue <- qobj$qvalues
ticnull$pval

The pval table with the qvalue column added at the end:

pval    I1   I2   Var1    Var2          adj.P.Val  qvalue
0.5202  1    2    away_x  bullseye_x    0.95       1
0.9533  1    3    away_x  circle_x      0.99       1
0.0442  1    4    away_x  dino_x        0.52       0
0.6219  1    5    away_x  dots_x        0.95       1
0.8922  1    6    away_x  h_lines_x     0.98       1
0.3972  1    7    away_x  high_lines_x  0.91       1
...     ...  ...  ...     ...           ...        ...
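
For reference, standard corrections are available in base R through p.adjust. The sketch below applies Benjamini-Hochberg and Bonferroni to the six p-values shown above. The results will not match the adj.P.Val column, which was presumably computed across all variable pairs rather than these six.

```r
## The six p-values from the rows shown above
p <- c(0.5202, 0.9533, 0.0442, 0.6219, 0.8922, 0.3972)

## Benjamini-Hochberg (FDR) and Bonferroni corrections from base R
adj_bh <- p.adjust(p, method="BH")
adj_bonf <- p.adjust(p, method="bonferroni")

data.frame(pval=p, BH=adj_bh, bonferroni=adj_bonf)
```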

Strength of the association (MIC)

## Use columns of indexes and FDR adjusted pvalue
micres <- mic_strength(datasaurus.m, ticnull$pval, pval.col=c(6, 2, 3))

TicePval  MIC   I1   I2
0.0457    0.42  2    15
0.0000    0.63  3    16
0.0196    0.50  5    18
0.0162    0.36  9    22
0.0000    0.63  10   23
0.0000    0.57  13   26
...       ...   ...  ...

Association strength computed based on the qvalue adjusted pvalue

## Use qvalue adjusted pvalue
micresq <- mic_strength(datasaurus.m, ticnull$pval, pval.col=c("qvalue", "Var1", "Var2"))

TicePval  MIC   I1          I2
0.0401    0.42  bullseye_x  bullseye_y
0.0000    0.63  circle_x    circle_y
0.0172    0.50  dots_x      dots_y
0.0143    0.36  slant_up_x  slant_up_y
0.0000    0.63  star_x      star_y
0.0000    0.57  x_shape_x   x_shape_y
...       ...   ...         ...
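
The pval.col argument in both calls selects three columns of the p-value table: the p-value to use and the two variable identifiers. A base-R sketch with a stand-in data.frame (toy is hypothetical, built from the first rows of the p-value table shown earlier) shows which columns the two selections pick:

```r
## Stand-in for ticnull$pval, using the first three rows of the table above
toy <- data.frame(
  pval=c(0.5202, 0.9533, 0.0442),
  I1=c(1, 1, 1),
  I2=c(2, 3, 4),
  Var1=c("away_x", "away_x", "away_x"),
  Var2=c("bullseye_x", "circle_x", "dino_x"),
  adj.P.Val=c(0.95, 0.99, 0.52),
  qvalue=c(1, 1, 0)
)

## Selection by position, as in the first call
names(toy)[c(6, 2, 3)]

## Selection by name, as in the second call
toy[, c("qvalue", "Var1", "Var2")]
```

Selecting by name keeps working after extra columns such as qvalue are appended, whereas positions would need to be rechecked.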

Citing minepy/minerva and mictools

minepy2013: Davide Albanese, Michele Filosi, Roberto Visintainer, Samantha Riccadonna, Giuseppe Jurman and Cesare Furlanello. minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics (2013) 29(3): 407-408, first published online December 14, 2012.

mictools2018: Davide Albanese, Samantha Riccadonna, Claudio Donati, Pietro Franceschi. A practical tool for maximal information coefficient analysis. GigaScience (2018).