Provide implementation of Cramer's V. #56

rnowling · 2016-03-29T11:13:49Z

Closes issue #52.

erikerlandson · 2016-03-29T11:52:11Z

We should check whether the math3 libs already provide Cramer's V, or at least chi-squared.

rnowling · 2016-03-29T11:59:23Z

math3 does not implement Cramer's V:

https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/correlation/package-summary.html

They have a chi-squared stat but the calculation is not quite the same -- I spent quite a bit of time trying to get Cramer's V to work with the chi-squared calculation more commonly used. Gave up and just used what is on the Wikipedia page for Cramer's V.

https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/index.html
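For readers following along, the Wikipedia formulation referred to above is V = sqrt((chi2 / n) / min(k - 1, r - 1)), where chi2 is Pearson's chi-squared statistic over the contingency table, n is the number of observations, and k and r are the numbers of distinct values of each variable. The following is a minimal sketch of that calculation, not the code from this PR; the object and method names are hypothetical, and the early return of 0.0 mirrors the guard visible in the reviewed diff.

```scala
object CramersVSketch {
  // Hypothetical helper: Cramer's V for a sequence of paired categorical observations.
  def cramersV[T, U](values: Seq[(T, U)]): Double = {
    val n = values.size.toDouble
    val xs = values.map(_._1).distinct
    val ys = values.map(_._2).distinct

    // No data, or a variable with a single category: report no association.
    if (values.isEmpty || xs.size == 1 || ys.size == 1) {
      0.0
    } else {
      // Observed joint counts and marginal counts.
      val pairCounts = values.groupBy(identity).map { case (k, v) => k -> v.size.toDouble }
      val xCounts = values.groupBy(_._1).map { case (k, v) => k -> v.size.toDouble }
      val yCounts = values.groupBy(_._2).map { case (k, v) => k -> v.size.toDouble }

      // Pearson chi-squared: sum over every cell of (observed - expected)^2 / expected.
      val chi2 = (for (x <- xs; y <- ys) yield {
        val expected = xCounts(x) * yCounts(y) / n
        val observed = pairCounts.getOrElse((x, y), 0.0)
        val d = observed - expected
        d * d / expected
      }).sum

      val minDim = math.min(xs.size - 1, ys.size - 1).toDouble
      math.sqrt(chi2 / n / minDim)
    }
  }
}
```

Cells with zero observed count still contribute to the sum, which is why the loop runs over the full cross product of categories rather than only over observed pairs.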

rnowling · 2016-03-29T12:07:01Z

Build failure was in SplitSampleSpec, not Cramer's V -- rerunning.

erikerlandson · 2016-04-06T22:11:59Z

src/main/scala/com/redhat/et/silex/statistics/cramersv.scala

+    } else if (values1.size == 0 || set1.size == 1 || set2.size == 1) {
+      0.0
+    } else {
+      val pairCounts = values1.zip(values2)


Does this function assume values1.length == values2.length? (if so that should be checked). Note that zip will not error out if the lengths are different.
Similarly, does it assume set1.size == set2.size?

Is the logical shape of the input data ordered pairs?
Like Seq((t1, u1), (t2, u2), ...)

If so, maybe support input modes where you give two sequences (like you already have) but also support input pairs like above

values1 and values2 are assumed to be the same length but set1 and set2 are not -- they are the range over each sampled variable.

I was thinking I could change the public interface to support Seq[(T, U)] to enforce equal lengths but keep a private interface with values1, values2 to make the permutationTest a bit more efficient.

Would you suggest just providing both interfaces and add some error checking code? (e.g., throw an exception if the lengths are unequal)
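As a rough illustration of the two-sequence interface with error checking discussed above, here is a minimal sketch. The name `pairUp` is hypothetical and is not taken from the PR; the only behaviour relied on is that `zip` silently truncates to the shorter input and that `require` throws `IllegalArgumentException` when its condition fails.

```scala
object PairInputSketch {
  // Hypothetical helper: pair two equally long sequences, failing loudly on a mismatch.
  def pairUp[T, U](values1: Seq[T], values2: Seq[U]): Seq[(T, U)] = {
    require(
      values1.length == values2.length,
      s"values1 has ${values1.length} elements but values2 has ${values2.length}"
    )
    // Safe now: zip would otherwise drop the trailing elements of the longer input.
    values1.zip(values2)
  }
}
```

A public method taking `Seq[(T, U)]` could then delegate to the same internal code path, so the length check is only needed on the two-sequence entry point.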

rnowling · 2016-04-07T15:05:18Z

I updated the code to avoid the usage of the optional RNG -- requires a seed now. I could change that to Option[Long] or Option[Random] if you think that's better.

erikerlandson · 2016-04-07T16:29:14Z

Regarding interface, I think a signature like the following is good
def apply[T, U](data: Iterable[(T,U)], seed: Long = defaultSeedValue)

With Iterable, technically you'll need to check the hasDefiniteSize property to verify it's finite.
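A minimal sketch of that finiteness guard follows. The name `materialize` is hypothetical; `hasDefiniteSize` is the standard collection method mentioned above (it is deprecated from Scala 2.13 onward in favour of `knownSize`, so this reflects the 2.10/2.11 era of the PR).

```scala
object IterableInputSketch {
  // Hypothetical helper: reject possibly infinite inputs, then copy into a strict collection
  // so the data can be traversed repeatedly (for example, once per permutation round).
  def materialize[T, U](data: Iterable[(T, U)]): Vector[(T, U)] = {
    require(data.hasDefiniteSize, "input must be a finite collection")
    data.toVector
  }
}
```

Strict collections such as `List` and `Vector` always pass the check; a lazily built `Stream` whose tail has not been forced should be rejected.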

rnowling · 2016-04-11T13:30:48Z

Thanks! I'd be happy to make those changes. Do you want me to use apply or cramersV? (Per your earlier review comment?)

erikerlandson · 2016-04-28T23:28:21Z

src/main/scala/com/redhat/et/silex/statistics/cramersv.scala

+   * @param seed (optional) Seed for the Random number generator used to generate permutations
+   * @return p-value giving the probability of getting a lower association value
+   */
+  def permutationTest[T, U](values : Seq[(T, U)], rounds : Int, seed : Long = -1) : Double = {


Suggestion: rename this method to something like pValueEstimate


erikerlandson · 2016-05-02T17:26:22Z

src/main/scala/com/redhat/et/silex/statistics/cramersv.scala

+
+    val pvalue = worseCount.toDouble / rounds.toDouble
+
+    pvalue


Is the null hypothesis for a Cramer's V test "the samples are uncorrelated"? (That would be standard, I think.) If so, this function probably ought to return 1.0 - worse/rounds, so that in a perfectly correlated case the returned p-value approaches zero; that is, it rejects the null hypothesis with a very small p-value.
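To make the direction of the estimate concrete, here is a minimal sketch of a permutation-based p-value that counts shuffles whose association is at least as high as the observed one. It is not the PR's implementation: the object name and the `statistic` parameter are assumptions made so the block stays self-contained, and any association measure (such as Cramer's V) can be passed in.

```scala
import scala.util.Random

object PermutationSketch {
  // Estimate P(statistic under the null >= observed statistic) by shuffling one variable.
  def pValueEstimate[T, U](values: Seq[(T, U)], rounds: Int, seed: Long)
                          (statistic: Seq[(T, U)] => Double): Double = {
    require(rounds > 0, "rounds must be positive")
    val rng = new Random(seed)
    val observed = statistic(values)
    val xs = values.map(_._1)
    val ys = values.map(_._2)

    // Shuffling ys breaks any real pairing while preserving both marginal distributions.
    val atLeastAsHigh = (1 to rounds).count { _ =>
      statistic(xs.zip(rng.shuffle(ys))) >= observed
    }
    atLeastAsHigh.toDouble / rounds.toDouble
  }
}
```

With this convention, strongly associated data yields an estimate near zero, while data with no association yields an estimate near one.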

erikerlandson · 2016-05-02T20:27:22Z

src/main/scala/com/redhat/et/silex/statistics/cramersv.scala

+
+      val minDim = math.min(set1.size - 1, set1.size - 1).toDouble
+
+      val v = math.sqrt(chi2 / nObs / minDim)


Do you think it's worthwhile to apply this bias correction?
https://en.wikipedia.org/wiki/Cram%C3%A9r's_V#Bias_correction
They make it sound like a good idea but I don't have any experience with it.
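For reference, the correction described on that Wikipedia page can be sketched as below. This is an illustration under stated assumptions, not code from the PR: the object and parameter names are hypothetical, and returning 0.0 when the corrected dimension is not positive is an arbitrary choice made here to avoid a division by zero. Note that the sketch takes the minimum over both category counts, whereas the quoted line passes `set1.size` to both arguments of `math.min`.

```scala
object BiasCorrectionSketch {
  // chi2: Pearson chi-squared statistic; nObs: number of observations;
  // nRows, nCols: number of distinct categories of each variable.
  def correctedV(chi2: Double, nObs: Int, nRows: Int, nCols: Int): Double = {
    require(nObs > 1, "need at least two observations")
    val n = nObs.toDouble
    val r = nRows.toDouble
    val k = nCols.toDouble

    // phi^2 with its expected value under independence subtracted, floored at zero.
    val phi2 = chi2 / n
    val phi2Corrected = math.max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))

    // Category counts shrunk by the same small-sample correction.
    val kCorrected = k - (k - 1) * (k - 1) / (n - 1)
    val rCorrected = r - (r - 1) * (r - 1) / (n - 1)

    val minDim = math.min(kCorrected - 1, rCorrected - 1)
    // Assumption: treat a degenerate corrected dimension as "no association".
    if (minDim <= 0.0) 0.0 else math.sqrt(phi2Corrected / minDim)
  }
}
```

The correction matters most for small samples or tables with many categories, where the uncorrected statistic tends to overestimate association.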

erikerlandson · 2016-05-02T20:29:19Z

src/main/scala/com/redhat/et/silex/statistics/cramersv.scala

+  }
+
+  /**
+   * Perform a permutation test to get a p-value indicating the probability of getting


The Wikipedia page on Cramer's V mentions this:
"The p-value for the significance of V is the same one that is calculated using the Pearson's chi-squared test."

Does this mean p-vals can be computed exactly, or is there a niche for the permutation-based estimator?
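The relationship behind the quoted sentence can be written out as follows, using n for the number of observations and k, r for the numbers of categories of the two variables (symbols chosen here for illustration). Inverting the definition of V recovers the chi-squared statistic, whose upper tail under a chi-squared distribution with (r - 1)(k - 1) degrees of freedom gives the p-value of Pearson's test:

```latex
\chi^2 = n \, V^2 \, \min(k-1,\; r-1), \qquad
p = \Pr\left[ X \ge \chi^2 \right], \quad X \sim \chi^2_{(r-1)(k-1)}
```

The chi-squared reference distribution is itself a large-sample approximation, so a permutation-based estimate may still be useful when expected cell counts are small.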

rnowling · 2016-05-02T20:47:57Z

Can we push those two to future issues? Also note that the CI failures weren't in the Cramer's V code

erikerlandson · 2016-05-02T20:57:49Z

@willb This LGTM

rnowling added 3 commits March 29, 2016 06:12

Provide implementation of Cramer's V.

7f54e3a

Closes issue #52.

Remove leftover commented-out line

87788ff

Move divide outside of loop. Yay distributive law!

495a247

erikerlandson reviewed Apr 6, 2016

rnowling added 3 commits April 7, 2016 09:27

Address Erik's suggestions

0ab8919

Replace redundant counting code with a function

cb60324

Use required seed instead of optional RNG to make Scala 2.11 happy

0ab8ea1

Address some of Erik's comments

c4d7af5

erikerlandson reviewed Apr 28, 2016

Update Cramer's V permutation test name, test docs, move cross product to utils, and add extra guard to Cramer's V.

c2f0579

erikerlandson reviewed May 2, 2016

rnowling added 4 commits May 2, 2016 14:58

Use values instead of zipping

9345c59

Updated tests, should have been in earlier commit

efacb4c

Use Random.nextLong instead of -1 for the default seed

034351c

pValueEstimate should return probability of higher association, not less

3cd0bdd

erikerlandson reviewed May 2, 2016

This was referenced May 2, 2016

Investigate exact p-value computation for Cramer's V (#56) #63

Open

Investigate bias correction for Cramer's V (#56) #64

Open

willb closed this in dfb7dba May 6, 2016
