Covariance-related functions for general AbstractArray #599

yha · 2020-09-13T17:59:19Z

This mostly changes DenseMatrix to AbstractMatrix in method signatures, and also runs the tests on a sparse array and a custom-typed array.
Other changes:

Specialized _symmetrize! for sparse matrices
Fixed bug with zero variances in cor2cov! discovered when testing with sparse matrix
Added mean_and_cov(vector). I didn't add mean_and_cov(vector, weights), since that would first need a fix for #409 ("cov(x, w::AbstractWeights) dispatches on cov(X, Y) fallback"), which has a long discussion pointing to JuliaLang/Statistics.jl#2.

nalimilan

Thanks!

src/cov.jl

nalimilan · 2020-10-02T13:11:47Z

test/cov.jl

@@ -87,27 +100,32 @@ weight_funcs = (weights, aweights, fweights, pweights)
        @testset "Mean and covariance" begin
            (m, C) = mean_and_cov(X; corrected=false)
            @test m == mean(X, dims=1)
-            @test C == cov(X, dims=1, corrected=false)
+            @test C ≈ cov(X, dims=1, corrected=false)


Any reason to use ≈ rather than isequal?

yha · 2020-10-02

Strict equality fails for sparse matrices because cov is specialized for SparseMatrixCSC in Statistics (https://github.com/JuliaLang/Statistics.jl/blob/b384104d35ff0e7cf311485607b177223ed72b9a/src/Statistics.jl#L1058), whereas mean_and_cov calls covm rather than cov, so the two results come from different code paths.
I think if we want to achieve strict equality here, the fix should be in Statistics (specializing covm rather than cov).
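A minimal sketch of why ≈ is the safer comparison in this kind of test: two algebraically equivalent formulas for the uncorrected covariance perform their floating-point operations in a different order, so their results are only guaranteed to agree up to rounding. The matrix X and both formulas below are illustrative assumptions; they are not the code paths that Statistics or StatsBase actually use.

```julia
using Statistics

# Illustrative data (an assumption, not taken from the test suite).
X = [1.0 2.0; 3.0 5.0; 4.0 9.0]
n = size(X, 1)
m = mean(X, dims=1)            # 1×2 row of column means

# Path 1: center first, then form the cross-product.
Z = X .- m
C1 = (Z' * Z) ./ n

# Path 2: raw second moment minus the outer product of the means.
C2 = (X' * X) ./ n .- m' * m

# Both agree with each other and with cov up to rounding error.
@assert C1 ≈ C2
@assert C1 ≈ cov(X, dims=1, corrected=false)
```

Whether C1 == C2 holds exactly depends on the data and on the order of operations, which is why the test compares with ≈.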

test/cov.jl

nalimilan · 2020-10-02T13:14:50Z

src/cov.jl

+    @inbounds for i in CartesianIndices(size(C))
+        si = s[i[1]] * s[i[2]]
+        # the covariance is 0 when si==0, although C[i] is NaN in this case
+        C[i] = iszero(si) ? zero(eltype(C)) : C[i] * si


In case eltype(C) isn't a concrete type:

Suggested change

-        C[i] = iszero(si) ? zero(eltype(C)) : C[i] * si
+        Ci = C[i]
+        C[i] = iszero(si) ? zero(Ci) : Ci * si
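A small sketch of the concern behind this suggestion, assuming a matrix whose element type is the abstract type Any (a hypothetical input chosen only for illustration): zero of the element type has no method in that case, while zero of the stored value still works.

```julia
# Hypothetical input with a non-concrete element type.
C = Any[1.0 NaN; NaN 1.0]

# zero(eltype(C)) is zero(Any), for which Base defines no method.
threw = try
    zero(eltype(C))
    false
catch err
    err isa MethodError
end

# Taking zero of the element itself uses the value's concrete type.
z = zero(C[1, 1])
```

Here threw ends up true and z is a Float64 zero, which is what the suggested zero(Ci) relies on.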

Can you explain a bit more what happens here?

yha · 2020-10-02

If one of the variables has zero variance, its covariance with any other variable is 0, but its correlation is undefined, so the result of cor has NaN in that element. In that case Ci is NaN and si == 0, but the correct output is 0, not NaN.
For example,

    X = [1:3 ones(3)]
    cov(X) == [1 0; 0 0]
    cor(X) == [1 NaN; NaN 1]
    cor2cov(cor(X), std.(eachcol(X))) # previously [1 NaN; NaN 0], which is incorrect
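The example above can be reproduced with a standalone sketch of the loop from the diff. The function name cor2cov_sketch! is hypothetical and only mirrors the reviewed lines; it is not the StatsBase implementation.

```julia
using Statistics

# Hypothetical helper mirroring the loop in the diff (not StatsBase's cor2cov!).
function cor2cov_sketch!(C::AbstractMatrix, s::AbstractVector)
    for i in CartesianIndices(size(C))
        si = s[i[1]] * s[i[2]]
        Ci = C[i]
        # A zero standard deviation means the covariance is 0,
        # even though the correlation entry Ci is NaN.
        C[i] = iszero(si) ? zero(Ci) : Ci * si
    end
    return C
end

X = [1:3 ones(3)]                 # second column has zero variance
s = vec(std(X, dims=1))           # standard deviation of each column
R = cor(X)                        # off-diagonal entries are NaN
C = cor2cov_sketch!(copy(R), s)

@assert !any(isnan, C)
@assert C ≈ cov(X)
```

Multiplying without the iszero check would compute NaN * 0, which is NaN, so the off-diagonal entries would stay NaN instead of becoming 0.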

yha · 2020-10-15T09:54:42Z

Rebased

yha · 2021-01-17T23:32:23Z

bump

nalimilan reviewed Oct 2, 2020

Covariance-related functions for general AbstractArray

be9a11b

yha force-pushed the cov-general-array branch from 1edacde to be9a11b Compare October 15, 2020 09:36

yha added 2 commits October 15, 2020 12:42

Review changes

f4a47e9

Fix tests for julia 1.0

34c876e

yha requested a review from nalimilan December 10, 2020 13:41
