Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Covariance-related functions for general AbstractArray #599

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

yha
Copy link
Contributor

@yha yha commented Sep 13, 2020

This is mostly changing DenseMatrix in method signatures to AbstractMatrix, and running the tests also on a sparse array and a custom-typed array.
Other changes:

  • Specialized _symmetrize! for sparse matrices
  • Fixed bug with zero variances in cor2cov! discovered when testing with sparse matrix
  • Added mean_and_cov(vector). I didn't add mean_and_cov(vector, weights) since that would need a fix to cov(x, w::AbstractWeights) dispatches on cov(X, Y) fallback #409, which has a long discussion pointing to JuliaLang/Statistics.jl#2

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

src/cov.jl Outdated Show resolved Hide resolved
@@ -87,27 +100,32 @@ weight_funcs = (weights, aweights, fweights, pweights)
@testset "Mean and covariance" begin
(m, C) = mean_and_cov(X; corrected=false)
@test m == mean(X, dims=1)
@test C == cov(X, dims=1, corrected=false)
@test C cov(X, dims=1, corrected=false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to use rather than isequal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strict equality fails for sparse matrices because cov is specialized for SparseMatrixCSC in Statistics (https://github.com/JuliaLang/Statistics.jl/blob/b384104d35ff0e7cf311485607b177223ed72b9a/src/Statistics.jl#L1058), but mean_and_cov uses covm rather than cov.
I think if we want to achieve strict equality here, the fix should be in Statistics (specializing covm rather than cov)

test/cov.jl Outdated Show resolved Hide resolved
@inbounds for i in CartesianIndices(size(C))
si = s[i[1]] * s[i[2]]
# the covariance is 0 when si==0, although C[i] is NaN in this case
C[i] = iszero(si) ? zero(eltype(C)) : C[i] * si
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case eltype(C) isn't a concrete type:

Suggested change
C[i] = iszero(si) ? zero(eltype(C)) : C[i] * si
Ci = C[i]
C[i] = iszero(si) ? zero(Ci) : Ci * si

Can you explain a bit more what happens here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one of the variables has zero variance, its covariance with any other variable is 0, but the correlation is undefined, and the result of cor would have NaN in that element. In that case Ci == NaN, si == 0, but the correct output is 0, not NaN.
For example,

X = [1:3 ones(3)]
cov(X) == [1 0; 0 0]
cor(X) == [1 NaN; NaN 1]
cor2cov(cor(X), std.(eachcol(X))) # previously [1 NaN; NaN 0], which is incorrect

@yha
Copy link
Contributor Author

yha commented Oct 15, 2020

Rebased

@yha yha requested a review from nalimilan December 10, 2020 13:41
@yha
Copy link
Contributor Author

yha commented Jan 17, 2021

bump

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants