-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Covariance-related functions for general AbstractArray #599
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@@ -87,27 +100,32 @@ weight_funcs = (weights, aweights, fweights, pweights) | |||
@testset "Mean and covariance" begin | |||
(m, C) = mean_and_cov(X; corrected=false) | |||
@test m == mean(X, dims=1) | |||
@test C == cov(X, dims=1, corrected=false) | |||
@test C ≈ cov(X, dims=1, corrected=false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to use ≈
rather than isequal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strict equality fails for sparse matrices because cov
is specialized for SparseMatrixCSC
in Statistics
(https://github.com/JuliaLang/Statistics.jl/blob/b384104d35ff0e7cf311485607b177223ed72b9a/src/Statistics.jl#L1058), but mean_and_cov
uses covm
rather than cov
.
I think if we want to achieve strict equality here, the fix should be in Statistics
(specializing covm
rather than cov
)
@inbounds for i in CartesianIndices(size(C)) | ||
si = s[i[1]] * s[i[2]] | ||
# the covariance is 0 when si==0, although C[i] is NaN in this case | ||
C[i] = iszero(si) ? zero(eltype(C)) : C[i] * si |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case eltype(C)
isn't a concrete type:
C[i] = iszero(si) ? zero(eltype(C)) : C[i] * si | |
Ci = C[i] | |
C[i] = iszero(si) ? zero(Ci) : Ci * si |
Can you explain a bit more what happens here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If one of the variables has zero variance, its covariance with any other variable is 0, but the correlation is undefined, and the result of cor
would have NaN
in that element. In that case Ci == NaN
, si == 0
, but the correct output is 0, not NaN
.
For example,
X = [1:3 ones(3)]
cov(X) == [1 0; 0 0]
cor(X) == [1 NaN; NaN 1]
cor2cov(cor(X), std.(eachcol(X))) # previously [1 NaN; NaN 0], which is incorrect
1edacde
to
be9a11b
Compare
Rebased |
bump |
This is mostly changing
DenseMatrix
in method signatures toAbstractMatrix
, and running the tests also on a sparse array and a custom-typed array.Other changes:
_symmetrize!
for sparse matricescor2cov!
discovered when testing with sparse matrixmean_and_cov(vector)
. I didn't addmean_and_cov(vector, weights)
since that would need a fix to cov(x, w::AbstractWeights) dispatches on cov(X, Y) fallback #409, which has a long discussion pointing to JuliaLang/Statistics.jl#2