-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Weighted mean with function #886
base: master
Are you sure you want to change the base?
Conversation
Adds the method for a weighted mean of elements transformed by a function. - Added `mean(f, itr, weights)` - Added tests for the method
Added tests for UnitWeights
src/weights.jl
Outdated
mean(f, A::AbstractArray, w::AbstractWeights; dims::Union{Colon,Int}=:) = | ||
_mean(f.(A), w, dims) | ||
|
||
function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a slightly more generic version of the same thing, which is less likely to require maintenance in the future (if we add additional keyword arguments to mean
). I suggest using this pattern when you can.
function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:) | |
function mean(f, A::AbstractArray, w::UnitWeights; kwargs...) |
src/weights.jl
Outdated
function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:) | ||
a = (dims === :) ? length(A) : size(A, dims) | ||
a != length(w) && throw(DimensionMismatch("Inconsistent array dimension.")) | ||
return mean(f.(A), dims=dims) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid memory allocation by using mean(f, A)
instead of mean(f.(A))
. Remember that f.(A)
creates an extra array, which is slow. Memory access is usually the biggest bottleneck on modern CPUs. mean(f.(A))
is 2 separate operations: The first one creates a new array, f.(A)
, and the second calculates its mean. mean(f, A)
calculates the mean of (f(x) for x in A)
directly, as one operation, without creating a new array.
return mean(f.(A), dims=dims) | |
return mean(f, A; dims) |
I'd also suggest making it slightly more generic, as
return mean(f.(A), dims=dims) | |
return mean(f, A; kwargs...) |
src/weights.jl
Outdated
@@ -682,6 +682,31 @@ function mean(A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:) | |||
return mean(A, dims=dims) | |||
end | |||
|
|||
""" | |||
mean(f, A::AbstractArray, w::AbstractWeights[, dims::Int]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mean(f, A::AbstractArray, w::AbstractWeights[, dims::Int]) | |
mean(f, A::AbstractArray, w::AbstractWeights[; dims]) |
dims
shouldn't be required to be an integer.
src/weights.jl
Outdated
``` | ||
""" | ||
mean(f, A::AbstractArray, w::AbstractWeights; dims::Union{Colon,Int}=:) = | ||
_mean(f.(A), w, dims) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid memory allocation by using mean(f, A)
instead of mean(f.(A))
. Remember that f.(A)
creates an extra array, which is slow. Memory access is usually the biggest bottleneck on modern CPUs. mean(f.(A))
is 2 separate operations: The first one creates a new array, f.(A)
, and the second calculates its mean. mean(f, A)
calculates the mean of (f(x) for x in A)
directly, as one operation, without creating a new array.
_mean(f.(A), w, dims) | |
_mean(f, A; dims) |
I'd also suggest making it slightly more generic, as
_mean(f.(A), w, dims) | |
_mean(f, A; kwargs...) |
(See below for more details.)
- Add keyword arguments for the weights - Modified functions to use `Iterators.map` - Add more tests
src/weights.jl
Outdated
``` | ||
""" | ||
mean(f, A, w::AbstractWeights; kwargs...) = | ||
mean(broadcast(f, A), w; kwargs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I said, check the method for weighted sem
to see how to use Broadcasting.broadcasted
correctly. broadcast
is not the same thing as Broadcasting.broadcasted
. The latter is lazy.
All checks not passed
- Removed implementation for multi-dimensional array - Updated documentations - Updated tests
src/weights.jl
Outdated
return sum(Broadcast.broadcasted(f, A, w) do f, a_i, wg | ||
return f(a_i) * wg | ||
end) / sum(w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use Broadcast.instantiate
. Moreover, to me it seems this should be changed to
return sum(Broadcast.broadcasted(f, A, w) do f, a_i, wg | |
return f(a_i) * wg | |
end) / sum(w) | |
return sum(Broadcast.instantiate(Broadcast.broadcasted(A, w) do a_i, wg | |
return f(a_i) * wg | |
end)) / sum(w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Broadcast.instantiate
causes extra allocations, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tried. But without Broadcast.instantiate
it will be slow and fall back to Cartesian indexing. See, e.g., JuliaLang/julia#31020 ("we require Broadcast.instantiate for fast reduce").
Used `Broadcast.instantiate` as requested to overcome falling back to Cartesian indexing
src/weights.jl
Outdated
``` | ||
""" | ||
mean(f, A::AbstractArray, w::AbstractWeights) = | ||
_funcweightedmean(f, A, w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember, was there any particular reason for introducing _funcweightedmean
instead of implementing two methods for mean
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No there wasn't any such reason. I am removing this function, and implementing it as a method for mean
Instead deploy it as a method for `mean`
@devmotion @ParadaCarleton gentle ping, is this PR ready to merge? |
This PR addresses #868 . Adds the method for a weighted mean of elements transformed by a function.
mean(f, itr, weights)