Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weighted mean with function #886

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

itsdebartha
Copy link
Contributor

This PR addresses #868 . Adds the method for a weighted mean of elements transformed by a function.

  • Added mean(f, itr, weights)
  • Added tests for the method

Adds the method for a weighted mean of elements transformed by a function.

- Added `mean(f, itr, weights)`
- Added tests for the method
Added tests for UnitWeights
src/weights.jl Outdated
mean(f, A::AbstractArray, w::AbstractWeights; dims::Union{Colon,Int}=:) =
_mean(f.(A), w, dims)

function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slightly more generic version of the same thing, which is less likely to require maintenance in the future (if we add additional keyword arguments to mean). I suggest using this pattern when you can.

Suggested change
function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:)
function mean(f, A::AbstractArray, w::UnitWeights; kwargs...)

src/weights.jl Outdated
function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:)
a = (dims === :) ? length(A) : size(A, dims)
a != length(w) && throw(DimensionMismatch("Inconsistent array dimension."))
return mean(f.(A), dims=dims)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid memory allocation by using mean(f, A) instead of mean(f.(A)). Remember that f.(A) creates an extra array, which is slow. Memory access is usually the biggest bottleneck on modern CPUs. mean(f.(A)) is 2 separate operations: The first one creates a new array, f.(A), and the second calculates its mean. mean(f, A) calculates the mean of (f(x) for x in A) directly, as one operation, without creating a new array.

Suggested change
return mean(f.(A), dims=dims)
return mean(f, A; dims)

I'd also suggest making it slightly more generic, as

Suggested change
return mean(f.(A), dims=dims)
return mean(f, A; kwargs...)

src/weights.jl Outdated
@@ -682,6 +682,31 @@ function mean(A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:)
return mean(A, dims=dims)
end

"""
mean(f, A::AbstractArray, w::AbstractWeights[, dims::Int])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mean(f, A::AbstractArray, w::AbstractWeights[, dims::Int])
mean(f, A::AbstractArray, w::AbstractWeights[; dims])

dims shouldn't be required to be an integer.

src/weights.jl Outdated
```
"""
mean(f, A::AbstractArray, w::AbstractWeights; dims::Union{Colon,Int}=:) =
_mean(f.(A), w, dims)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid memory allocation by using mean(f, A) instead of mean(f.(A)). Remember that f.(A) creates an extra array, which is slow. Memory access is usually the biggest bottleneck on modern CPUs. mean(f.(A)) is 2 separate operations: The first one creates a new array, f.(A), and the second calculates its mean. mean(f, A) calculates the mean of (f(x) for x in A) directly, as one operation, without creating a new array.

Suggested change
_mean(f.(A), w, dims)
_mean(f, A; dims)

I'd also suggest making it slightly more generic, as

Suggested change
_mean(f.(A), w, dims)
_mean(f, A; kwargs...)

(See below for more details.)

test/weights.jl Show resolved Hide resolved
src/weights.jl Outdated Show resolved Hide resolved
test/weights.jl Outdated Show resolved Hide resolved
- Add keyword arguments for the weights
- Modified functions to use `Iterators.map`
- Add more tests
src/weights.jl Outdated Show resolved Hide resolved
src/weights.jl Outdated Show resolved Hide resolved
src/weights.jl Outdated
```
"""
mean(f, A, w::AbstractWeights; kwargs...) =
mean(broadcast(f, A), w; kwargs...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said, check the method for weighted sem to see how to use Broadcasting.broadcasted correctly. broadcast is not the same thing as Broadcasting.broadcasted. The latter is lazy.

All checks not passed
- Removed implementation for multi-dimensional array
- Updated documentations
- Updated tests
src/weights.jl Outdated
Comment on lines 704 to 706
return sum(Broadcast.broadcasted(f, A, w) do f, a_i, wg
return f(a_i) * wg
end) / sum(w)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use Broadcast.instantiate. Moreover, to me it seems this should be changed to

Suggested change
return sum(Broadcast.broadcasted(f, A, w) do f, a_i, wg
return f(a_i) * wg
end) / sum(w)
return sum(Broadcast.instantiate(Broadcast.broadcasted(A, w) do a_i, wg
return f(a_i) * wg
end)) / sum(w)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Broadcast.instantiate causes extra allocations, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried. But without Broadcast.instantiate it will be slow and fall back to Cartesian indexing. See, e.g., JuliaLang/julia#31020 ("we require Broadcast.instantiate for fast reduce").

Used `Broadcast.instantiate` as requested to overcome falling back to Cartesian indexing
src/weights.jl Outdated
```
"""
mean(f, A::AbstractArray, w::AbstractWeights) =
_funcweightedmean(f, A, w)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, was there any particular reason for introducing _funcweightedmean instead of implementing two methods for mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No there wasn't any such reason. I am removing this function, and implementing it as a method for mean

Instead deploy it as a method for `mean`
@itsdebartha
Copy link
Contributor Author

@devmotion @ParadaCarleton gentle ping, is this PR ready to merge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants