Undeprecate quantile() for arrays #1159
Needed by StatsBase.jl for arrays. More info: JuliaStats/StatsBase.jl#586 (comment)
Codecov Report
@@           Coverage Diff           @@
##           master    #1159   +/-   ##
=======================================
  Coverage   79.91%   79.91%
=======================================
  Files         115      115
  Lines        5905     5905
=======================================
  Hits         4719     4719
  Misses       1186     1186
Continue to review full report at Codecov.
|
Thanks. Can you add a test for the method too? |
@nalimilan Sorry, I only just saw this. Where would I add a test? Also, are there any basic methods/tools or best practices for tests in Julia that I might not be aware of? |
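For reference, a minimal sketch of what such a test could look like, assuming the usual Julia convention of keeping tests under `test/` (run through `test/runtests.jl` by `Pkg.test()`) and using the `Test` standard library. The testset name and the sample data are illustrative only; this exercises `Statistics.quantile` on a plain vector and is not the test that was eventually added to this PR.

```julia
using Test
using Statistics  # stdlib; provides quantile for plain collections

@testset "quantile on a plain collection (illustrative)" begin
    x = [3.0, 1.0, 2.0, 5.0, 4.0]

    # a single probability returns a single value
    @test quantile(x, 0.5) == 3.0

    # a vector of probabilities works without the broadcast dot
    @test quantile(x, [0.25, 0.75]) == [2.0, 4.0]

    # pre-sorted input with sorted=true agrees with the unsorted call
    @test quantile(sort(x), 0.5; sorted=true) == quantile(x, 0.5)
end
```

Grouping related `@test` lines in a named `@testset` keeps failures easy to locate in the test summary.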
Tests quantile(collection) by way of iqr(collection). Fixes JuliaStats/Distributions#1150. See JuliaStats#1159 (comment).
Sorry, I meant testing |
@ron-wolf Bump. Would you be able to add tests as suggested in #1159 (comment)? |
Bump. |
Looking back on this I was testing it on a distribution after all, not a collection. (If you’ll notice, I’m calling |
Actually, never mind, I have completely lost track of what has to be changed and why. Why can't the definition for |
I think I may have found a better solution to the point brought up in this comment:
# we specify AbstractVector{<:Real} as the type for a discrete
# distribution because it is the most generic type that describes
# a sortable sample: a discrete sequence of ordinal values that
# can be sampled at certain positions
quantile(x::AbstractVector{<:Real}, p::Real; sorted=false) = let
    # ensure x is sorted, then sample it at the location/input p
    sorted || (x = sort!(collect(x)))
    ... # do your sampling magic!
end
quantile(x::Any, p::Real) = ...
# dispatch on discrete distributions, both sorted & unsorted
quantile(x::AbstractVector{<:Real}, p::AbstractWeights; sorted=false) = let
    # ensure x is sorted, then sample it at each of the locations/inputs in p
    sorted || (x = sort!(collect(x)))
    quantile.(Ref(x), p; sorted=true)
end
# a quick re-dispatch to broadcasting; instead of deprecating
# this, we just silently correct to the right behavior so
# end-users don't have to worry about the distinction
@inline quantile(x::Any, p::AbstractWeights) = quantile.(Ref(x), p)
# TODO: is there some way to deprecate
# quantile.(::AbstractVector{<:Real}, p::AbstractWeights),
# which is now radically inefficient?
This would fix our efficiency woes, while also allowing this simple definition to remain in place:
iqr(x) = let
    q = quantile(x, [.25, .75])
    q[2] - q[1]
end
Are there any other functions which, like you said, have this optimization of sorting and then sampling at each point? If so, perhaps it would be best to enumerate them all in this issue, and then collaborate to write a define-them-all-in-one-go metaprogramming loop. |
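The sort-once-then-sample idea from the sketch above can be tried out with the `Statistics` standard library, whose `quantile` accepts a `sorted` keyword. This is only a rough illustration: `quantiles_sort_once` and `iqr_sketch` are hypothetical names made up for this example and do not exist in StatsBase or Distributions.

```julia
using Statistics  # stdlib; quantile(v, p; sorted=false) is defined here

# Hypothetical helper (illustrative name): sort a copy of the data once,
# then evaluate each requested probability against the sorted copy.
function quantiles_sort_once(x::AbstractVector{<:Real}, ps)
    xs = sort(x)  # sorted copy; the caller's vector is left untouched
    return [quantile(xs, p; sorted=true) for p in ps]
end

# Hypothetical iqr built on the helper: one sort, two lookups.
function iqr_sketch(x::AbstractVector{<:Real})
    q = quantiles_sort_once(x, (0.25, 0.75))
    return q[2] - q[1]
end

x = [5, 1, 4, 2, 3]
iqr_sketch(x)  # interquartile range of the sample
```

Passing `sorted=true` tells `quantile` that the input is already ordered, so the per-probability calls skip the sorting step.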
what's the status here? |
It seems in Distributions there is no performance improvement if |
Actually there would be a benefit for Distribution, as currently a deprecation warning is printed when calling |
Another possible approach would be to add something like
function StatsBase.iqr(d::Distribution)
    q = map(Base.Fix1(quantile, d), (0.25, 0.75))
    return q[2] - q[1]
end
or even just
function StatsBase.iqr(d::Distribution)
    q1 = quantile(d, 0.25)
    q2 = quantile(d, 0.75)
    return q2 - q1
end
to Distributions, it seems? |
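To see that the two variants above agree without pulling in Distributions, here is a small self-contained sketch. `ToyUniform` is a hypothetical stand-in for a real distribution type, and `iqr_scalar` and `iqr_fix1` are illustrative names; none of them belong to any package.

```julia
using Statistics  # stdlib; we add a quantile method for a toy type

# Hypothetical stand-in for a Distributions.jl type: uniform on [a, b].
struct ToyUniform
    a::Float64
    b::Float64
end

# Only the scalar method is defined, so no vector method is ever involved.
Statistics.quantile(d::ToyUniform, p::Real) = d.a + p * (d.b - d.a)

# Variant with two explicit scalar calls.
iqr_scalar(d) = quantile(d, 0.75) - quantile(d, 0.25)

# Variant that maps a partially applied quantile over a tuple of probabilities.
function iqr_fix1(d)
    q = map(Base.Fix1(quantile, d), (0.25, 0.75))
    return q[2] - q[1]
end

d = ToyUniform(0.0, 4.0)
iqr_scalar(d) == iqr_fix1(d)  # both variants give the same value
```

Since both versions only ever call the scalar method, neither would trigger a deprecation attached to a vector method.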
Yeah, that would be a simpler fix. Not sure whether other functions would need the same change. |
In the interest of getting this done, I think Step 1 is to figure out what state we want the code to end up in, in a module-agnostic fashion. Step 2, then, would be figuring out what belongs to Distributions vs. StatsBase; this can be considered a secondary concern.
Regarding the actual code to be written, how do we feel about the merits of the two outlines we’ve developed thus far? To reiterate, this code snippet communicates the main thrust of @devmotion’s sketch:
StatsBase.iqr(d::Distribution) = quantile(d, 0.75) - quantile(d, 0.25)
And this was my sketch:
# we specify AbstractVector{<:Real} as the type for a discrete
# distribution because it is the most generic type that describes
# a sortable sample: a discrete sequence of ordinal values that
# can be sampled at certain positions
quantile(x::AbstractVector{<:Real}, p::Real; sorted=false) = let
    # ensure x is sorted, then sample it at the location/input p
    sorted || (x = sort!(collect(x)))
    ... # do your sampling magic!
end
quantile(x::Any, p::Real) = ...
# dispatch on discrete distributions, both sorted & unsorted
quantile(x::AbstractVector{<:Real}, p::AbstractWeights; sorted=false) = let
    # ensure x is sorted, then sample it at each of the locations/inputs in p
    sorted || (x = sort!(collect(x)))
    quantile.(Ref(x), p; sorted=true)
end
# a quick re-dispatch to broadcasting; instead of deprecating
# this, we just silently correct to the right behavior so
# end-users don't have to worry about the distinction
@inline quantile(x::Any, p::AbstractWeights) = quantile.(Ref(x), p)
# TODO: is there some way to deprecate
# quantile.(::AbstractVector{<:Real}, p::AbstractWeights),
# which is now radically inefficient?
Any thoughts on whether one of these sketches, or a different sketch, would be the best way forward? |
AFAICT that's the best solution. |
Okay. Can someone author a PR over here for me to accept, so as to have the full changeset show up in this PR? |
Fixes #1150 by undeprecating omission of the broadcast operator (.) when using quantile, quantile!, and _quantile!. For the reason, see this StatsBase comment. For full discussion, see JuliaStats/StatsBase.jl#586 and JuliaStats/Distributions.jl#1150.
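As a rough illustration of what omitting the broadcast operator means for callers, the sketch below defines a vector-of-probabilities method that simply forwards to the broadcast form. `ToyExponential` is a hypothetical type invented for this example, and the forwarding method is an assumption about the general shape of such a definition, not the code in this PR.

```julia
using Statistics  # stdlib; we add quantile methods for a toy type

# Hypothetical stand-in for a distribution type: exponential with a given rate.
struct ToyExponential
    rate::Float64
end

# Scalar quantile via the closed-form inverse CDF.
Statistics.quantile(d::ToyExponential, p::Real) = -log1p(-p) / d.rate

# Vector method: forwards to broadcasting so callers can omit the dot.
# Ref(d) makes the distribution behave as a scalar under broadcast.
Statistics.quantile(d::ToyExponential, ps::AbstractVector{<:Real}) =
    quantile.(Ref(d), ps)

d = ToyExponential(2.0)
ps = [0.25, 0.5, 0.75]
quantile(d, ps) == quantile.(Ref(d), ps)  # both call styles agree
```

With a method like this in place, generic code written for arrays, such as the `iqr(x)` definition discussed above, can be called on a distribution-like object without changes.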