Making KernelDensity.pdf interface consistent with Distributions.pdf #122

Open
wants to merge 18 commits into
base: master



@jaksle jaksle commented Feb 7, 2024

This resolves issue #120 and implements the functionality of PR #102, but for any dimension rather than only for 2D matrices as in #102.

There is one breaking change: pdf(k::UnivariateKDE, v::AbstractVector) becomes undefined, and pdf(k::InterpKDE, v::AbstractVector) now evaluates the pdf at the multi-dimensional point v instead of treating v as a collection of 1D points. I am open to discussion, but I am afraid the old version is unsalvageable: there is no such method in Distributions.jl, and its existence conflicts with implementing pdf methods for multi-dimensional KDEs. The same functionality is now available via pdf.(k, xs), as in Distributions.
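To illustrate the distinction between "a vector is one multi-dimensional point" and "a collection of 1D points is handled by broadcasting", here is a minimal sketch. It does not use KernelDensity.jl: `ToyDensity`, `gauss1d` and `density_at` are hypothetical names invented for this example, and the product-of-Gaussians density is only a stand-in.

```julia
# Hypothetical stand-in for an N-dimensional density; not a KernelDensity.jl type.
struct ToyDensity{N}
    bandwidth::Float64
end

# Make the object behave as a scalar in broadcast expressions.
Base.broadcastable(d::ToyDensity) = Ref(d)

# 1D Gaussian factor used by both methods below.
gauss1d(x, h) = exp(-(x / h)^2 / 2) / (h * sqrt(2π))

# Density at a single 1D point.
density_at(d::ToyDensity{1}, x::Real) = gauss1d(x, d.bandwidth)

# Density at a single N-dimensional point passed as a vector.
function density_at(d::ToyDensity{N}, v::AbstractVector{<:Real}) where {N}
    length(v) == N ||
        throw(DimensionMismatch("expected a point of length $N, got $(length(v))"))
    return prod(gauss1d(x, d.bandwidth) for x in v)
end

d1 = ToyDensity{1}(1.0)
xs = [-1.0, 0.0, 1.0]
ys = density_at.(d1, xs)            # many 1D points: broadcast the scalar method

d2 = ToyDensity{2}(1.0)
p = density_at(d2, [0.0, 0.0])      # one 2D point: the vector method
```

With this split, the vector method has a single meaning in every dimension, and the dot syntax covers the "many points" case.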

I left the methods pdf(k2d::BivariateKDE, x, y) and pdf(k::UnivariateKDE, xs, ys) in place; they have no Distributions equivalents, but are mostly harmless.

Turning on broadcast takes just one line, line 15 of KernelDensity.jl, which makes all KDE objects behave as scalars under broadcasting. However, broadcast was dysfunctional in that state because there was no constant propagation, so the efficiency was atrocious. (I am not completely sure why, but it may be because the functors from Interpolations.jl have custom broadcast.) The proposed solution is to extend custom broadcast to pdf as well (interp.jl, line 37), where it gets redirected to the Interpolations functor broadcast. Now calculating the pdf for multiple data points is actually even a little faster than before.
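The two-step pattern described here (scalar-like broadcasting, then redirecting the broadcast of pdf to a callable) can be sketched as follows. This is not the code from the PR: `ToyInterp`, `ToyKDE`, `build_interp`, `toypdf` and `BUILD_COUNT` are hypothetical names, and the counter exists only to show how often the stand-in "interpolant" is constructed. `Base.broadcastable` and `Base.Broadcast.broadcasted` are the standard Julia extension points.

```julia
# Counts how many times the (hypothetical) interpolant is constructed.
const BUILD_COUNT = Ref(0)

# Hypothetical callable standing in for an interpolation functor.
struct ToyInterp
    scale::Float64
end
(itp::ToyInterp)(x::Real) = itp.scale * exp(-x^2 / 2)

# Hypothetical KDE-like object.
struct ToyKDE
    scale::Float64
end

# Assumed to be the expensive step in a real implementation.
function build_interp(k::ToyKDE)
    BUILD_COUNT[] += 1
    return ToyInterp(k.scale)
end

# Scalar method: builds the interpolant on every call.
toypdf(k::ToyKDE, x::Real) = build_interp(k)(x)

# Step 1: treat the KDE as a scalar in broadcast expressions.
Base.broadcastable(k::ToyKDE) = Ref(k)

# Step 2: intercept `toypdf.(k, xs)` so the interpolant is built once and
# the callable itself is broadcast over the points.
Base.Broadcast.broadcasted(::typeof(toypdf), k::ToyKDE, xs) =
    Base.Broadcast.broadcasted(build_interp(k), xs)

k = ToyKDE(2.0)
xs = [0.0, 1.0, 2.0]
ys = toypdf.(k, xs)
```

Without step 2, the broadcast would call the scalar method once per element and rebuild the stand-in interpolant each time. With it, the construction happens once per broadcast call.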

This is meant to be a small PR containing only the changes absolutely necessary to fix the interface. However, there are additional changes that could be made to clean up the situation further:

  • The InterpKDE type unfortunately does not carry information about the dimension of the KDE, so we cannot write dimension-specific methods, and errors caused by inconsistent dimensions look horrible. This can be fixed by keeping the InterpKDE interface but making the internal type InterpKDE{N} or something similar.
  • A bad choice was made when defining kde(M::Matrix), namely that the data points are rows. Alas, in Distributions the data points are columns, in the pdf method too. Making KernelDensity.pdf act on rows would be insane, so we are left with a very uncomfortable situation where the kde and pdf methods have contradictory interfaces. I do not fix this here because it would be another breaking change.
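For readers unfamiliar with the two layouts, the following sketch shows the same three 2D data points stored with points as rows and with points as columns. It uses plain Julia arrays only, and the variable names are illustrative.

```julia
# Three observations of a 2D variable, one point per row (3×2).
rows_as_points = [1.0 2.0; 3.0 4.0; 5.0 6.0]

# The same data with one point per column (2×3).
cols_as_points = permutedims(rows_as_points)

# Extracting the individual points under each convention gives the same vectors.
points_from_rows = [rows_as_points[i, :] for i in axes(rows_as_points, 1)]
points_from_cols = [cols_as_points[:, j] for j in axes(cols_as_points, 2)]
```

A caller who fits on one layout and evaluates on the other has to transpose in between. That is the inconvenience the second bullet refers to.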


jaksle commented Feb 7, 2024

Currently it errors on Julia 1.0 because eachslice is not available in that version. It is a pity, since using it was a simple solution, but it can be replaced. If I understand correctly, it can also be imported from Compat.jl on Julia 1.0.
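As one possible replacement, slices along the last dimension can be produced with `selectdim` and `axes`, both of which are available on Julia 1.0. This is a sketch only: `lastdim_slices` is a hypothetical helper name, and it is not necessarily how the PR ends up solving the problem.

```julia
# Lazily iterate views of `A` along its last dimension, without `eachslice`.
# For a matrix this yields its columns.
lastdim_slices(A::AbstractArray) =
    (selectdim(A, ndims(A), j) for j in axes(A, ndims(A)))

A = [1.0 2.0 3.0; 4.0 5.0 6.0]
cols = collect(lastdim_slices(A))
```

The slices are views, so no data is copied.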
