-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle degenerate Beta distribution cases #1881
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1881 +/- ##
==========================================
+ Coverage 85.99% 86.09% +0.10%
==========================================
Files 144 144
Lines 8666 8712 +46
==========================================
+ Hits 7452 7501 +49
+ Misses 1214 1211 -3 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut feeling is that it would be much simpler to ensure in the constructor that alpha and beta are finite positive numbers.
This is true. On the other hand, that would also be a breaking change. Additionally, an argument might emerge that Beta(0,b) and Beta(a, 0) are worth supporting, as they can occasionally show up in Bayesian inference. It happens that StatsFuns.jl already supports these two cases (your work, actually). There's also Beta(0,0), or Haldane's prior, which apparently behaves like Bernoulli(0.5). If we do eventually support Beta(0,b) and Beta(a, 0) and alter methods like On the StatsFuns.jl side, it has been pointed out by @andreasnoack that Beta(Inf, Inf) is only Dirac(0.5) if the constraint that α = β is observed. However, R does treat Beta(Inf, Inf) as Dirac(0.5). R handles the following edge cases:
There is one major disagreement between the R implementation and my implementation. I believe that degenerate distributions should have support at only one point (i.e. support(Beta(Inf, 1)) = [1, 1]) while R maintains the normal support. If we go with the R route, it'll actually make the performance implications mostly negligible. But I personally think that |
It might be technically breaking but IMO it can be argued that it's not actually breaking since these cases weren't handled correctly anyway. |
Combined with JuliaStats/StatsFuns.jl#167 , this pull request will significantly improve handling of degenerate beta distributions.
These degenerate beta distributions are the following:
There are two other degenerate beta distributions that can occur when alpha or beta are 0, but Distributions.jl does not currently allow these to be constructed normally. I have not added checks for these cases. My priority on this pull request is to fix inaccurate behavior for things that are allegedly supported by Distributions.jl, not to add even more allegedly-supported edge cases.
This pull request is not directly dependent on JuliaStats/StatsFuns.jl#167; all tests in this pull request should pass regardless of whether StatsFuns.jl is accepted. However, functions like
cdf
,pdf
,quantile
will continue to be inaccurate until then.median
will continue to hang onmedian(Beta(Inf, 1)
because it is dependent onquantile
, but it will work (via generic fallback) fine once the changes to StatsFuns are made.Between these two pull requests, degenerate Beta distribution cases will behave more or less identically to corresponding Dirac distributions, EXCEPT with one notable difference;
pdf(Dirac(1), 1) == 1
because it is a pmf (as we treatDirac
as discrete) whilepdf(Beta(Inf, 1), 1) == NaN
because it is, by definition, a value which integrates to 1 over a 0-length range.Exceptions
I don't work with KL divergence enough to properly comprehend implementing support for degenerate beta distributions. Someone else can figure this one out.
Performance Regressions
Between this and JuliaStats/StatsFuns.jl#167, I expect almost every single method involving the Beta distribution to experience performance regressions. I believe this is worth it, however, because the status quo often returns inaccurate results or causes hangs (e.g.
median(Beta(Inf, 1)
).I have tried to minimize the performance regressions, but we're going from things like
minimum(::Beta) = 0
(which gets compiled out to just a constant if the type is known) to multiple branches (as there are 3 distinct degenerate cases). These are necessary sacrifices for accurate behavior.Tests
I have placed new tests in a new file,
degenerate.jl
. There are many other degenerate distributions which are allegedly supported by Distributions.jl currently, but currently have many issues (e.g.Bernoulli(0)
,Bernoulli(1)
,Binomial(n, 0)
,Binomial(n, 1)
. I fully intend to add more tests to that file in the future.