Replace old Gibbs sampler with the experimental one. #2328
Codecov Report — Attention: Patch coverage is …

Coverage diff (master vs. #2328):
- Coverage: 76.81% → 30.10% (-46.72%)
- Files: 22 → 20 (-2)
- Lines: 1570 → 1485 (-85)
- Hits: 1206 → 447 (-759)
- Misses: 364 → 1038 (+674)

View full report in Codecov by Sentry.
The old Gibbs constructor relied on being called with several subsamplers, and each of the constructors of the subsamplers would take as arguments the symbols for the variables that they are to sample, e.g. `Gibbs(HMC(:x), MH(:y))`. This constructor has been deprecated and will be removed in the future. The new constructor works by assigning samplers to either symbols or `VarName`s, e.g. `Gibbs(; x=HMC(), y=MH())` or `Gibbs(@varname(x) => HMC(), @varname(y) => MH())`. This allows more granular specification of which sampler to use for which variable.

Likewise, the old constructor for calling one subsampler more often than another, `Gibbs((HMC(:x), 2), (MH(:y), 1))`, has been deprecated. The new way to achieve this effect is to list the same sampler multiple times, e.g. `hmc = HMC(); mh = MH(); Gibbs(@varname(x) => hmc, @varname(x) => hmc, @varname(y) => mh)`.
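A compact before-and-after sketch of the constructor change described in these notes (the step size and leapfrog count passed to HMC below are arbitrary placeholder values):

```julia
using Turing

# Old, deprecated style: each subsampler is told which symbols it samples.
old_sampler = Gibbs(HMC(0.1, 10, :x), MH(:y))

# New style: samplers are keyed by symbols or VarNames.
new_sampler = Gibbs(; x=HMC(0.1, 10), y=MH())
new_sampler = Gibbs(@varname(x) => HMC(0.1, 10), @varname(y) => MH())

# Old style for running one subsampler more often than another:
#   Gibbs((HMC(0.1, 10, :x), 2), (MH(:y), 1))
# New style: list the same sampler instance multiple times.
hmc = HMC(0.1, 10)
mh = MH()
weighted_sampler = Gibbs(@varname(x) => hmc, @varname(x) => hmc, @varname(y) => mh)
```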
Yes. We had a chat about a closely related issue with @torfjelde too, I'll rework the interface around this a bit.
@torfjelde, if you have a moment to take a look at the one remaining test failure, would be interested in your thoughts. We are sampling from a model with two vector variables:

```julia
@testset "dynamic model" begin
    @model function imm(y, alpha, ::Type{M}=Vector{Float64}) where {M}
        N = length(y)
        rpm = DirichletProcess(alpha)
        z = zeros(Int, N)
        cluster_counts = zeros(Int, N)
        fill!(cluster_counts, 0)
        for i in 1:N
            z[i] ~ ChineseRestaurantProcess(rpm, cluster_counts)
            cluster_counts[z[i]] += 1
        end
        Kmax = findlast(!iszero, cluster_counts)
        m = M(undef, Kmax)
        for k in 1:Kmax
            m[k] ~ Normal(1.0, 1.0)
        end
    end

    model = imm(Random.randn(100), 1.0)
    # https://github.com/TuringLang/Turing.jl/issues/1725
    # sample(model, Gibbs(MH(:z), HMC(0.01, 4, :m)), 100);
    sample(model, Gibbs(; z=PG(10), m=HMC(0.01, 4; adtype=adbackend)), 100)
end
```
Will have a look at this in a bit @mhauru (just need to do some grocery shopping 😬)
Think I found the error: if the number of …, the code at Lines 57 to 65 in 6f9679a doesn't hit the ….
I'm a bit uncertain how we should best handle this @yebai @mhauru.

The first partially viable idea that comes to mind is to …. But this would not quite be equivalent to the current implementation of ….

Another way is to explicitly add the ….

Thoughts?
I lean towards the above approach and (maybe later) provide explicit APIs to inference algorithms. This will enable us to handle reversible jumps (varying model dimensions) in MCMC more flexibly. At the moment, this is only possible in particle Gibbs; if it happens in HMC/MH, inference will likely fail (silently).

EDIT: we can keep …
This does however complicate the new Gibbs sampling procedure quite drastically 😕 And it makes me bring up a question I really didn't think I'd be asking: is it then actually preferable to the current …?

I guess we should first have a go at implementing this for the new ….

Another point to add to the conversation that @mhauru brought to my attention the other day: we also want to support stuff like ….
So all in all, immediate things we need to address with Gibbs: …
I've been trying to think of a way to fix this that would also fix the problem where different Gibbs subsamplers can't sample the same variables (e.g. you can't first sample …).
Point 3 is maybe undesirable, but I think it's minor compared to all the ….

The only problem I see with this is combining the local state from the previous iteration of the current subsampler with the global ….

The great benefit of sticking to one, global ….
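If I'm reading the proposal right, its shape might look something like the sketch below. Every name here (`SubsamplerContext`, its fields, `is_target`) is hypothetical and only illustrates the idea of one global varinfo plus a context marking the current subsampler's variables; it is not the implementation in this PR.

```julia
using DynamicPPL
using DynamicPPL: AbstractContext, VarName

# Hypothetical sketch only: a wrapper context recording which variables the
# current subsampler "owns", while a single global varinfo holds everything.
# Variables outside `target_varnames` would be treated as fixed during this
# sub-step. None of these names come from the PR itself.
struct SubsamplerContext{T<:Tuple,C<:AbstractContext} <: AbstractContext
    target_varnames::T  # VarNames updated by the current subsampler
    childcontext::C     # the context the model would otherwise be run with
end

DynamicPPL.NodeTrait(::SubsamplerContext) = DynamicPPL.IsParent()
DynamicPPL.childcontext(ctx::SubsamplerContext) = ctx.childcontext
DynamicPPL.setchildcontext(ctx::SubsamplerContext, child) =
    SubsamplerContext(ctx.target_varnames, child)

# Does the current subsampler own `vn`? Subsumption lets e.g. @varname(m)
# cover m[1], m[2], ...
is_target(ctx::SubsamplerContext, vn::VarName) =
    any(t -> DynamicPPL.subsumes(t, vn), ctx.target_varnames)
```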
I can imagine two different philosophies to implementing a Gibbs sampler:

1. …
2. …

My above proposal would essentially be doing 2., but using code that's very much like the new sampler, where the information about which sampler modifies which variables is in the sampler/….

The reason I'm leaning towards 2. is that 1. seems to run into some fundamental issues in cases where either ….

Both of those situations quite deeply violate the idea that the different subsamplers can operate mostly independently of each other. Any thoughts very welcome; I'm still very much trying to understand the landscape of the problem.
Thanks, @mhauru, for the excellent summary of the problem and proposals. Storing conditioned variables in a context, like ….

In addition, it's worth mentioning that we currently have two mechanisms for passing observations to models, i.e. (1) via model arguments, e.g. …, and (2) via …. Among these options, (1) will hardcode observation information directly in the model while (2) stores them in a context. You could look at the DynamicPPL codebase for a more detailed picture of how it works. We want to unify these options, perhaps towards using (2) only.

This Gibbs refactoring could be an excellent starting point for a ….
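For concreteness, the two mechanisms look roughly like this. The model is the gdemo used in the benchmarks further down, and the conditioning syntax is existing DynamicPPL/Turing functionality rather than anything introduced in this PR:

```julia
using Turing

@model function gdemo(x, y)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, sqrt(s))
    y ~ Normal(m, sqrt(s))
    return s, m
end

# (1) Observations as model arguments: hardcoded into the model instance.
model_args = gdemo(1.5, 2.0)

# (2) Observations attached by conditioning: stored in a ConditionContext
#     rather than in the model arguments.
model_cond = gdemo(missing, missing) | (x=1.5, y=2.0)
# equivalently: DynamicPPL.condition(gdemo(missing, missing); x=1.5, y=2.0)
```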
Overall, I'm also in favour of this @mhauru 👍 I think your reasoning is solid here.

The only other "option" I'm seeing is to keep track of which variables correspond to which varinfos (with each varinfo only containing the relevant information), but then we're effectively just re-implementing a lot of the functionality that is already provided in ….

The only "issue" is that this does mean we have to support this "link / transform only part of the …".

Doubly, however, I think we can make this much nicer than the current approach by simply making all these ….

But yeah, don't see how we can take approach (1) in a "nice" way, and so I'm also in favour of just trying to make (2) as painless as possible to maintain.
Thanks for the comments, both; this is very helpful.

Yeah, I think this is the way to go.
I made some progress, but it's not done yet, so maybe hold off reviewing for now. Also depends on TuringLang/DynamicPPL.jl#694.
I can't suggest changes because the file isn't changed, but I think these lines should be removed.
Line 23 in 24e6870:

> - A set of one or more symbols to sample with `MH` in conjunction with `Gibbs`, i.e. `Gibbs(MH(:m), PG(10, :s))`

Lines 44 to 51 in 24e6870:

> Alternatively, you can specify particular parameters to sample if you want to combine sampling
> from multiple samplers:
>
> ```julia
> # Samples s² with MH and m with PG
> chain = sample(gdemo(1.5, 2.0), Gibbs(MH(:s²), PG(10, :m)), 1_000)
> mean(chain)
> ```
(I also checked for the other samplers, but MH is the only one that suffers from the success of having very thorough documentation.)
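For reference, a sketch of what that removed example would presumably become under the new constructor; the gdemo definition below (with parameters s² and m) is an assumption based on the standard docstring model, not a quote from the repository:

```julia
using Turing

# Assumed gdemo model with parameters s² and m, matching the removed docstring example.
@model function gdemo(x, y)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x ~ Normal(m, sqrt(s²))
    y ~ Normal(m, sqrt(s²))
end

# Old (removed from the docs): Gibbs(MH(:s²), PG(10, :m))
# New-style equivalent under this PR's constructor:
chain = sample(gdemo(1.5, 2.0), Gibbs(@varname(s²) => MH(), @varname(m) => PG(10)), 1_000)
mean(chain)
```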
Nice catch @penelopeysm, removed.
Ping me when you want me to have a look @mhauru :)
All the tests now pass for ReverseDiff. Mooncake fails in some cases because of compintell/Mooncake.jl#315. There are also some ForwardDiff-specific failures that I may have worked around now; let's see with the latest CI run. @torfjelde, once you've recovered it could be a good time for you to take another look.

One thing I probably still want to change is the names and arguments of ….
@torfjelde, I think I made all the changes we discussed on the call. I may still move the ….
All the tests have passed on at least one platform, so I assume they'll all eventually pass. @torfjelde, wanna take another look?
Some optimisation is clearly required. The following runs in under 10 seconds with the old Gibbs sampler, but takes more than a minute with the new one:

```julia
julia> module Benchmark1
       using Turing, Test, Random
       using Turing.RandomMeasures: ChineseRestaurantProcess, DirichletProcess
       import ReverseDiff

       adbackend = AutoReverseDiff()

       @testset "dynamic model" begin
           @model function imm(y, alpha, ::Type{M}=Vector{Float64}) where {M}
               N = length(y)
               rpm = DirichletProcess(alpha)
               z = zeros(Int, N)
               cluster_counts = zeros(Int, N)
               fill!(cluster_counts, 0)
               for i in 1:N
                   z[i] ~ ChineseRestaurantProcess(rpm, cluster_counts)
                   cluster_counts[z[i]] += 1
               end
               Kmax = findlast(!iszero, cluster_counts)
               m = M(undef, Kmax)
               for k in 1:Kmax
                   m[k] ~ Normal(1.0, 1.0)
               end
           end
           model = imm(Random.randn(100), 1.0)
           # https://github.com/TuringLang/Turing.jl/issues/1725
           # sample(model, Gibbs(MH(:z), HMC(0.01, 4, :m)), 100);
           sample(model, Gibbs(; z=PG(10), m=HMC(0.01, 4; adtype=adbackend)), 200)
       end
       end
```

Here's another one. With the new sampler:

```julia
julia> module Benchmark2
       using Turing, Random
       import ReverseDiff

       adbackend = AutoReverseDiff()

       @model function gdemo(x, y)
           s ~ InverseGamma(2, 3)
           m ~ Normal(0, sqrt(s))
           x ~ Normal(m, sqrt(s))
           y ~ Normal(m, sqrt(s))
           return s, m
       end

       Random.seed!(100)
       alg = Gibbs(; s=CSMC(15), m=HMC(0.2, 4; adtype=adbackend))
       @time sample(gdemo(1.5, 2.0), alg, 1_000)
       end
WARNING: replacing module Benchmark2.
 38.251275 seconds (13.16 M allocations: 965.908 MiB, 0.70% gc time, 7.68% compilation time)
Main.Benchmark2
```

With the old Gibbs sampler:

```
16.980426 seconds (13.63 M allocations: 1.046 GiB, 1.01% gc time, 14.13% compilation time)
```

There are some obvious optimisations to do, I'll see where they get us.
Closes #2318.
Work in progress.