
Parallel Sampling #70

Open
mileslucas opened this issue Nov 16, 2021 · 17 comments
Comments

@mileslucas
Collaborator

Although sampling in NS is not as easily parallelizable as MCMC, there are still opportunities for multi-threading/multi-processing during the sample proposal step. Following the formalism of https://www.wesleyhenderson.me/pdfs/Parallelized_Nested_Sampling_Henderson_Goggans.pdf when we propose a new point during the sample steps, the proposals can be done in parallel, accepting the first new point that fulfills the likelihood constraints. This is implemented in dynesty using Python's multiprocessing.pool.

If we overload the AbstractMCMC.sample methods using the MCMCDistributed and MCMCThreads structs, we should be able to do something similar, although I haven't figured out the implementation in detail.

@mileslucas
Collaborator Author

We could achieve this without too significant an API overhaul with the following:

Implement the following methods

AbstractMCMC.sample(::AbstractRNG, ::NestedModel, ::Nested, ::MCMCThreads; kwargs...)
AbstractMCMC.sample(::AbstractRNG, ::NestedModel, ::Nested, ::MCMCDistributed; kwargs...)

these methods should not be functionally different from AbstractMCMC.sample(::AbstractModel, ::Nested). The parallelization is not at the "chain level", it is at the "sample level". Therefore, we need a way to contextualize the parallelization down to AbstractMCMC.step. We could add the following methods

AbstractMCMC.step(::AbstractRNG, ::NestedModel, ::Nested, ::MCMCThreads, state; kwargs...)
AbstractMCMC.step(::AbstractRNG, ::NestedModel, ::Nested, ::MCMCDistributed, state; kwargs...)

The step methods will also be the same as the non-parallel versions. There are plenty of opportunities to factor out shared code if it is repeated enough. The general strategy (agnostic of Threads/Distributed) is that instead of sampling a new live point in serial, the proposals occur in parallel, with the master process blocking until one of the proposals meets the likelihood constraint. This means we might be able to write a single method

Proposals.(::AbstractProposal)(rng, ::MCMCThreads, point, logl_star, bounds, loglike, prior_transform; kwargs...)
Proposals.(::AbstractProposal)(rng, ::MCMCDistributed, point, logl_star, bounds, loglike, prior_transform; kwargs...)

which looks roughly like (pseudocode)

pool = spawn N tasks -> propose
start each task in pool
block until first return value # do not wait for remaining tasks to finish

this would require minimal code rewriting, because we can effectively wrap all the existing proposal code. It also means each task's proposal will evolve independently, and I'm not sure what the repercussions of that are (or how dynesty/polychord manage it). To have the proposals evolve homogeneously, I'd have to find a way to refactor the proposal code with generic ways to iterate and check for completion.
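
A minimal Julia sketch of that task-pool pattern, assuming a hypothetical `propose_one` that wraps the existing serial proposal code (placeholder names, not actual NestedSamplers.jl API):

```julia
using Random

# Minimal sketch, assuming `propose_one` wraps the existing serial proposal code.
# Spawn `ntasks` independent proposal attempts and return whichever finishes first;
# the remaining tasks are left to finish (or be discarded) in the background.
function parallel_propose(rng::AbstractRNG, ntasks::Int, point, logl_star,
                          bounds, loglike, prior_transform)
    results = Channel(ntasks)
    for _ in 1:ntasks
        seed = rand(rng, UInt)  # independent RNG stream per task
        Threads.@spawn put!(results, propose_one(Random.Xoshiro(seed), point, logl_star,
                                                 bounds, loglike, prior_transform))
    end
    return take!(results)  # blocks until the first valid proposal arrives
end
```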

@mileslucas
Collaborator Author

@devmotion @cpfiffer this seems like I'm really pushing the boundaries of the AbstractMCMC interface, but it's a fairly unique case. Any thoughts on the methods I proposed above? I'm most concerned about extending MCMCThreads/MCMCDistributed beyond the designated API.

@cpfiffer
Member

Okay, you mentioned that the parallelization is within the sample, right? Is that the correct reading? If so, I don't think you should need to touch step or sample -- I feel like I'm missing what the overload gives you.

I.e., you could add threaded=true or something to step, which would parallelize the proposals. Alternatively, we could pass the threading/distributed struct down into step via a keyword.
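
Just to sketch the keyword route (hypothetical names; `propose` / `parallel_propose` stand in for the serial and parallel proposal paths and are not existing NestedSamplers.jl functions):

```julia
using AbstractMCMC, NestedSamplers

# Sketch only: thread a parallelization flag into `step` via a keyword.
function AbstractMCMC.step(rng, model::NestedModel, sampler::Nested, state;
                           threaded::Bool=false, kwargs...)
    new_point = threaded ? parallel_propose(rng, sampler, state) :
                           propose(rng, sampler, state)
    # ... the rest of the nested-sampling bookkeeping stays the same ...
end
```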

@cpfiffer
Member

I think your proposed methods should be good:

Proposals.(::AbstractProposal)(rng, ::MCMCThreads, point, logl_star, bounds, loglike, prior_transform; kwargs...)
Proposals.(::AbstractProposal)(rng, ::MCMCDistributed, point, logl_star, bounds, loglike, prior_transform; kwargs...)

@mileslucas
Collaborator Author

I feel like I'm missing the part that the overload gives you.

It's just contextualization, so internally we know we need to parallelize the sampling; it could be a keyword argument, too.

@cpfiffer
Member

Ah, okay, then yeah I like what you have (Proposals). I think we should leave this outside of AbstractMCMC for now -- you can just add a distributed or threaded keyword to your step functions that handle parallelization internally.

I don't necessarily want to pass the chain threading context (the one we give sample) to the step context, because they probably should be different. You are right though that this is kind of a weird edge case, and I'm not really sure what the best upstream fix would look like.

@marcobonici

I am sorry that it took almost 6 months (I am the one who asked @mileslucas about parallelizing nested sampling), but now I can really help on this particular item.
I have found a paper, a refinement of the paper posted here by @mileslucas, which suggests a better parallelization strategy: rather than parallelizing the proposal step, they suggest performing N independent runs and merging them afterwards.
In order to do this, it is necessary to:

  1. Merge all the accepted points into a single vector
  2. Sort the array according to the likelihood values
  3. Evaluate the weights
  4. Evaluate the evidence

However, as far as I understand, the likelihood for each accepted point is not saved; only the likelihood of the final state is.
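
In code, the four steps would look roughly like the sketch below (assuming every run used the same number of live points, that each saved point carries its logl, and using the standard rule that N merged constant-live-point runs behave like a single run with N * n_live live points):

```julia
# Sketch only: merge N independent runs, each with `n_live` live points.
# Assumes each saved point is a NamedTuple carrying a `logl` field.
function merge_runs(runs, n_live)
    # 1. merge all the accepted points into a single vector
    points = reduce(vcat, runs)
    # 2. sort according to the likelihood value
    sort!(points; by = p -> p.logl)
    # 3. evaluate the weights, treating the merged set as one run with
    #    n_tot = N * n_live live points: w_i = L_i * (X_{i-1} - X_i), X_i = exp(-i / n_tot)
    n_tot = length(runs) * n_live
    logX = [-i / n_tot for i in 0:length(points)]
    logwt = [points[i].logl + log(exp(logX[i]) - exp(logX[i + 1])) for i in eachindex(points)]
    # 4. evaluate the evidence: logZ = logsumexp(logwt)
    m = maximum(logwt)
    logZ = m + log(sum(exp.(logwt .- m)))
    return points, logwt, logZ
end
```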

Tomorrow I'll run a couple of tests to properly understand the algorithm they propose and whether it's worth implementing. After that, I'll write up my findings here and ask you how to proceed.
Cheers,
Marco

@marcobonici

Hi @mileslucas ,
I have done a few tests using one of the NestedSamplers.jl example models (correlated Gaussians).
I can confirm that merging multiple runs together seems to work and increases the accuracy.
Here are the (resampled) chains for a single run, with n_live = 300 and dlogz = 0.1:
[image: resampled_chain1]
Here are the (resampled) chains after 40 runs are merged together:
[image: myresampled]
The contours are much more accurate than in the previous case (I also compared against a run with many more live points).

Attached is the code I have written. It is NOT beautiful code; it is just meant to show that it is indeed possible to merge multiple chains from single runs.
test_parallel_nested.txt

Please let me know if you have any suggestions or considerations. To improve what I have done, I'd need the logl value saved for each point in the chains (at the moment I am recomputing it, which is rather inefficient).
Cheers,
Marco

@mileslucas
Collaborator Author

@marcobonici Sorry for such a belated response. This is really nice to see! Your chain-merging code doesn't look too shabby.

I'd need to have saved for each point in the chains the value of the logl

This is available in the raw samples (see here for an example of the sample and state tuples).

To get these, you'll want to run sample with chain_type=Any. The output should be a vector of the sample tuples. The fields are

  • u - the sample in the unit-cube
  • v - the sample in regular coordinates (i.e., after the prior transform)
  • logwt - the log volume of the shell represented by this sample
  • logl - the log-likelihood of the sample

so you can directly access them in this way.
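
For example, roughly (a sketch; `model` and `sampler` are placeholders for whatever you are already constructing, and the exact stopping keyword may differ):

```julia
using NestedSamplers

# Rough sketch: request the raw sample tuples instead of a Chains object.
samples = sample(model, sampler; dlogz=0.1, chain_type=Any)

samples[1].u      # sample in the unit cube
samples[1].v      # sample after the prior transform
samples[1].logwt  # log volume of the shell represented by this sample
samples[1].logl   # log-likelihood of the sample
```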

That being said, as you'll see here, there is some minor processing of the samples that won't be done automatically in this case, so you'll have to add it manually to your analysis.

Thanks for taking a look into this! Thanks for your patience while I haven't been super available; I'm eager to see what you can come up with.

@marcobonici

Hi @mileslucas ,
Sorry for the delay, but I was busy with some academic duties... :/
I think that next week I will be able to work a bit on this task... I'll probably ask you for some feedback/suggestions.

@mileslucas
Collaborator Author

No worries, I'm also very behind on stuff from working on a paper, currently. As you start working on it feel free to ask me anything!

@marcobonici

Hi @mileslucas ,
I have been studying both NestedSamplers.jl and AbstractMCMC.jl.
There are a few things that are still a bit obscure to me, but I'm starting to understand the workflow.
My proposal is to write new methods for sample, adding:

  • an argument N::Int, which specifies the number of runs
  • an argument of type AbstractMCMCEnsemble, which specifies whether the chains are obtained sequentially, with Threads parallelism, or with distributed computing

This is the easy part.
Now we move to the difficult part.
I think the least invasive option is to merge two (or more) different chains as they are now, without any modifications.
To do this, I need to understand whether, given the weight value of a point, it is possible to obtain the corresponding likelihood without any further evaluation. This should be possible, but I'd like to check whether there is a loss of precision.
I'll make a quick check, then I'll start thinking about how to implement this in the code.
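
In terms of signatures, something like this sketch (the serial comprehension would be replaced by threaded/distributed execution depending on the ensemble type, and `merge_chains` is the merging step described above, not an existing function):

```julia
using AbstractMCMC, NestedSamplers, Random

# Sketch only: run N independent nested-sampling chains and merge them afterwards.
# A real implementation would dispatch on the concrete ensemble type
# (e.g. MCMCThreads / MCMCDistributed) instead of this serial comprehension.
function AbstractMCMC.sample(rng::Random.AbstractRNG, model::NestedModel, sampler::Nested,
                             ::AbstractMCMC.AbstractMCMCEnsemble, N::Int; kwargs...)
    chains = [AbstractMCMC.sample(rng, model, sampler; kwargs...) for _ in 1:N]
    return merge_chains(chains, sampler)
end
```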

@mileslucas
Collaborator Author

mileslucas commented Jun 5, 2022

My proposal is to write new methods for sample, adding:

Sounds good; make sure to follow the API in the AbstractMCMC interface.

given the weight value of a point, it is possible to obtain the corresponding likelihood without any further evaluation.

Do you just need the log-likelihoods of the sampled points? If so, these are already stored for each sample; they just don't get exposed for some of the output types.

@marcobonici

Hi @mileslucas ,
Thank you for your hints.
I tried to run sample with chain_type=Any, but it doesn't seem to give me what I want: I don't obtain a chain with logl stored, but rather a long list of NamedTuples containing u, v, etc.

I have modified the bundle_samples methods to save a new column containing logl. I don't know if I will modify this part of the code again before opening the MR, but this gives me access to exactly the quantity I need. By the way, if you have a better proposal I am glad to hear it 😄!
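
Something along these lines, I imagine (a sketch of the idea, not the actual NestedSamplers.jl bundle_samples body; the helper name is made up):

```julia
using MCMCChains

# Sketch: append the per-sample logl as an extra column when bundling into Chains.
# `samples` is the vector of sample tuples with fields u, v, logwt, logl.
function bundle_with_logl(samples, param_names)
    vals  = reduce(vcat, permutedims(collect(s.v)) for s in samples)  # one row per sample
    logls = [s.logl for s in samples]
    return Chains(hcat(vals, logls), vcat(param_names, ["logl"]))
end
```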

@marcobonici

Here we are! I have obtained a few interesting results :)
I have defined a new function merge_chains, which takes two arguments:

  • A list of chains
  • The sampler used to run the chains (at the moment it is only used to retrieve nactive)

From the tests I have performed, it seems that merging chains improves the posterior we obtain (as in the preliminary plots from a few weeks ago).
Comparing against the analytical logevidence also gives good results.
Attached is the script I used:
parallel_nested.txt

I have checked the algorithm using the example problems found in this repo. I'll probably continue investigating with other problems where I have a closed form for the posterior.
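
For reference, the intended usage looks roughly like this (hypothetical sketch; `model`, `ndims`, and the constructor/keyword details are placeholders):

```julia
using NestedSamplers, MCMCChains

# Hypothetical usage of the `merge_chains` function described above.
sampler = Nested(ndims, 300)                      # e.g. 300 live points
chains  = [sample(model, sampler; dlogz=0.1, chain_type=Chains) for _ in 1:40]
merged  = merge_chains(chains, sampler)           # sampler is only used to read nactive
```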

@tomkimpson

Hi @marcobonici @mileslucas - is there any more progress on parallelisation here? In dynesty we can pass a pool argument to the sampler. Is something analogous currently possible with NestedSamplers.jl?

@mileslucas
Collaborator Author

Hi @tomkimpson, I have not looked at this and do not have time right now given my commitments to my PhD thesis, sorry. I believe Marco had a working version he emailed me a few months ago; I do not know if it still works.
