Understanding MC acquisition functions #2266
I have a question regarding the acquisition functions:
Let X and Y be some arbitrary points that I condition the model on. This graphic is taken from the BayesOpt Book.
This code did not compute an acquisition function value for each point in the x tensor; instead, it returned only a single acquisition function value. Why is that? I suspect it is connected to the fact that BoTorch can handle multidimensional inputs, but to be honest I don't know what is going on. Why do I still get a value when passing in the whole x tensor, and how can I interpret that value? Is it some kind of transform, or is it just the acquisition function value for the last entry of my x tensor?

Secondly, I wanted to better understand the MC acquisition functions and came across this discussion (BoTorch Discussion) about the sampling in MC acquisition functions. I don't quite get what the samples derived from this function are. So what exactly are the samples? In the description of the Basic Concepts of BoTorch, there is something written about the base samples: "If the base samples are fixed, the problem of optimizing the acquisition function is deterministic, [...]."

And my final question: is there a way to get the derivative values of the acquisition function with respect to x, as described in "BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization" or "The reparameterization trick for acquisition functions"?

Thanks to everyone who read through all of this! I am trying to really understand BoTorch; it is an amazing toolset and I would love to use it for my master's thesis, but for that I need to understand the basics of the library so that I really know what I am doing.
Replies: 1 comment 4 replies
The issue is that you computed the joint acquisition value of all the points in your "batch" of candidates. In general, BoTorch acquisition functions take in tensors of shape `batch_shape x q x d`, where the `q` points in each batch are evaluated jointly, producing a single acquisition value per batch element. If you want an individual acquisition value for each of your `n` candidates, reshape the input to `n x 1 x d` (i.e., `n` batches of `q=1` points each).
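To see the difference between joint and individual evaluation concretely, here is a minimal sketch in plain PyTorch. The `toy_qei` reduction below is a hypothetical stand-in for how MC acquisition functions reduce over the `q` dimension, not the actual botorch implementation:

```python
import torch

def toy_qei(samples: torch.Tensor, best_f: float) -> torch.Tensor:
    """Toy qEI-like reduction illustrating BoTorch's shape conventions.

    samples has shape sample_shape x batch_shape x q; the q points in each
    batch are reduced JOINTLY (max over q), then averaged over MC samples,
    yielding one value per batch element.
    """
    improvement = (samples - best_f).clamp_min(0.0)
    return improvement.max(dim=-1).values.mean(dim=0)

mc_samples = 128

# Case 1: one batch of q=5 points -> ONE joint acquisition value
samples_joint = torch.randn(mc_samples, 1, 5)
print(toy_qei(samples_joint, best_f=0.0).shape)  # torch.Size([1])

# Case 2: 5 batches of q=1 point each -> FIVE individual values
samples_indiv = torch.randn(mc_samples, 5, 1)
print(toy_qei(samples_indiv, best_f=0.0).shape)  # torch.Size([5])
```

This is why passing all your candidates as one `q x d` tensor gives a single number: it is the joint utility of selecting all those points together, not a per-point score.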
Yes. The shape of these samples is somewhat complicated due to the way batching in botorch works (see above): posterior samples have shape `sample_shape x batch_shape x q x m`. Due to the way you passed things in, you didn't get independent samples for each candidate, but joint posterior samples over all `q` points at once.
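The quoted statement about fixed base samples can be illustrated without botorch at all. In this hypothetical sketch, the posterior sample is reparameterized as `mean + std * z` with `z` a set of base samples drawn once and reused; the real botorch sampler classes handle this internally:

```python
import torch

torch.manual_seed(0)
# Base samples: drawn ONCE, shape sample_shape x q, then reused.
base_samples = torch.randn(256, 3)

def mc_estimate(mean: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
    """MC acquisition estimate via the reparameterization y = mean + std * z.

    Because the base samples z are fixed, this is a deterministic function
    of (mean, std) -- and hence of the candidate X that produced them.
    """
    samples = mean + std * base_samples  # shape: 256 x q
    return samples.clamp_min(0.0).max(dim=-1).values.mean()

mean = torch.tensor([0.1, 0.2, 0.3])
std = torch.ones(3)
v1 = mc_estimate(mean, std)
v2 = mc_estimate(mean, std)
assert torch.equal(v1, v2)  # deterministic: same base samples, same value
```

With fresh base samples on every call, `v1` and `v2` would differ, and the acquisition surface would be noisy; fixing them is what makes optimizing the MC acquisition a deterministic problem, as the docs say.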
Yes
Yes, being able to compute gradients via auto-differentiation is one of the key value propositions of BoTorch. You will have to ensure that the input tensor has `requires_grad=True`; calling `backward()` on the resulting acquisition value then populates the gradient with respect to your candidates.
What exactly is the goal of your thesis? Are you doing research on Bayesian Optimization? Or are you using Bayesian Optimization in your research? If the latter, you should take a look at Ax, which is a higher-level interface that makes it easy to optimize things without having to understand all the botorch details: https://github.com/facebook/Ax