What exactly is NELBO and why do we optimize it? #8

Open
legurp opened this issue Oct 12, 2020 · 3 comments
legurp commented Oct 12, 2020

Can someone tell me why we optimize the NELBO? The paper only says "We optimize the ELBO with respect to the variational parameters." As far as I understand it, D-ETM consists of three neural networks that produce the variational distributions for theta, eta and alpha, and then estimates a KL divergence for each of them. Are the KL divergence values then simply added together and optimized jointly? And why is the NLL added on top? Also, I thought that "Solving this optimization problem is equivalent to maximizing the evidence lower bound (ELBO)" would mean we maximize that quantity, but the model seems to minimize it as a loss.

Sorry, I am pretty confused (I am rather new to Bayesian statistics and variational inference).
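
To make my reading concrete (schematically, suppressing the document and time indices, and quite possibly misreading the paper), I take the objective to be

$$
\mathrm{ELBO} \;=\; \mathbb{E}_{q(\theta,\eta,\alpha)}\big[\log p(\mathbf{w}\mid\theta,\alpha)\big]
\;-\; \mathrm{KL}\big(q(\theta)\,\|\,p(\theta)\big)
\;-\; \mathrm{KL}\big(q(\eta)\,\|\,p(\eta)\big)
\;-\; \mathrm{KL}\big(q(\alpha)\,\|\,p(\alpha)\big)
$$

so I would have expected the training loop to maximize something of this form rather than minimize a sum of terms.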

legurp commented Oct 12, 2020

In detm.py, in the forward() function, it says:
nelbo = nll + kl_alpha + kl_eta + kl_theta
return nelbo, nll, kl_alpha, kl_eta, kl_theta

In main.py it says:
loss, nll, kl_alpha, kl_eta, kl_theta = model(data_batch, normalized_data_batch, times_batch, train_rnn_inp, args.num_docs_train)
loss.backward()
optimizer.step()

mona-timmermann commented Oct 15, 2020

The following paper might be helpful: https://arxiv.org/abs/2002.07514

jfcann commented Jan 29, 2021

Hi @legurp, NELBO is the "negative ELBO", and NLL stands for "negative log-likelihood".
It's true that papers usually say they maximise the ELBO. But maximising a quantity is exactly the same as minimising its negative, and since log-probabilities of the (discrete) data are <= 0, it's often more convenient to multiply the ELBO by -1 (so that it becomes positive) and minimise this new quantity as a loss, which is also the convention gradient-descent optimisers follow. So minimising nelbo = nll + kl_alpha + kl_eta + kl_theta in detm.py is exactly the same thing as maximising the ELBO.
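
If it helps, here is a minimal, self-contained PyTorch sketch of the same pattern on a toy Gaussian model (not D-ETM itself; the model, data and hyperparameters are made up purely for illustration). It builds the loss exactly the way detm.py does, as nelbo = nll + kl, and minimises it:

import torch

torch.manual_seed(0)

# Toy data: one latent z_true ~ N(0, 1), observations x ~ N(z_true, 1).
z_true = torch.randn(())
x = z_true + torch.randn(200)

# Variational parameters of q(z) = N(mu, sigma^2).
mu = torch.zeros((), requires_grad=True)
log_sigma = torch.zeros((), requires_grad=True)

optimizer = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(500):
    optimizer.zero_grad()
    sigma = log_sigma.exp()

    # Reparameterised sample z ~ q(z), so gradients reach mu and log_sigma.
    z = mu + sigma * torch.randn(())

    # NLL: a one-sample Monte Carlo estimate of -E_q[log p(x | z)].
    nll = -torch.distributions.Normal(z, 1.0).log_prob(x).sum()

    # Closed-form KL(q(z) || p(z)) between two Gaussians, with prior p(z) = N(0, 1).
    kl = torch.distributions.kl_divergence(
        torch.distributions.Normal(mu, sigma),
        torch.distributions.Normal(0.0, 1.0),
    )

    # NELBO = -ELBO, so minimising this loss maximises the ELBO.
    nelbo = nll + kl
    nelbo.backward()
    optimizer.step()

print(float(mu), float(z_true))  # mu should end up close to z_true

Everything specific here (the toy model, the learning rate, the number of steps) is arbitrary; the point is only the nll + kl -> backward() -> step() pattern, which is what the D-ETM code does with three KL terms (for alpha, eta and theta) instead of one.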
