
Problems of IWAE ELBO Loss #34

Open
GloryyrolG opened this issue Jun 3, 2021 · 5 comments

@GloryyrolG

Hi Anand and all,

Since weight is the weighting over the importance samples, it should be detached from the current computational graph to obtain the expected optimization objective, right? See

weight = F.softmax(log_weight, dim = -1)
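Something like the following is what I would expect (just a toy sketch with assumed shapes, not this repo's exact code), where the normalized importance weights are detached before weighting the per-sample terms:

```python
import torch
import torch.nn.functional as F

# Toy stand-in for the unnormalized log importance weights, shape [B, S]
# (S = number of importance samples per data point).
log_weight = torch.randn(4, 5, requires_grad=True)

# Normalized importance weights w~_i, detached so that no gradient flows
# through the weighting itself -- only through log_weight.
weight = F.softmax(log_weight, dim=-1).detach()

# IWAE-style surrogate objective: sum_i w~_i * log w_i (negated as a loss).
loss = -(weight * log_weight).sum(dim=-1).mean()
loss.backward()
```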

@GloryyrolG
Author

Actually, I see there is a detach call, but it is commented out:

log_weight = (log_p_x_z + kld_weight * kld_loss) #.detach().data

GloryyrolG changed the title from "Not Detach Problem of IWAE ELBO Loss" to "Problems of IWAE ELBO Loss" on Jun 3, 2021
@GloryyrolG
Author

Besides, as the original paper states, "Vanilla VAE separated out the KL divergence in the bound in order to achieve a simpler and lower-variance update. Unfortunately, no analogous trick applies for k > 1" (Y. Burda et al., 2016). How are we still able to separate out an analytically computed KL divergence here?

log_weight = (log_p_x_z + kld_weight * kld_loss) #.detach().data

@tongdaxu

tongdaxu commented Mar 15, 2022

I also found this change very suspicious.

In the original paper, Eq. 14 gives the gradient estimator

$$\nabla_\theta \mathcal{L}_k = \mathbb{E}_{\epsilon_1, \dots, \epsilon_k}\!\left[\sum_{i=1}^{k} \tilde{w}_i \, \nabla_\theta \log w_i\right], \qquad w_i = \frac{p(x, z_i)}{q(z_i \mid x)}, \quad \tilde{w}_i = \frac{w_i}{\sum_{j=1}^{k} w_j}.$$

This obviously requires the normalized weights $\tilde{w}_i$ to be treated as constants, i.e. detached. Otherwise the gradient is

$$\sum_{i=1}^{k} \left( \tilde{w}_i \, \nabla_\theta \log w_i + (\nabla_\theta \tilde{w}_i) \log w_i \right),$$

which has an additional term from taking the derivative through the weights in $\sum_i \tilde{w}_i \log w_i$.
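A quick autograd check (a toy sketch, not the repository's code) confirms that the two gradients differ:

```python
import torch
import torch.nn.functional as F

log_w = torch.randn(5, requires_grad=True)  # toy log importance weights

# Gradient with detached weights: matches Eq. 14, sum_i w~_i * grad log w_i.
(F.softmax(log_w, dim=-1).detach() * log_w).sum().backward()
grad_detached = log_w.grad.clone()
log_w.grad = None

# Gradient without detaching: picks up the extra (grad w~_i) * log w_i term.
(F.softmax(log_w, dim=-1) * log_w).sum().backward()
grad_attached = log_w.grad.clone()

print(torch.allclose(grad_detached, grad_attached))  # False in general
```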

@tongdaxu

> Besides, as the original paper states, "Vanilla VAE separated out the KL divergence in the bound in order to achieve a simpler and lower-variance update. Unfortunately, no analogous trick applies for k > 1" (Y. Burda et al., 2016). How are we still able to separate out an analytically computed KL divergence here?
>
> log_weight = (log_p_x_z + kld_weight * kld_loss) #.detach().data

I think you are also right here: the SGVB-2 estimator separates the KL divergence out of the Monte Carlo estimate over the reparameterized noise. Here we should use the SGVB-1 estimator instead, and use Monte Carlo to compute the whole log p(x, z) - log q(z|x).
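Roughly what I have in mind (a sketch under assumed tensor shapes, a standard normal prior, and a diagonal Gaussian posterior; not a patch against this repo):

```python
import math
import torch

def iwae_loss(log_p_x_given_z, mu, log_var, z):
    """IWAE bound using the full Monte Carlo log-weight (no analytic KL).

    log_p_x_given_z: [B, S]    reconstruction log-likelihoods log p(x|z_i)
    mu, log_var:     [B, S, D] parameters of the diagonal Gaussian q(z|x)
    z:               [B, S, D] reparameterized samples from q(z|x)
    """
    log2pi = math.log(2.0 * math.pi)
    # log p(z): standard normal prior.
    log_p_z = -0.5 * (z ** 2 + log2pi).sum(dim=-1)
    # log q(z|x): diagonal Gaussian posterior.
    log_q_z_x = -0.5 * (log_var + (z - mu) ** 2 / log_var.exp() + log2pi).sum(dim=-1)
    # Full log importance weight; the KL term is not separated out analytically.
    log_w = log_p_x_given_z + log_p_z - log_q_z_x        # [B, S]
    # IWAE bound: log (1/S) * sum_i w_i, averaged over the batch.
    iwae_bound = torch.logsumexp(log_w, dim=-1) - math.log(log_w.size(-1))
    return -iwae_bound.mean()
```

Note that differentiating the logsumexp directly already reproduces the Eq. 14 gradient, since the gradient of logsumexp with respect to log w_i is exactly the normalized weight w~_i, so no explicit detach is needed in this form.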

@tongdaxu

Kindly refer to PR #53.
