some questions about the code #1

Open
meet-cjli opened this issue Apr 20, 2019 · 3 comments


@meet-cjli

I recently read the paper and the code, and I have some questions about them. First, I can't find details about the TYPE_CONJUGATE branch, especially the early termination criterion. Second, there is no function called eval_z. Finally, I don't understand the initial values of alpha and beta: why were these values chosen? Also, in the TYPE_CONJUGATE branch, you added 0.5 relative to the original.
Also, in
    alpha = torch.log(X.new_ones(*size) / m)
    beta = torch.log(X.new_ones(*size) / m)
    exp_alpha = torch.exp(-alpha)
    exp_beta = torch.exp(-beta)
there is a negative sign, whereas Algorithm 2 uses a positive sign. I tried changing it to a positive sign, and it still works. Does that mean the initial values have little impact on the results?

@riceric22
Member

riceric22 commented Apr 22, 2019

Hey @meet-cjli! The following should hopefully answer all of your questions.

  1. The TYPE_CONJUGATE branch is an implementation of Equation 14 in the paper in Section 5.3, on provable defenses, for Wasserstein balls.

  2. The early termination criterion is not anything algorithmically special: it's an optimization that checks whether the objective can change at all to begin with. In other words, it checks whether it is even possible for the objective to change within the ball under the local transport restriction. If there is no mass to move in areas with non-zero objective, then there's no need to run the sinkhorn iteration, since the objective is the same throughout the ball, and so we can terminate early.

  3. eval_z seems to have been accidentally removed while I was cleaning the code, I will add it back in shortly, thanks!

  4. The initial values for alpha and beta are taken to reflect the same initial value used in the original sinkhorn algorithm from Cuturi 2013 (Algorithm 1). Beyond that, there's no other justification.

  5. In the sinkhorn iteration, there is a factor of exp(-1). In the algorithm box of our paper and for the 2NORM branch, we combined this into the K variable; in other texts (e.g. Cuturi 2013), however, it's common to distribute this exp(-1) factor into the alpha and beta variables, with exp(-1/2) each. You rightly point out that the code is confusing, since the 2NORM branch uses the version in the paper whereas the conjugate code uses the latter description. Both of these are nevertheless equivalent.

  6. The initializations for alpha and beta for the conjugate branch, as you pointed out, are indeed not consistent with Algorithm 2, and the negative sign is a mistake. However, as you observed, it still works, since these are just the initial values: the projected sinkhorn problem is strictly convex, and it can be shown that the coordinate descent procedure is guaranteed to converge to the optimal value. Of course, it would still be better for me to correct this initialization in the code.
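
To illustrate why the starting point in point 6 should not matter, here is a minimal sketch. Note that it uses the standard sinkhorn iteration from Cuturi 2013 for entropy-regularized transport, not the projected sinkhorn iteration from the paper, and it is plain Python rather than the repository's PyTorch code. The function name `sinkhorn_plan`, the marginals `r` and `c`, the cost matrix `C`, and the regularization weight `lam` are all assumptions made up for this example.

```python
import math

def sinkhorn_plan(r, c, C, lam, u0, iters=500):
    """Standard sinkhorn scaling (Cuturi 2013 style), NOT the projected variant.

    r, c : source and target marginals (lists summing to 1)
    C    : cost matrix as a list of lists
    lam  : entropy regularization weight (illustrative value)
    u0   : initial row scaling vector, the quantity being varied here
    """
    n, m = len(r), len(c)
    K = [[math.exp(-lam * C[i][j]) for j in range(m)] for i in range(n)]
    u = list(u0)
    for _ in range(iters):
        # Alternate between matching the column marginals and the row marginals.
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

r = [0.5, 0.3, 0.2]
c = [0.2, 0.2, 0.6]
C = [[abs(i - j) for j in range(3)] for i in range(3)]

# Uniform start (1/m) versus an arbitrary, deliberately poor start.
P_uniform = sinkhorn_plan(r, c, C, lam=2.0, u0=[1.0 / 3] * 3)
P_other = sinkhorn_plan(r, c, C, lam=2.0, u0=[3.0, 0.1, 7.0])

max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(P_uniform, P_other)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

Under these assumptions both runs end at the same transport plan up to floating-point error, which matches the observation that flipping the sign in the initialization changes only the starting point and not the solution.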

I will leave this issue open until I've added back in the eval_z function, fixed the initialization for the conjugate branch, and made the location of the exp(-1) factor consistent in both branches.
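
For readers wondering why the location of the exp(-1) factor does not matter, the sketch below compares the two placements numerically. It assumes a plan of the form exp(alpha_i) * exp(-lam * C_ij - 1) * exp(beta_j); the sign convention, the function names, and the sample values are assumptions for illustration and are not taken from the repository.

```python
import math

def plan_factor_in_kernel(alpha, beta, C, lam):
    # exp(-1) folded into the kernel K = exp(-lam * C - 1).
    return [
        [math.exp(alpha[i]) * math.exp(-lam * C[i][j] - 1.0) * math.exp(beta[j])
         for j in range(len(beta))]
        for i in range(len(alpha))
    ]

def plan_factor_in_duals(alpha, beta, C, lam):
    # exp(-1) split evenly: exp(-1/2) attached to alpha and exp(-1/2) to beta.
    return [
        [math.exp(alpha[i] - 0.5) * math.exp(-lam * C[i][j]) * math.exp(beta[j] - 0.5)
         for j in range(len(beta))]
        for i in range(len(alpha))
    ]

# Arbitrary illustrative values, not from the paper or the code.
alpha = [0.3, -1.2]
beta = [0.7, 0.1, -0.4]
C = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0]]
lam = 1.5

P_kernel = plan_factor_in_kernel(alpha, beta, C, lam)
P_duals = plan_factor_in_duals(alpha, beta, C, lam)

gap = max(
    abs(a - b)
    for row_a, row_b in zip(P_kernel, P_duals)
    for a, b in zip(row_a, row_b)
)
print(gap)
```

Since exp(-1) = exp(-1/2) * exp(-1/2), the two placements produce the same matrix up to rounding, so the choice is purely one of bookkeeping.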

@ptpam

ptpam commented Jul 12, 2019

I want to ask something related to runtime. Is it the case that one step in an epoch takes around 1m30s for you as well?

@riceric22
Member

Hey @ptpam, with a batch size of 128, it takes about 30-40 seconds to do a step of adversarial training (that is, running PGD for at most 50 iterations on a single minibatch with the default parameters in adv_training_cifar.py, on a single 2080 Ti).
