Softmax Policy Target #128

oscardssmith · 2020-06-13T05:11:21Z

I discovered this afternoon that if you use a nonzero policy training weight with data where the policy target doesn't add up to 1, the reg term goes absolutely berserk (I've seen reg losses of 5000). I think this happens because the net is trying to reach an impossible policy distribution. Would it be a significant slowdown to either re-normalize the policy target or to emit a warning if the sum of your policy target isn't approximately 1?
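To make the two options concrete, here is a minimal sketch of what a check-and-renormalize step could look like. It is not taken from this project's code: the function name `normalize_policy_target`, the `tol` parameter, and the assumption that a policy target is a flat sequence of non-negative floats are all hypothetical and only for illustration.

```python
import math
import warnings


def normalize_policy_target(target, tol=1e-3):
    """Hypothetical helper: warn about and fix a policy target that does not sum to 1.

    Assumes `target` is a flat sequence of non-negative floats, one per move.
    """
    total = sum(target)

    # Already a valid distribution (within tolerance): return it unchanged.
    if math.isclose(total, 1.0, abs_tol=tol):
        return list(target)

    # Option 1 from the issue: tell the user the data looks wrong.
    warnings.warn(f"policy target sums to {total:.4f}, not 1; renormalizing")

    # A target with no mass cannot be turned into a distribution.
    if total <= 0:
        raise ValueError("policy target has no positive mass to normalize")

    # Option 2 from the issue: rescale so the entries sum to 1.
    return [p / total for p in target]


if __name__ == "__main__":
    # A target that sums to 0.8 triggers the warning and is rescaled.
    print(normalize_policy_target([0.2, 0.2, 0.4]))
```

In this sketch the check costs one pass over the target to compute the sum, plus a second pass only when rescaling is needed. Whether that is noticeable in practice would depend on how the real data pipeline is structured, which is not shown here.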
