I discovered this afternoon that if you give a nonzero policy training weight with data where the policy doesn't add up to 1, the reg term goes absolutely berserk (I've seen reg losses of 5000). I think this happens because the net is trying to reach an impossible policy distribution. Would it be a significant slowdown to either re-normalize the policy target or to emit a warning if the sum of your policy head isn't approximately 1?
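For illustration, here is a rough sketch of what such a check could look like on the data-loading side. It is not taken from the project's training code: the function name `check_and_normalize_policy`, the `atol` tolerance, and the assumption that policy targets arrive as non-negative rows of a NumPy array are all hypothetical. The extra work is one sum and one division per position, so I would not expect it to be a large cost, but I have not measured it.

```python
import warnings

import numpy as np


def check_and_normalize_policy(policy, atol=1e-3):
    """Return policy targets rescaled so each row sums to 1.

    Hypothetical helper: assumes `policy` has shape (batch, moves)
    and that all entries are non-negative.
    """
    policy = np.asarray(policy, dtype=np.float32)
    sums = policy.sum(axis=-1, keepdims=True)

    # Warn once per batch if any row is not approximately normalized.
    bad = np.abs(sums - 1.0) > atol
    if bad.any():
        warnings.warn(
            f"{int(bad.sum())} policy target(s) do not sum to 1 "
            f"(tolerance {atol}); re-normalizing"
        )

    # Avoid dividing by zero: all-zero rows are left unchanged.
    safe_sums = np.where(sums > 0, sums, 1.0)
    return policy / safe_sums
```

Calling `check_and_normalize_policy(batch_policy)` just before the targets are fed to the loss would cover both options from the question: the warning reports bad data, and the returned array is the re-normalized target.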