Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Termination prob calculated over current state instead of the next state #1

Open
backpropper opened this issue Aug 20, 2020 · 2 comments

Comments

@backpropper
Copy link

The termination probability is calculated over the next state according to the original paper. So it should be using next_obs instead of obs.

termination_loss = option_term_prob * (Q[option].detach() - Q.max(dim=-1)[0].detach() + args.termination_reg) * (1 - done)

@lweitkamp
Copy link
Owner

I'm not so sure this is a problem; it's similar to line 128 in the original code. We are already calculating the Q value for s, so perhaps the authors see it as too expensive to calculate it for both s and s' (all terms in the beta update concern s').

I'm going to keep this issue open because I might see what happens if the code is changed.

@backpropper
Copy link
Author

backpropper commented Aug 23, 2020

The option_term_prob gives the option termination probability for the current option and done indicates a transition from current state to the next state. In that case, we need an advantage over the next state.
The other way would be to replace the two with prev_option_term_prob and the previous dones, since they are already available to the agent at a given timestep.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants