Termination prob calculated over current state instead of the next state #1

backpropper · 2020-08-20T12:13:29Z

According to the original paper, the termination probability is evaluated at the next state, so this line should use next_obs instead of obs.

option-critic-pytorch/option_critic.py

Line 238 in 0c57da7

    termination_loss = option_term_prob * (Q[option].detach() - Q.max(dim=-1)[0].detach() + args.termination_reg) * (1 - done)
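To make the difference concrete, here is a minimal framework-free sketch of the same expression evaluated once at s and once at s'. It is not the repository's code: the helper names (termination_advantage, termination_loss) and all numbers are made up for illustration, plain floats stand in for tensors, and the max over option values is used as the baseline exactly as in the quoted line.

```python
def termination_advantage(q_values, option, termination_reg=0.01):
    """Q(state, option) - max over options of Q(state, .) + regularizer."""
    return q_values[option] - max(q_values) + termination_reg


def termination_loss(term_prob, q_values, option, done, termination_reg=0.01):
    """Scalar termination loss for one transition; done is 0 or 1."""
    advantage = termination_advantage(q_values, option, termination_reg)
    return term_prob * advantage * (1 - done)


# Hypothetical values for one transition (s, option 0, s').
q_s = [2.0, 1.0]      # option 0 is the best option in s
q_next = [0.5, 1.5]   # option 0 is no longer the best option in s'
beta_s = 0.3          # assumed termination probability of option 0 in s
beta_next = 0.4       # assumed termination probability of option 0 in s'

# Version in the quoted line: everything evaluated at the current state.
loss_current = termination_loss(beta_s, q_s, option=0, done=0)        # small positive

# Version proposed in this issue: everything evaluated at the next state.
loss_next = termination_loss(beta_next, q_next, option=0, done=0)     # negative
```

With these numbers the two versions push the termination probability in opposite directions: minimizing the loss at s lowers it (keep the option), while minimizing the loss at s' raises it (terminate), because option 0 is no longer the best choice there.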

lweitkamp · 2020-08-23T13:21:36Z

I'm not so sure this is a problem; it's similar to line 128 in the original code. We are already calculating the Q value for s, so perhaps the authors considered it too expensive to calculate it for both s and s', even though all terms in the beta update concern s'.

I'm going to keep this issue open because I might test what happens if the code is changed.

backpropper · 2020-08-23T14:15:01Z

option_term_prob is the termination probability of the current option, and done describes the transition from the current state to the next state. Given that, the advantage should also be computed over the next state.
The alternative would be to replace those two with prev_option_term_prob and the previous dones, since both are already available to the agent at any given timestep.
