Question: calculation of the actor loss #64

Open
zhaoyi11 opened this issue Sep 1, 2024 · 1 comment

Comments

@zhaoyi11

zhaoyi11 commented Sep 1, 2024

Hi,

Thanks for the implementation!

I noticed that in this implementation the actor loss depends on the domain: DMC tasks use the dynamics loss, while Atari tasks use the REINFORCE loss (without return normalization), which is similar to DreamerV2:

dreamerv3-torch/models.py

Lines 406 to 412 in 4e50f30

    if self._config.imag_gradient == "dynamics":
        actor_target = adv
    elif self._config.imag_gradient == "reinforce":
        actor_target = (
            policy.log_prob(imag_action)[:-1][:, :, None]
            * (target - self.value(imag_feat[:-1]).mode()).detach()
        )
However, in the DreamerV3 paper and the official JAX code, the REINFORCE loss with a normalized return is used for all domains: https://github.com/danijar/dreamerv3/blob/251910d04c9f38dd9dc385775bb0d6efa0e57a95/dreamerv3/agent.py#L319-L320. Have you tried this loss for DMC tasks? Or do you think the dynamics loss still works better for DMC tasks?
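
To make the comparison concrete, here is a rough sketch of the normalized REINFORCE objective as I understand it from the updated paper: the advantage (return minus value) is divided by max(1, S), where S is a running average of the spread between the 95th and 5th percentiles of the returns. This is a minimal pure-Python illustration, not code from either repository. The names `percentile`, `ReturnScale`, and `reinforce_actor_loss` are hypothetical, the way the running average is initialized is my assumption, and the entropy term is left out.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


class ReturnScale:
    """Running average of the 5th-to-95th percentile range of returns.

    Hypothetical helper: the first batch initializes the average directly,
    which is an assumption and may differ from the official code.
    """

    def __init__(self, decay=0.99):
        self.decay = decay
        self.value = None

    def update(self, returns):
        spread = percentile(returns, 95) - percentile(returns, 5)
        if self.value is None:
            self.value = spread
        else:
            self.value = self.decay * self.value + (1 - self.decay) * spread
        return self.value


def reinforce_actor_loss(log_probs, returns, values, scale):
    """Mean of -log_prob * normalized advantage.

    The advantage is treated as a constant (no gradient would flow through it);
    max(1, scale) scales large returns down without amplifying small ones.
    """
    denom = max(1.0, scale)
    terms = [-lp * (r - v) / denom for lp, r, v in zip(log_probs, returns, values)]
    return sum(terms) / len(terms)


# Usage with made-up numbers for one imagined batch.
returns = [0.0, 10.0, 20.0, 30.0, 40.0]
values = [0.0] * 5
log_probs = [-1.0] * 5
scale = ReturnScale().update(returns)
loss = reinforce_actor_loss(log_probs, returns, values, scale)
```

In a PyTorch version the advantage would be wrapped in `.detach()`, as in the `"reinforce"` branch quoted above; the only difference from that branch would be the division by `max(1, S)`.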

Thanks a lot!

Best

@NM512
Owner

NM512 commented Sep 26, 2024

Hi,

Thank you for bringing this to my attention!
I noticed that the paper and the official implementation were updated in April of this year, as shown below.
paper before update, paper after update
This implementation predates that update, which is why the actor loss is calculated differently.
I'll look into the differences between the two approaches, but it may take some time, so I appreciate your patience.

Best
