Question: calculation of the actor loss #64

zhaoyi11 · 2024-09-01T00:36:03Z

Hi,

Thanks for the implementation!

I noticed that in this implementation, the actor loss for DMC tasks uses the dynamics loss, while Atari tasks use the REINFORCE loss (without return normalization), which should be similar to DreamerV2:

dreamerv3-torch/models.py

Lines 406 to 412 in 4e50f30

    
    if self._config.imag_gradient == "dynamics":
        actor_target = adv
    elif self._config.imag_gradient == "reinforce":
        actor_target = (
            policy.log_prob(imag_action)[:-1][:, :, None]
            * (target - self.value(imag_feat[:-1]).mode()).detach()
        )

However, in the DreamerV3 paper and the official JAX code, the REINFORCE loss with a normalized return is used for all domains: https://github.com/danijar/dreamerv3/blob/251910d04c9f38dd9dc385775bb0d6efa0e57a95/dreamerv3/agent.py#L319-L320. Have you tried this loss for DMC tasks? Or do you think the dynamics loss still works better for DMC tasks?
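To make the question concrete, here is a minimal pure-Python sketch of what I mean by "normalized return", based on my reading of the paper: the advantage is divided by max(1, S), where S is a moving average of the spread between the 95th and 5th percentile of the returns. The names `ReturnNormalizer` and `reinforce_weights` are hypothetical and do not come from either codebase, and the EMA initialization (starting from the first observed spread) is my own assumption, so the official code may differ in the details.

```python
class ReturnNormalizer:
    """Tracks an EMA of the 5th-to-95th percentile spread of returns.

    Hypothetical helper for illustration; not taken from either repository.
    """

    def __init__(self, decay=0.99, low=5.0, high=95.0):
        self.decay = decay
        self.low = low
        self.high = high
        self.scale = None  # assumption: initialize from the first batch

    @staticmethod
    def _percentile(sorted_vals, q):
        # Linear interpolation between the two closest ranks.
        pos = (len(sorted_vals) - 1) * q / 100.0
        lo = int(pos)
        hi = min(lo + 1, len(sorted_vals) - 1)
        frac = pos - lo
        return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

    def update(self, returns):
        vals = sorted(returns)
        spread = self._percentile(vals, self.high) - self._percentile(vals, self.low)
        if self.scale is None:
            self.scale = spread
        else:
            self.scale = self.decay * self.scale + (1.0 - self.decay) * spread
        return self.scale

    def normalize(self, advantages):
        # max(1, S) only scales large returns down; small returns are left as-is.
        denom = max(1.0, self.scale)
        return [a / denom for a in advantages]


def reinforce_weights(returns, values, normalizer):
    """Per-step weights that would multiply log_prob in a REINFORCE actor loss."""
    normalizer.update(returns)
    advantages = [r - v for r, v in zip(returns, values)]
    return normalizer.normalize(advantages)
```

In the "reinforce" branch quoted above, the analogous change would be to divide the detached `(target - value)` term by such a scale before multiplying it with `policy.log_prob(imag_action)`.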

Thanks a lot!

Best

NM512 · 2024-09-26T14:46:18Z

Hi,

Thank you for bringing this to my attention!
I noticed that the paper and the official implementation were updated in April of this year as shown below.
paper before update, paper after update
The actor loss calculation differs because this implementation was written before that update.
I'll look into the differences between the two approaches, but it may take some time, so I appreciate your patience.

Best
