I have two custom environments, one with a hand and one with a humanoid with hands. In the past, I trained the humanoid with just PPO and the hand with both PPO and SAC.
I am currently trying to train the humanoid with SAC by adapting the training script I used for the hand. However, it does not converge at all. I have tried several hyperparameter combinations, but I always get the same result regardless of the combination. The maximum reward per step is 1, and the cumulative reward I get after 2000 steps is always 1. However, if I use the exact same environment but train with PPO instead of SAC, it works. Also, if I use the exact same training script with the hand environment, it works.
I took a look at this issue and this one, as I noticed that for some runs the loss diverges.
Any idea about what could be wrong? To me, it looks like it might be related to SAC rather than the custom gym env, but I might be wrong...
tfederico changed the title from "Two similar custom environment, PPO learns on both but SAC only on one" to "Two similar custom environments, PPO learns on both but SAC only on one" on Feb 2, 2024
Hello,
your issue falls into the category of "tech support" (why doesn't X work on Y?), which we don't do (as mentioned in the README and the issue template). The RL Discord, Reddit, or Stack Overflow are better places for such questions.
The only thing I can recommend is to read/watch our documentation, especially the "RL Tips and Tricks" section (see for instance #1826).
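As an illustration of one item that page covers (not a diagnosis of this issue): for continuous actions it recommends a normalized, symmetric action space, typically [-1, 1], with the environment rescaling to physical limits internally. Below is a minimal sketch of that rescaling in plain Python. The function names and the joint limits are hypothetical, and whether the humanoid environment here already does this is unknown.

```python
from typing import List, Sequence


def rescale_action(action: Sequence[float],
                   low: Sequence[float],
                   high: Sequence[float]) -> List[float]:
    """Map each action component from [-1, 1] to [low_i, high_i].

    Hypothetical helper: the agent outputs normalized actions and the
    environment converts them to physical limits before applying them.
    """
    if not (len(action) == len(low) == len(high)):
        raise ValueError("action, low and high must have the same length")
    scaled = []
    for a, lo, hi in zip(action, low, high):
        a = max(-1.0, min(1.0, a))  # clip to the normalized range first
        scaled.append(lo + 0.5 * (a + 1.0) * (hi - lo))
    return scaled


def normalize_action(action: Sequence[float],
                     low: Sequence[float],
                     high: Sequence[float]) -> List[float]:
    """Inverse mapping: from [low_i, high_i] back to [-1, 1]."""
    if not (len(action) == len(low) == len(high)):
        raise ValueError("action, low and high must have the same length")
    return [2.0 * (a - lo) / (hi - lo) - 1.0
            for a, lo, hi in zip(action, low, high)]


# Made-up joint limits, for illustration only.
joint_low = [0.0, -3.0, 10.0]
joint_high = [2.0, 3.0, 20.0]
physical = rescale_action([-1.0, 0.0, 1.0], joint_low, joint_high)
```

With this in place, the environment would declare its action space as [-1, 1] per dimension and call `rescale_action` at the start of `step`.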
Code example
Training script
Custom env humanoid
Relevant log output / Error message
No response
System Info
Describe the characteristic of your environment:
Checklist