Two similar custom environments, PPO learns on both but SAC only on one #1824

tfederico · 2024-02-02T16:43:25Z

🐛 Bug

I have two custom environments: one with a hand, and one with a humanoid that has hands. In the past, I trained the humanoid with PPO only, and the hand with both PPO and SAC.
I am now trying to train the humanoid with SAC by adapting the training script I used for the hand, but it does not converge at all. I have tried several hyperparameter combinations and always get the same result: the maximum reward per step is 1, and the cumulative reward after 2000 steps is always 1. However, if I train the exact same environment with PPO instead of SAC, it works. The exact same training script also works with the hand environment.

I took a look at this and this issue, as I noticed that the loss diverges for some runs.

Any idea about what could be wrong? To me, it looks like it might be related to SAC rather than the custom gym env, but I might be wrong...

Code example

Training script

Custom env humanoid

Relevant log output / Error message

No response

System Info

Describe the characteristics of your environment:

GPU model: RTX 2080Ti
Versions of any other relevant libraries: all indicated in requirements.txt file

python==3.9.12
gym==0.21.0
matplotlib==3.5.2
mpi4py
numpy==1.23.1
opencv-python==4.6.0.66
packaging==21.3
pandas==1.4.3
pybullet==3.2.5
pytorch3d==0.6.2
scipy==1.8.1
stable-baselines3==1.6.0
torch==1.11.0
torchvision==0.12.0
tqdm==4.64.0
wandb==0.13.1

Checklist

I have checked that there is no similar issue in the repo
I have read the documentation
I have provided a minimal and working example to reproduce the bug
I have checked my env using the env checker
I've used the markdown code blocks for both code and stack traces.


araffin · 2024-02-07T16:10:41Z

Hello,
your issue falls into the category of "tech support" (why doesn't X work on Y?), which we don't do, as mentioned in the README and the issue template. The RL Discord, Reddit, or Stack Overflow are better places for such questions.

The only thing I can recommend is to read/watch our documentation, especially the "RL tips and tricks" (see for instance #1826).
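
As an illustration of the kind of check that "RL tips and tricks" covers: for continuous-action algorithms, that page advises keeping the action space normalized and symmetric (typically [-1, 1]) and rescaling inside the environment. Nothing in this thread shows whether the humanoid environment already does this, so the following is only a minimal sketch of the idea. The helper `rescale_action` and the joint limits are hypothetical and are not taken from the issue's code.

```python
from typing import List, Sequence


def rescale_action(
    normalized: Sequence[float],
    low: Sequence[float],
    high: Sequence[float],
) -> List[float]:
    """Map an action from [-1, 1] to the per-dimension range [low, high]."""
    if not (len(normalized) == len(low) == len(high)):
        raise ValueError("action, low and high must have the same length")
    scaled = []
    for a, lo, hi in zip(normalized, low, high):
        a = max(-1.0, min(1.0, a))  # clip to the normalized range first
        scaled.append(lo + 0.5 * (a + 1.0) * (hi - lo))
    return scaled


# Hypothetical joint limits in radians, for illustration only.
JOINT_LOW = [-0.5, 0.0, -1.5]
JOINT_HIGH = [0.5, 2.0, 1.5]

# The agent acts in [-1, 1]; the environment converts to joint targets.
targets = rescale_action([-1.0, 0.0, 1.0], JOINT_LOW, JOINT_HIGH)
print(targets)
```

With this layout, the environment would declare its action space as [-1, 1] in every dimension and call the helper at the start of `step`, so the policy never sees the raw joint ranges.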

tfederico added the "custom gym env" label (Issue related to Custom Gym Env) Feb 2, 2024

tfederico changed the title ~~Two similar custom environment, PPO learns on both but SAC only on one~~ Two similar custom environments, PPO learns on both but SAC only on one Feb 2, 2024

araffin added the "No tech support" label (We do not do tech support) Feb 2, 2024

araffin closed this as not planned Feb 14, 2024
