[Bug]: Nan Problems for SAC, TQC, for AntBulletEnv-v0, HalfCheetahBulletEnv-v0 #427

Open
5 tasks done
ZJEast opened this issue Nov 28, 2023 · 10 comments
Labels
bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request help wanted Help from contributors is needed

Comments

@ZJEast

ZJEast commented Nov 28, 2023

🐛 Bug

Hello. I am trying to reproduce some algorithms and experiments to record data, but something unexpected happens: NaN values are generated for unknown reasons. Any advice on how to solve this?

To Reproduce

python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs

Relevant log output / Error message

python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
  File "/share/home/zhangjundong/exp/sac-AntBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
    train()
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
    exp_manager.learn(model)
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 307, in learn
    return super().learn(
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 219, in train
    self.actor.reset_noise()
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/policies.py", line 145, in reset_noise
    self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
    self.weights_dist = Normal(th.zeros_like(std), std)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 8)) of distribution Normal(loc: torch.Size([300, 8]), scale: torch.Size([300, 8])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<ExpBackward0>)
python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
  File "/share/home/zhangjundong/exp/sac-HalfCheetahBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
    train()
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
    exp_manager.learn(model)
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 307, in learn
    return super().learn(
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 219, in train
    self.actor.reset_noise()
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/policies.py", line 145, in reset_noise
    self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
    self.weights_dist = Normal(th.zeros_like(std), std)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 6)) of distribution Normal(loc: torch.Size([300, 6]), scale: torch.Size([300, 6])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        ...,
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan]], device='cuda:0',
       grad_fn=<ExpBackward0>)
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
  File "/share/home/zhangjundong/exp/tqc-AntBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
    train()
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
    exp_manager.learn(model)
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 302, in learn
    return super().learn(
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 213, in train
    self.actor.reset_noise()
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/policies.py", line 144, in reset_noise
    self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
    self.weights_dist = Normal(th.zeros_like(std), std)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 8)) of distribution Normal(loc: torch.Size([300, 8]), scale: torch.Size([300, 8])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<ExpBackward0>)
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
  File "/share/home/zhangjundong/exp/tqc-HalfCheetahBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
    train()
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
    exp_manager.learn(model)
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 302, in learn
    return super().learn(
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 213, in train
    self.actor.reset_noise()
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/policies.py", line 144, in reset_noise
    self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
    self.weights_dist = Normal(th.zeros_like(std), std)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 6)) of distribution Normal(loc: torch.Size([300, 6]), scale: torch.Size([300, 6])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[0.0026, 0.0041,    nan, 0.0036, 0.0046, 0.0034],
        [0.0054, 0.0040,    nan, 0.0035, 0.0053, 0.0054],
        [0.0192, 0.0061,    nan, 0.0105, 0.0105, 0.0105],
        ...,
        [0.0257, 0.0262,    nan, 0.0058, 0.0023, 0.0098],
        [0.1410, 0.0130,    nan, 0.1707, 0.1281, 0.0216],
        [0.0494, 0.0480,    nan, 0.0506, 0.0509, 0.0487]], device='cuda:0',
       grad_fn=<ExpBackward0>)

System Info

  • OS: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.17 # 1 SMP Mon Oct 19 16:18:59 UTC 2020
  • Python: 3.9.18
  • Stable-Baselines3: 2.2.1
  • PyTorch: 2.1.0+cu121
  • GPU Enabled: True
  • Numpy: 1.26.1
  • Cloudpickle: 3.0.0
  • Gymnasium: 0.29.1
  • OpenAI Gym: 0.26.2

Checklist

@ZJEast ZJEast added the bug Something isn't working label Nov 28, 2023
@qgallouedec
Collaborator

qgallouedec commented Nov 28, 2023

This may be due to a learning rate that is too high, see #156 (comment). Do you use the default hyperparams?

Also related (and probably duplicate): DLR-RM/stable-baselines3#1401 and DLR-RM/stable-baselines3#1418

@ZJEast
Author

ZJEast commented Nov 28, 2023

Yes, I use the default hyperparams. I will try a different learning rate later.

@araffin
Member

araffin commented Nov 28, 2023

Hello,
thanks for sharing the bug report.
Does the NaN happen only for some runs or for all runs?
Could you log and share a failed run using W&B? (that would allow us to take a look at all the logged data)

I also assume you are using pybullet gymnasium repo?

I'll try to reproduce the issue in the meantime.

Also related: DLR-RM/stable-baselines3#1372 changing to AdamW might solve the problem too.

@ZJEast
Author

ZJEast commented Nov 28, 2023

I have tried TD3, SAC, and TQC on some PyBullet envs. It only happens for the tasks I mentioned; the others are fine.
I installed the PyBullet envs with 'pip install -r ./requirements.txt'.

I can upload some log files.

sac-AntBulletEnv-v0.zip
sac-HalfCheetahBulletEnv-v0.zip
tqc-AntBulletEnv-v0.zip
tqc-HalfCheetahBulletEnv-v0.zip

@araffin
Member

araffin commented Nov 28, 2023

Thanks =)

Looking at the log, it seems to be due to an explosion of std (and you are using a much larger budget than the one we were using by default).
So, setting use_expln=True (and maybe using AdamW) should solve your issue.

I would appreciate a PR that adds this parameter =)

Hmm, for TD3 it is weird if it happens as it doesn't rely on any distribution.

EDIT: I guess the issue is similar to Stable-Baselines-Team/stable-baselines3-contrib#146 by @qgallouedec
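
To illustrate why use_expln=True can help with an exploding std, here is a minimal pure-Python sketch. It assumes the "expln" parameterisation described for gSDE: exp(x) for x <= 0, and log1p(x) + 1 above that. The function names are hypothetical, and this is not the library's actual implementation (which works on tensors and adds a small epsilon).

```python
import math


def std_exp(log_std: float) -> float:
    """Plain exponential parameterisation: std = exp(log_std)."""
    return math.exp(log_std)


def std_expln(log_std: float) -> float:
    """Sketch of the 'expln' idea (assumed form, not the library code).

    Below zero it matches exp(); above zero it grows only logarithmically,
    so a drifting log_std cannot blow the std up as quickly.
    """
    if log_std <= 0.0:
        return math.exp(log_std)
    return math.log1p(log_std) + 1.0


# Compare how fast each parameterisation grows as log_std drifts upward.
for x in (-3.0, 0.0, 5.0, 50.0):
    print(f"log_std={x:6.1f}  exp={std_exp(x):.3e}  expln={std_expln(x):.3e}")
```

Both functions agree for log_std <= 0 and are continuous at 0 (both give 1.0), so the default log_std_init=-3 starts from the same std either way; only the growth for positive log_std differs.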

@araffin araffin added documentation Improvements or additions to documentation enhancement New feature or request help wanted Help from contributors is needed labels Nov 28, 2023
@qgallouedec
Collaborator

qgallouedec commented Nov 28, 2023

Bug already encountered in openrlbenchmark, I might have forgotten to report it: https://wandb.ai/openrlbenchmark/sb3/runs/27cez5ua
EDIT: I did report it, you're right @araffin ;)

@qgallouedec
Collaborator

For TD3, I only found two runs where you have an explosion of the losses, but this didn't lead to the bug:
https://wandb.ai/openrlbenchmark/sb3/runs/2qdjqemd (Walker2DBulletEnv-v0)
https://wandb.ai/openrlbenchmark/sb3/runs/ffc7kx3m (BipedalWalkerHardcore-v0)
What a wonderful tool openrlbenchmark is, ping @vwxyzjn ;)

@ZJEast
Author

ZJEast commented Dec 1, 2023

After I changed the hyperparams from

policy_kwargs: "dict(log_std_init=-3, net_arch=[400, 300])"

to

policy_kwargs: "dict(log_std_init=-3, net_arch=[400, 300], use_expln=True)"

the problem no longer occurs, so let's close this issue.
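
For readers who build the model in Python rather than through the zoo YAML, the same change can be sketched as a plain dict. The variable names are illustrative, and the constructor call is shown only as a comment because it assumes an existing `env` and an installed stable-baselines3.

```python
# Default policy kwargs quoted in this thread for the affected envs.
old_policy_kwargs = dict(log_std_init=-3, net_arch=[400, 300])

# Same settings with use_expln enabled, as in the fix reported above.
new_policy_kwargs = dict(old_policy_kwargs, use_expln=True)

print(new_policy_kwargs)

# Assumed usage (not run here; needs stable-baselines3 and an env):
# from stable_baselines3 import SAC
# model = SAC("MlpPolicy", env, use_sde=True, policy_kwargs=new_policy_kwargs)
```

Note that the traceback in this issue goes through reset_noise and sample_weights, i.e. the state-dependent exploration path, which is the part of the policy that use_expln affects.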

@ZJEast ZJEast closed this as completed Dec 1, 2023
@araffin
Member

araffin commented Dec 1, 2023

Thanks for trying it out =)
I'm reopening as we need to change the defaults (we would welcome a PR).

@Torchtopher

Torchtopher commented Dec 20, 2024

I am also facing this when trying to optimize hyperparameters (both on built-in envs and my custom one). Setting use_expln=True in the algorithm's YAML file still results in trials where it's not set, and I get this:

Expected parameter loc (Tensor of shape (128, 3)) of distribution Normal(loc: torch.Size([128, 3]), scale: torch.Size([128, 3])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan],
        [nan, nan, nan],

Is there more information that would be helpful to debug?
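
One possible explanation, which is an assumption and not confirmed in this thread, is that the hyperparameter sampler returns its own policy_kwargs for each trial and these replace the ones from the YAML file. A minimal sketch of forcing a setting into every sampled trial could look like the following; `with_forced_policy_kwargs` and the sample dict are hypothetical and not part of any library.

```python
from typing import Any, Dict


def with_forced_policy_kwargs(
    sampled: Dict[str, Any], forced: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of `sampled` whose policy_kwargs always include `forced`.

    Hypothetical helper: `sampled` stands for the hyperparameters drawn for
    one optimization trial; `forced` holds settings that must survive it.
    """
    merged = dict(sampled)
    policy_kwargs = dict(merged.get("policy_kwargs", {}))
    policy_kwargs.update(forced)  # forced values win over sampled ones
    merged["policy_kwargs"] = policy_kwargs
    return merged


# Made-up trial sample whose policy_kwargs lack use_expln.
trial_params = {
    "learning_rate": 3e-4,
    "policy_kwargs": dict(log_std_init=-2.0, net_arch=[256, 256]),
}
fixed = with_forced_policy_kwargs(trial_params, {"use_expln": True})
print(fixed["policy_kwargs"])
```

The error quoted above concerns `loc` rather than `scale`, so even with use_expln set in every trial, NaNs may still come from elsewhere (for example, a sampled learning rate that is too high, as suggested earlier in the thread).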
