
[question] In train.py, why is gamma in VecNormalize not updated per trial? #91

Open
liyan2015 opened this issue Jul 3, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@liyan2015

Hi, according to this issue, VecNormalize's gamma should match the gamma of the RL algorithm (e.g., gamma=0.99 should be used consistently in both PPO2 and VecNormalize) to keep the discounted-return window consistent. However, the normalization arguments used in create_env seem to always be the defaults read from the .yml file (i.e., gamma=0.99):

env = VecNormalize(env, **normalize_kwargs)

although gamma has different candidates in hyperparams_opt.py:

gamma = trial.suggest_categorical('gamma', [0.9, 0.95, 0.98, 0.99, 0.995, 0.999, 0.9999])

The same applies to rl-baselines3-zoo. Is this a bug? Should create_env take the per-trial gamma into account when initializing VecNormalize? Please give me a hint if I missed anything, thank you!
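For illustration, a hypothetical fix could merge the trial's sampled gamma into normalize_kwargs before the env is wrapped. This is a sketch under my own naming, not the actual zoo code:

```python
# Hypothetical helper (not the actual rl-zoo code): keep VecNormalize's gamma
# consistent with the gamma sampled for the current Optuna trial.
def merge_trial_gamma(normalize_kwargs, sampled_hyperparams):
    """Return VecNormalize kwargs whose gamma matches the trial's gamma."""
    merged = dict(normalize_kwargs)  # copy so the shared config is not mutated
    if "gamma" in sampled_hyperparams:
        merged["gamma"] = sampled_hyperparams["gamma"]
    return merged

# Usage (values are illustrative):
defaults = {"norm_obs": True, "norm_reward": True, "gamma": 0.99}
trial_params = {"gamma": 0.995, "learning_rate": 3e-4}
kwargs = merge_trial_gamma(defaults, trial_params)
# kwargs["gamma"] is now 0.995, and the result would be passed as
# VecNormalize(env, **kwargs) inside create_env.
```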

@araffin
Owner

araffin commented Jul 6, 2020

Good point.

Overall, it should not make a big difference, as the main point is to normalize the reward magnitude.
But for consistency, I agree that gamma should be updated.

Related: hill-a/stable-baselines#698
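To see why the effect is small, here is a simplified sketch of the kind of reward normalization VecNormalize performs: rewards are scaled by the standard deviation of a running *discounted* return estimate, so gamma only sets the effective window of that estimate. This approximates the real implementation (which uses a running-mean/std helper); the exact update rule here is mine:

```python
import math

# Simplified sketch of discounted-return-based reward normalization
# (approximating VecNormalize; the real code uses a RunningMeanStd helper).
def normalized_rewards(rewards, gamma=0.99, epsilon=1e-8):
    ret = 0.0       # running discounted return
    returns = []    # history used for the (naive) variance estimate
    out = []
    for r in rewards:
        ret = ret * gamma + r
        returns.append(ret)
        mean = sum(returns) / len(returns)
        var = sum((x - mean) ** 2 for x in returns) / len(returns)
        # Scale the raw reward by the std of the discounted-return estimate.
        out.append(r / math.sqrt(var + epsilon))
    return out
```

With a mismatched gamma the normalization window differs from the algorithm's horizon, but the rewards are still brought to a comparable magnitude, which is the main goal.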

@araffin araffin added the enhancement New feature or request label Jul 6, 2020