[PPO2] problems resuming training #781

Open
k0rean opened this issue Apr 3, 2020 · 5 comments


k0rean commented Apr 3, 2020

I'm trying to resume model training and I'm getting some strange results. I'm using SubprocVecEnv and VecNormalize on a custom environment:

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import SubprocVecEnv, VecNormalize
from stable_baselines import PPO2
import os
...

env = SubprocVecEnv([init_env(i) for i in range(n_envs)])

if os.path.exists("ppo/model.zip"): # resume training
    norm_env = VecNormalize.load("ppo/norm_env.p", env)
    model = PPO2.load("ppo/model.zip", norm_env, reset_num_timesteps=False, verbose=0, tensorboard_log="./ppo/logs")
else: # new model
    norm_env = VecNormalize(env, norm_reward=False)
    model = PPO2(MlpPolicy, norm_env, verbose=0, tensorboard_log="./ppo/logs")

model.learn(total_timesteps=2500000)
model.save("ppo/model.zip")
norm_env.save("ppo/norm_env.p")
env.close()

[screenshot: TensorBoard reward curves for the two training runs]

First, I don't know why it doesn't continue the existing TensorBoard training curve even though I passed reset_num_timesteps=False. I already updated TensorBoard to the latest version and saw the same behaviour.
But the bigger problem is the discontinuity between the two runs. I already tried a single run with more timesteps (10e6) and got a continuously improving curve, but it never reached a reward of 2.5 as the second run did here. The second run reached a higher reward almost from the start but then stopped improving.
Am I making a mistake when loading the previous model?
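
For reference, a minimal sketch of the same resume branch with reset_num_timesteps=False passed to learn() instead of load() — in stable-baselines 2.x it is a learn() argument, and passing it to PPO2.load() only sets a model attribute that learn() never reads. This is purely illustrative, reusing the paths and init_env from the snippet above:

if os.path.exists("ppo/model.zip"):  # resume training
    norm_env = VecNormalize.load("ppo/norm_env.p", env)
    model = PPO2.load("ppo/model.zip", norm_env, verbose=0, tensorboard_log="./ppo/logs")
else:  # new model
    norm_env = VecNormalize(env, norm_reward=False)
    model = PPO2(MlpPolicy, norm_env, verbose=0, tensorboard_log="./ppo/logs")

# reset_num_timesteps=False asks learn() to keep counting timesteps from the loaded model
# instead of restarting the counter for the new run.
model.learn(total_timesteps=2500000, reset_num_timesteps=False)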

System Info

  • Library installed using pip
  • Python version 3.6.9
  • Tensorflow version 1.14.0
araffin (Collaborator) commented Apr 3, 2020

Related: #301 #692
Regarding the TensorBoard log not continuing: this is a known plotting bug (I need to find the issue again).

Also, you should use a Monitor wrapper to get access to the original reward, so you can compare runs. The plotted reward is the normalized one; you cannot compare runs with it.
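
A minimal sketch of that suggestion, assuming a make_custom_env helper that builds one instance of the custom environment (it stands in for whatever init_env wraps in the original snippet; the log directory is a placeholder):

import os

from stable_baselines.bench import Monitor
from stable_baselines.common.vec_env import SubprocVecEnv, VecNormalize

def make_env(rank, log_dir="./ppo/monitor"):
    # Hypothetical factory standing in for init_env from the issue snippet.
    os.makedirs(log_dir, exist_ok=True)

    def _init():
        env = make_custom_env(rank)  # placeholder for the custom environment constructor
        # Monitor writes the unnormalized episode reward/length to <log_dir>/<rank>.monitor.csv,
        # so runs stay comparable even when VecNormalize rescales what gets plotted.
        return Monitor(env, os.path.join(log_dir, str(rank)))

    return _init

env = SubprocVecEnv([make_env(i) for i in range(n_envs)])
norm_env = VecNormalize(env, norm_reward=False)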

Did you try using the rl zoo?

k0rean (Author) commented Apr 3, 2020

I looked at those issues but didn't find a solution. That's not critical anyway.
I'm not normalizing rewards with VecNormalize, only the observations, so that's not what causes the discontinuity.
No, I didn't. Why?

njanirudh commented

@k0rean any solution to this problem?

Miffyli (Collaborator) commented Mar 12, 2021

@njanirudh I do not have a direct answer, but if possible, try out stable-baselines3 and see if it helps with your issues. It is more actively maintained, so we can discuss and fix bugs there :)
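
For anyone picking this up, a rough sketch of the equivalent resume flow in stable-baselines3 (init_env and n_envs are the same placeholders as in the issue snippet; the paths are examples, not a tested recipe):

import os

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv, VecNormalize

env = SubprocVecEnv([init_env(i) for i in range(n_envs)])

if os.path.exists("ppo/model.zip"):  # resume training
    norm_env = VecNormalize.load("ppo/norm_env.pkl", env)  # restore running obs statistics
    model = PPO.load("ppo/model.zip", env=norm_env, tensorboard_log="./ppo/logs")
else:  # new model
    norm_env = VecNormalize(env, norm_reward=False)
    model = PPO("MlpPolicy", norm_env, tensorboard_log="./ppo/logs")

# reset_num_timesteps=False keeps the timestep counter (and the TensorBoard x-axis)
# running across successive calls to learn().
model.learn(total_timesteps=2_500_000, reset_num_timesteps=False)

model.save("ppo/model.zip")
norm_env.save("ppo/norm_env.pkl")
env.close()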

rambo1111 commented

#1192
