Running the code #1

nanxintin · 2018-04-17T03:12:36Z

Hi,
This is meaningful work that combines imitation learning and reinforcement learning in a hierarchical architecture to solve Montezuma’s Revenge.
I successfully ran the code to train hybrid_rl_il_agent. When I test the trained model, the agent takes the same actions in every episode. It seems to follow a completely fixed trajectory through the game, with no adaptation. Is this a good strategy for the agent?
I also want to train the h-DQN agent as a comparison, but I cannot find the code for it. Can you give me some advice on how to start that training?
Thanks.

hoangminhle · 2018-04-25T22:42:17Z

Hi there. Regarding the fixed trajectory: this is because the Arcade Learning Environment (ALE) is largely deterministic, and the learned subgoal policies are also deterministic (each subgoal uses a variant of double deep Q-learning). That doesn't mean it is a bad strategy. Of course, you could swap in stochastic policies for the lower-level policies.
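
For illustration only, here is a rough sketch of the difference between a deterministic greedy choice over Q-values and a stochastic epsilon-greedy one. It is not taken from this repository: the function names, the Q-values, and the choice of epsilon-greedy as the stochastic policy are all assumptions made for the example.

```python
import random


def greedy_action(q_values):
    """Deterministic: always return the index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values, epsilon, rng):
    """Stochastic: with probability epsilon pick a uniformly random action,
    otherwise fall back to the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


# Hypothetical Q-values for a single state under one subgoal.
q = [0.1, 0.7, 0.2]
rng = random.Random(0)

# The greedy policy repeats the same action every time it sees the same input.
greedy_rollout = [greedy_action(q) for _ in range(5)]

# The epsilon-greedy policy occasionally deviates from the greedy action.
noisy_rollout = [epsilon_greedy_action(q, 0.3, rng) for _ in range(200)]
```

With a deterministic emulator, the greedy version yields the same trajectory in every episode, which matches what was observed above. Setting `epsilon` to 0 recovers the greedy behaviour.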

Regarding the h-DQN baseline comparison: let me clean up my baseline code and I will put it up as well. In short, it mostly doesn't learn anything useful for games like Montezuma's Revenge.

moonsh · 2020-01-26T20:41:33Z

@nanxintin Did you use Python 2 to run the code?

nanxintin · 2020-01-29T14:00:16Z

@moonsh Sorry, I can't remember.

hoangminhle · 2020-01-29T16:41:16Z

Yes, I used Python 2.7 to run the code back then.
