Running the code #1
Comments
Hi there. Regarding the fixed trajectory: this is largely because the Arcade Learning Environment (ALE) is deterministic, and the learned subgoal policies are also deterministic (each subgoal uses a variant of double deep Q-learning). That doesn't mean it is a bad strategy. Of course, you could swap in stochastic policies for the lower-level controllers. Regarding the h-DQN baseline comparison: let me clean up my baseline code and I will put it up as well. The summary is that it mostly doesn't learn anything useful for games like Montezuma's Revenge.
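For readers who want to see what the two ideas above look like in code, here is a minimal sketch, not the repository's actual API: the function names `double_dqn_target` and `boltzmann_action` and the `temperature` parameter are illustrative assumptions. It shows a per-subgoal double DQN target and a softmax action selection that could stand in for a greedy argmax if you wanted a stochastic lower-level policy at test time.

```python
import numpy as np

def double_dqn_target(q_online_next, q_target_next, reward, gamma=0.99, done=False):
    # Double DQN: the online network picks the next action,
    # the target network evaluates it (one such update per subgoal controller).
    best_action = np.argmax(q_online_next)
    bootstrap = 0.0 if done else gamma * q_target_next[best_action]
    return reward + bootstrap

def boltzmann_action(q_values, temperature=1.0):
    # Softmax (Boltzmann) action selection: a stochastic drop-in
    # replacement for a deterministic greedy argmax over Q-values.
    logits = np.asarray(q_values, dtype=np.float64) / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return np.random.choice(len(probs), p=probs)

# With a deterministic ALE, greedy argmax over fixed Q-values repeats the
# same trajectory every episode; sampling from the softmax breaks that tie.
q = np.array([1.2, 1.1, 0.3, 0.9])
print(boltzmann_action(q, temperature=0.5))
```

This assumes the lower-level controllers expose raw Q-values at action-selection time; it is only a sketch of the swap the comment above suggests.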
@nanxintin Did you use Python 2 to run the code?
@moonsh I'm sorry, I can't remember anymore.
Yes, I did use Python 2.7 to run the code back then.
Hi,
This is meaningful work, combining imitation learning and reinforcement learning in a hierarchical architecture to solve Montezuma's Revenge.
I successfully ran the code to train hybrid_rl_il_agent. When I test the trained model, I find that the agent takes the same actions every episode. It seems the agent follows a completely fixed trajectory through the game, without any adaptation. Is this a good strategy for the agent?
I also want to train an h-DQN agent as a comparison, but I cannot find the right code to do this. Can you give me some advice on how to start that training?
Thanks.