I am re-implementing your interesting work and have run into some problems on the Montezuma's Revenge task. During training, in run_hybrid_atari_experiment.py, you use Hdqn(GPU) as the subgoal network, but for testing, in test_model.py, you use a different network architecture, Net(), as the subgoal network. Why are they inconsistent? Could you please upload the trained weights and the code for using Hdqn(GPU) in testing?
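To make the mismatch concrete, here is a minimal sketch of the weight reuse I expected, assuming the repo saves PyTorch state_dicts; the Hdqn layers, sizes, and file names below are placeholders I made up, not the repo's actual code:

```python
import torch
import torch.nn as nn

class Hdqn(nn.Module):
    """Placeholder subgoal network. Layer sizes are my assumptions,
    not the repo's actual Hdqn architecture."""
    def __init__(self, n_subgoals=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        )
        # 84x84 input -> 20x20 after the first conv -> 9x9 after the second
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 9 * 9, 512), nn.ReLU(),
            nn.Linear(512, n_subgoals),
        )

    def forward(self, x):
        return self.head(self.features(x))

# Training side: persist the weights of the class actually trained.
model = Hdqn()
torch.save(model.state_dict(), "hdqn_subgoal.pt")

# Testing side: rebuild the *same* class and load those weights,
# rather than instantiating a separate Net() architecture.
test_net = Hdqn()
test_net.load_state_dict(torch.load("hdqn_subgoal.pt"))
test_net.eval()
```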
Also, I notice that in testing the trained meta controller is actually not used. Instead, the subgoals are set manually and each one is achieved by a simple_net, which does not seem to surpass a supervised baseline that uses imitation learning to reach each fixed subgoal in a fixed environment. Could you explain the generalizability of the method? Thanks!
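For context, this is the test-time behaviour I expected from a trained meta controller: pick the next subgoal greedily from its Q-values instead of from a hard-coded list. The function and argument names here are illustrative, not the repo's API:

```python
import torch

def pick_subgoal(meta_controller: torch.nn.Module, state: torch.Tensor) -> int:
    """Greedy subgoal selection at test time (epsilon = 0).

    `meta_controller` is assumed to map a state tensor to a vector of
    Q-values, one per candidate subgoal; the argmax replaces any
    manually specified subgoal sequence.
    """
    with torch.no_grad():
        q_values = meta_controller(state.unsqueeze(0))  # shape: (1, n_subgoals)
    return int(q_values.argmax(dim=1).item())
```

The low-level controller would then be conditioned on the selected subgoal until it is reached or times out, as I understand the hierarchical setup.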
SamitHuang changed the title from "inconsistency between the models for training and testing." to "Inconsistency between the models for training and testing." on Apr 24, 2019.
Sorry for the late reply. Not sure if I'm missing something in your question, but I'm pretty sure the trained meta controller was used during testing. I'll double-check and may upload the network weights once I find them.