Current progress & Near‐term plan
Itomigna2 edited this page Mar 18, 2024 · 5 revisions
- This version uses a CNN-based representation network and an LSTM-based dynamics network for RGB-input environments (LunarLander-v2, with RGB states obtained by wrapping the env in PixelObservationWrapper(self.env)).
- It works for randomly initialized environments (random terrain and acceleration), but it does not yet reliably converge to a score above 200.
- Training (non-fixed env seed) takes several hours with the deep CNN and requires an experiment length of more than 5000.
- Training (fixed env seed) takes up to 1 hour with the deep CNN, at an experiment length of 200 to 2000. The agent can learn perfectly in this case.
- It needs to be improved further.
- Write the docs on the wiki.
- Check the nni config settings and try to fix issue #5.
- Optimize the efficiency of the code and the nni experiment settings.
- Try off-policy correction methods such as V-trace or Retrace.
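
To make the last plan item concrete, here is a minimal sketch of V-trace value targets in plain Python. It is not code from this repository: the function name `vtrace_targets` and all of its parameters are hypothetical, and it assumes a single trajectory segment with no episode termination inside it. `log_rhos` holds the log importance ratios log(pi/mu) between the current policy and the policy that generated the data.

```python
import math


def vtrace_targets(rewards, values, bootstrap_value, log_rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace targets v_s for one trajectory segment (hypothetical helper).

    rewards[t], values[t], log_rhos[t] describe step t; bootstrap_value is the
    value estimate of the state that follows the last step.
    Assumption: no episode boundary occurs inside the segment.
    """
    n = len(rewards)
    if not (len(values) == n and len(log_rhos) == n):
        raise ValueError("rewards, values and log_rhos must have equal length")

    # Truncated importance weights: rho_t for the TD error, c_t for the trace.
    rhos = [min(rho_bar, math.exp(lr)) for lr in log_rhos]
    cs = [min(c_bar, math.exp(lr)) for lr in log_rhos]

    # Value of the next state at each step; the last one uses the bootstrap.
    next_values = list(values[1:]) + [bootstrap_value]

    targets = [0.0] * n
    acc = 0.0  # running value of v_{t+1} - V(x_{t+1})
    for t in reversed(range(n)):
        # Importance-weighted temporal-difference error at step t.
        delta = rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
        # Backward recursion: v_t - V(x_t) = delta_t + gamma * c_t * (v_{t+1} - V(x_{t+1})).
        acc = delta + gamma * cs[t] * acc
        targets[t] = values[t] + acc
    return targets
```

When the data is on-policy (all `log_rhos` equal to 0) and both clipping thresholds are 1, the targets reduce to ordinary n-step bootstrapped returns. When the behaviour policy is very different from the current one (ratios near 0), the targets stay close to the existing value estimates, which is what limits the damage from stale replayed trajectories.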