Reinforcement Learning Fine-tuning #6
Martyn0324 started this conversation in Ideas
Replies: 1 comment
-
Hi @Martyn0324, thank you for the suggestion and insight! I do want to try training/fine-tuning the models in an RL configuration eventually. However, I have zero knowledge of RL and will need to read up on a lot, so thank you for the links; they will be very helpful. I'm currently focusing on building up the basic codebase (which is very much unfinished) and training supervised models slightly larger than the small models I currently have, as those were mainly experimental. In the meantime, if you want to try anything with RL, let me know and I'll try to help in any way I can, for example by explaining the current code.
-
Hey! I really like the idea of using Transformers to play a game, especially since the rumours about OpenAI's Q* began.
I'm looking forward to studying your code. It seems you're using the model in a supervised learning configuration, and I'm also interested in implementing it in a reinforcement learning configuration. You seem to understand much more about programming than I do, so I'd like to suggest the idea to you.
It seems it should be possible to use a Transformer to implement a Q-learning algorithm, where the model predicts a value for each possible action it could take. The action with the highest predicted value would then be the one played in the game.
This probably wouldn't require any change to the Transformer architecture, only to the way it's trained:
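A minimal sketch of that action-selection step, in plain Python. It assumes the model's output has already been converted to a list of floats, one per action index; `select_action`, `legal_actions`, and `epsilon` are hypothetical names for illustration and are not taken from the repository.

```python
import random


def select_action(q_values, legal_actions, epsilon=0.0, rng=random):
    """Pick an action from predicted Q-values (epsilon-greedy).

    q_values: list of floats, one predicted value per action index
        (assumed to come from the model; hypothetical interface).
    legal_actions: indices of the actions allowed in the current state.
    epsilon: probability of picking a random legal action instead of
        the best one, used for exploration during training.
    """
    if not legal_actions:
        raise ValueError("no legal actions available")
    # Explore: occasionally ignore the predictions entirely.
    if rng.random() < epsilon:
        return rng.choice(legal_actions)
    # Exploit: take the legal action with the highest predicted value.
    return max(legal_actions, key=lambda a: q_values[a])


# Example: action 1 has the highest value but is not legal here,
# so the best remaining legal action is chosen.
q = [0.1, 0.9, -0.3, 0.5]
best = select_action(q, legal_actions=[0, 2, 3])
```

With `epsilon=0.0` the choice is purely greedy, which is what you would want when playing against a human; a small positive `epsilon` is the usual way to keep exploring during training.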
https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/deepq/dqn.py - Deep Q-Learning code from Stable Baselines
https://github.com/saashanair/rl-series/blob/master/dqn/dqn_agent.py - Implementation of Deep Q-Learning from scratch in PyTorch
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html - PyTorch tutorial on Reinforcement Learning using Deep Q-Learning.
https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf - Deep Q-Learning original paper.
https://lilianweng.github.io/posts/2018-02-19-rl-overview/#q-learning-off-policy-td-control - Lilian Weng's blog about Reinforcement Learning.
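To illustrate what "changing the way it's trained" could mean, here is a rough sketch of the standard one-step Q-learning target and squared error, in plain Python without any deep learning library. The function names and the default `gamma` are assumptions for illustration; a real implementation would compute this on batched tensors and backpropagate through the model, and a two-player game would also need to account for the opponent's move between states.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    reward: reward received after taking the action.
    next_q_values: predicted values for every action in the next state.
    done: True if the episode ended, in which case there is no future value.
    gamma: discount factor (0.99 is only a common default, an assumption).
    """
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def td_loss(q_values, action, target):
    """Squared error between the predicted value of the action taken
    and the target; this is what the model would be trained to minimise."""
    error = q_values[action] - target
    return error * error


# Example transition: the model predicted 0.4 for the action taken,
# received a reward of 1.0, and the best next-state value is 0.5.
target = td_target(reward=1.0, next_q_values=[0.2, 0.5], done=False, gamma=0.9)
loss = td_loss(q_values=[0.4, 0.1], action=0, target=target)
```

The only difference from supervised training is where the target comes from: instead of a labelled move, the target is built from the observed reward plus the model's own estimate for the next state.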
Since the model can already be trained with supervised learning and can already play against a human, it could be pre-trained with supervised learning and then fine-tuned with reinforcement learning. Pre-training followed by RL fine-tuning appears to be the most efficient way to apply RL:
https://www.alexirpan.com/2018/02/14/rl-hard.html