Reinforcement Learning Fine-tuning #6

Martyn0324 · 2024-02-13T00:54:18Z


Hey! I really like the idea of using Transformers to play a game, especially since the rumours about OpenAI's Q* began.
I'm looking forward to studying your code. It seems that you're using the model in a supervised learning configuration, but I'm also interested in training it in a reinforcement learning configuration. You seem to understand much more about programming than I do, so I'd like to suggest it to you.

It seems that it could be possible to use a Transformer to implement a Q-learning algorithm, where the model predicts a value for each possible action it could take. The action with the highest value would then be the one played in the game.
This probably wouldn't require any change to the Transformer architecture, only to the way it's trained:

https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/deepq/dqn.py - Deep Q-Learning code from Stable Baselines
https://github.com/saashanair/rl-series/blob/master/dqn/dqn_agent.py - Implementation of Deep Q-Learning from scratch in Pytorch
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html - Pytorch tutorial on Reinforcement Learning using Deep Q-Learning.
https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf - Deep Q-Learning original paper.
https://lilianweng.github.io/posts/2018-02-19-rl-overview/#q-learning-off-policy-td-control - Lilian Weng's blog about Reinforcement Learning.
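A minimal sketch of the action-selection step described above, assuming the Transformer's output head emits one scalar Q-value per action and that a boolean mask of legal actions is available. `select_action`, `q_values` and `legal_mask` are hypothetical names, not part of the existing codebase. Epsilon-greedy exploration is the usual choice in DQN-style training, but it is an assumption here.

```python
import random
from typing import Optional, Sequence


def select_action(
    q_values: Sequence[float],
    legal_mask: Sequence[bool],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Epsilon-greedy choice over per-action Q-values (hypothetical helper).

    q_values:   one predicted value per action, as output by the model.
    legal_mask: True where the action is allowed in the current state.
    epsilon:    probability of picking a random legal action (exploration).
    """
    rng = rng or random.Random()
    legal = [i for i, ok in enumerate(legal_mask) if ok]
    if not legal:
        raise ValueError("no legal actions available")
    # Explore: with probability epsilon, pick any legal action uniformly.
    if rng.random() < epsilon:
        return rng.choice(legal)
    # Exploit: the legal action with the highest predicted Q-value.
    return max(legal, key=lambda i: q_values[i])
```

With `epsilon=0.0` this is purely greedy, which is what you would use when the trained model plays against a human; during RL training epsilon is typically decayed from a high value toward a small one.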

Since the model can already be trained with supervised learning and can also play against a human, it could be pre-trained in a supervised manner and then fine-tuned with reinforcement learning. This appears to be a much more efficient way to apply RL than training from scratch:

https://www.alexirpan.com/2018/02/14/rl-hard.html
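For the fine-tuning step, a DQN-style update regresses the Q-value of the action actually taken toward a bootstrapped target, r + γ · max over a' of Q_target(s', a'), with no bootstrap term at terminal states. The sketch below shows only that arithmetic in plain Python. In a real setup the inputs would come from the Transformer and from a periodically synced copy of it acting as the target network, as common DQN implementations do. Initializing the Q-network from the supervised checkpoint is an assumption: a head trained to predict moves is not trained to output values, so it may need a new or re-scaled output head. All function names are hypothetical; the Huber loss mirrors the smooth L1 loss used in the PyTorch tutorial linked above.

```python
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float],
    next_q_target: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Bootstrapped DQN targets: r + gamma * max_a' Q_target(s', a').

    next_q_target rows are assumed to contain only legal next actions
    (illegal moves already masked out), which this sketch does not check.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, dones):
        # Terminal transitions (game over) get no future value.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def huber(x: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax <= delta else delta * (ax - 0.5 * delta)


def dqn_loss(q_taken: Sequence[float], targets: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber loss between Q(s, a_taken) and the (fixed) targets."""
    return sum(huber(q - y, delta) for q, y in zip(q_taken, targets)) / len(targets)
```

For example, with `gamma=0.5`, rewards `[1.0, 0.0]`, next-state rows `[[0.5, 2.0], [3.0, 1.0]]` and dones `[False, True]`, the targets are `[2.0, 0.0]`; the second transition ends the game, so its next-state values are ignored.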

sgrvinod · 2024-02-14T00:02:56Z (Maintainer)

Hi @Martyn0324, thank you for the suggestion and insight! I do want to try training/fine-tuning the models in an RL configuration eventually. However, I have zero knowledge of RL and will need to read up on a lot, so thank you for the links; they will be very helpful. I'm currently focusing on building up the basic codebase (which is very much unfinished) and on training supervised models slightly larger than the small ones I currently have, as those were mainly experimental.

In the meantime, if you want to try anything with RL, let me know and I'll try to help in any way I can regarding explaining the current code, etc.
