Robot Arm learning basketball using NFQ and Q-Learning
Q-learning is fairly rudimentary, but it produced quite successful results. It keeps a table of Q-values covering all possible states and uses the following equation to explore and discover an optimal policy:
Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
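As a rough illustration of the table-based approach, here is a minimal sketch of an epsilon-greedy action choice and a single table update. The function names, the dictionary-backed table, and the string state labels are assumptions made for this example and are not taken from the project code.

```python
import random
from collections import defaultdict


def make_q_table(n_actions):
    # Unseen states default to a row of zeros, one entry per action.
    return defaultdict(lambda: [0.0] * n_actions)


def choose_action(q_table, state, n_actions, epsilon, rng=random):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    row = q_table[state]
    return max(range(n_actions), key=lambda a: row[a])


def q_update(q_table, state, action, reward, next_state, alpha, gamma, done=False):
    # Bootstrapped target: reward plus the discounted best next-state value.
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    # Move the stored value a fraction alpha of the way toward the target.
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]
```

In a real run, the state would be some discretized description of the arm and the reward would come from the simulation; both are left abstract here.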
NFQ (Neural Fitted Q Iteration) uses a neural network to learn the Q values instead of storing them in a table.
First, a batch of data is generated using a random policy. Then a 2-layer neural net is built in PyTorch and trained with an RPROP optimizer. Training was done against the target shown in the algorithm below.
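The NFQ steps described above might look roughly like the following sketch. It is not the project's actual code: the state and action sizes, the hidden width, the sigmoid activation, the discount factor, and the randomly generated transitions are all placeholder assumptions. It also uses a reward-maximizing target, whereas the original NFQ formulation minimizes a cost. The sketch reads "2-layer" as two linear layers.

```python
import torch
import torch.nn as nn

# Hypothetical sizes and discount factor, chosen only for illustration.
STATE_DIM, N_ACTIONS, GAMMA = 3, 4, 0.95


def make_q_net(hidden=16):
    # Two linear layers: state in, one Q-value per action out.
    return nn.Sequential(
        nn.Linear(STATE_DIM, hidden),
        nn.Sigmoid(),
        nn.Linear(hidden, N_ACTIONS),
    )


def random_batch(n=64):
    # Stand-in for transitions gathered by running a random policy.
    states = torch.rand(n, STATE_DIM)
    actions = torch.randint(0, N_ACTIONS, (n,))
    rewards = torch.rand(n)
    next_states = torch.rand(n, STATE_DIM)
    dones = torch.zeros(n)
    return states, actions, rewards, next_states, dones


def nfq_iteration(q_net, batch, epochs=20):
    states, actions, rewards, next_states, dones = batch
    # Targets come from the current network and are held fixed while fitting.
    with torch.no_grad():
        best_next = q_net(next_states).max(dim=1).values
        targets = rewards + GAMMA * (1.0 - dones) * best_next
    optimizer = torch.optim.Rprop(q_net.parameters())
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(epochs):
        optimizer.zero_grad()
        # Pick out the Q-value of the action actually taken in each transition.
        predicted = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = loss_fn(predicted, targets)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses
```

Calling nfq_iteration repeatedly on the same batch, recomputing the targets each time, gives the outer fitted-Q loop. Full-batch training is used because Rprop relies on gradient signs and is not designed for noisy mini-batches.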
There were a lot of issues, and some still remain. The first and most easily fixed is to switch to a dynamic alpha (learning rate) and exploration value (epsilon) in the Q-table variant. Much more work can be done on the NFQ side to build a better-structured neural net as well as to tune the various other parameters.
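One common way to make alpha and epsilon dynamic is to decay them over episodes toward a small floor. The sketch below shows an exponential schedule. The function name and all start, end, and decay values are hypothetical and would need tuning for this task.

```python
def exponential_decay(start, end, decay_rate, episode):
    # Shrinks from `start` toward the floor `end` as episodes accumulate.
    return end + (start - end) * (decay_rate ** episode)


def schedule(episode):
    # Hypothetical settings: explore a lot early, then settle down.
    epsilon = exponential_decay(1.0, 0.05, 0.995, episode)
    alpha = exponential_decay(0.5, 0.01, 0.999, episode)
    return alpha, epsilon
```

The values returned by schedule(episode) would replace the fixed alpha and epsilon at the start of each training episode.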