Add tabular MARL algorithms including Nash-Q and Correlated-Q #821
Conversation
Hi @rezunli96, the tests are failing for the same mysterious reasons as the ones in #818 (which are likely unrelated to your changes). I think you will have to wait until we merge #807, which upgrades the JAX dependency versions. I'm on vacation at the moment, but I will import and merge that PR next week when I'm back and will let you know; then you can merge with master and the tests will likely pass.
Hi @rezunli96, the CI tests are now fixed on the master branch. Can you pull changes and push the merge to trigger a new invocation of the tests? Thanks!
Also, can you edit the name of this PR to better reflect what algorithms it's adding? Is it tabular CE-Q and/or Nash-Q?
Thank you, @lanctot! The check is done. Yes, it adds a general joint-action Q-learner, including Nash-Q and Correlated-Q.
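For context, here is a minimal sketch of what a joint-action (Markov game) Q-update of this kind looks like. The function and variable names, and the two-player restriction, are my assumptions; the stage-game solver stands in for the Nash or correlated-equilibrium solver and is not the PR's actual API.

```python
# Minimal sketch (assumed names, two players) of a joint-action Q-update in the
# spirit of Nash-Q / Correlated-Q: the bootstrap value comes from solving the
# stage game defined by both players' Q tables at the next state.
def joint_q_update(q, state, actions, rewards, next_state, lr, gamma, stage_solver):
  """q[p][s] is an (|A_0|, |A_1|) array of player p's joint-action values at state s."""
  # stage_solver returns one equilibrium value per player for the next-state stage
  # game (a Nash solver for Nash-Q, a correlated-equilibrium solver for Correlated-Q).
  next_values = stage_solver([q[p][next_state] for p in range(2)])
  a0, a1 = actions
  for p in range(2):
    target = rewards[p] + gamma * next_values[p]
    q[p][state][a0, a1] += lr * (target - q[p][state][a0, a1])
```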
Apologies for the delay! I left some initial comments, I'll check again later.
from open_spiel.python import rl_agent
from open_spiel.python import rl_tools
from open_spiel.python.algorithms.jpsro import _mgce, _mgcce
from open_spiel.python.examples.matrix_nash import lemke_howson_solve
We should not be importing from examples. Can you refactor the example to move lemke_howson_solve to its own file? (Maybe python/algorithms/lemke_howson.py)
Sure! I made a new file algorithms/matrix_nash and changed the original examples/matrix_nash into an _example file, so that lemke_howson_solve can be imported from other places.
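After that refactor, the import would look something like the line below; the exact module path is my assumption based on the comment above.

```python
# Assumed import path after the refactor: the Lemke-Howson solver now lives
# under algorithms rather than examples.
from open_spiel.python.algorithms.matrix_nash import lemke_howson_solve
```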
def __call__(self, q_tables, action_dims):
  '''
  Args:
    q_tables: a list of Q tables for each agent
What is the type of the Q table?
Any reason not to have these be a 3-dimensional (player, row, col) numpy array, where the indices correspond to indices in each player's legal moves list? That's how we normally do it elsewhere; we would not have to pass in action_dims, and it's generally easier to work with numpy arrays.
Yes, it makes sense. Now I pass action_dims directly as an explicit input argument to the MAQagent (num_of_actions, which is now a list), so a JointActionSolver won't need it.
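To illustrate the representation being discussed, here is a small sketch; the shapes and names are assumptions, not the merged API. With a per-state numpy array of joint-action values, the action dimensions are recoverable from the array itself.

```python
# Sketch of the discussed representation (assumption, not the final API):
# one numpy array per state with shape (num_players, |A_0|, |A_1|), so a
# JointActionSolver can read the action sizes from the array instead of
# taking a separate action_dims argument.
import numpy as np

num_of_actions = [3, 2]                        # per-player action counts
q_table = np.zeros((2, num_of_actions[0], num_of_actions[1]))
# q_table[p, a0, a1] = player p's value for the joint action (a0, a1).
action_dims = q_table.shape[1:]                # (3, 2), read back from the array
```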
Thanks. Looking good! I added a few more comments.
Two other things:
- Can you rename tabular_maqlearner -> tabular_joint_action_learner (or tabular_multiagent_qlearner, or tabular_markov_qlearner)?
- Can you add a small test (something similar to the example)? It can be a very simple pathfinding instance, like a 3x3 map, that just verifies the agents learn to go to their destinations after 100 or so episodes.
Should be OK to import after these two and the other comments are addressed.
Can I ask for one more test, on rock paper scissors, to check that it computes a mixed strategy and gets near the equilibrium value?
Sure! I just added an RPS test to check whether the Nash-Q learners learn the correct equilibrium and values.
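As a rough idea of what such a check can assert (a sketch with hypothetical values, not the actual test in the PR): the learned RPS strategy should be close to uniform and nearly unexploitable, since the game value at equilibrium is 0.

```python
# Sketch of an RPS equilibrium check (hypothetical, not the PR's actual test):
# the Nash equilibrium of rock-paper-scissors is uniform with game value 0.
import numpy as np

RPS_PAYOFF = np.array([[ 0, -1,  1],
                       [ 1,  0, -1],
                       [-1,  1,  0]])  # row player's payoff matrix

def best_response_value(opponent_policy, payoff=RPS_PAYOFF):
  # Value of the best pure response against opponent_policy; 0 iff unexploitable.
  return np.max(payoff @ opponent_policy)

learned_policy = np.array([0.34, 0.33, 0.33])  # e.g. read from the trained Nash-Q agent
assert np.allclose(learned_policy, 1.0 / 3.0, atol=0.05)
assert best_response_value(learned_policy) < 0.05
```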
Hello, I am trying to implement classical tabular MARL algorithms (Nash-Q, Correlated-Q). I adapted the code in tabular_qlearner and several other utility functions in OpenSpiel. Please let me know if there are any issues. Thanks!