
Add tabular MARL algorithms including Nash-Q and Correlated-Q #821

Merged: 8 commits into google-deepmind:master
May 23, 2022

Conversation

rezunli96
Contributor

Hello, I am implementing classical tabular MARL algorithms (Nash-Q, Correlated-Q). I adapted the code from tabular_qlearner and several other utility functions in OpenSpiel. Please let me know if there are any issues. Thanks!
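For readers, here is a minimal sketch of the joint-action Q update that Nash-Q and Correlated-Q share (hypothetical standalone code, not this PR's implementation). The part that differs between the algorithms is the stage-game value solver: Nash-Q uses a Nash equilibrium value of the next-state stage game, Correlated-Q a correlated equilibrium value. The `friend_value` solver below (max over joint actions) is only a placeholder for testing the update itself.

```python
import numpy as np

def joint_q_update(q, state, joint_action, reward, next_state, value_fn,
                   lr=0.1, gamma=0.99):
  """One joint-action Q-learning step for a single agent.

  q: dict mapping state -> np.ndarray of shape (n_actions_0, n_actions_1),
     this agent's Q-values over joint actions.
  value_fn: returns the agent's value of the next-state stage game
     (Nash-Q: Nash equilibrium value; Correlated-Q: CE value).
  """
  target = reward + gamma * value_fn(q[next_state])
  q[state][joint_action] += lr * (target - q[state][joint_action])

# Placeholder "Friend-Q" style solver: value = max over joint actions.
friend_value = lambda stage_q: float(stage_q.max())
```

Swapping `value_fn` is the only change needed to move between algorithm variants, which is why a pluggable joint-action solver is a natural design here.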

@lanctot
Collaborator

lanctot commented Apr 23, 2022

Hi @rezunli96,

The tests are failing for the same mysterious reasons as in #818 (likely unrelated to your changes). I think you will have to wait until we merge #807, which upgrades the JAX dependency versions. I'm on vacation at the moment, but I will import and merge that PR next week when I'm back and let you know; then you can merge with master and the tests will likely pass.

@lanctot
Collaborator

lanctot commented Apr 26, 2022

Hi @rezunli96, the CI tests are now fixed on the master branch. Can you pull the changes and push the merge to trigger a new run of the tests? Thanks!

@lanctot
Collaborator

lanctot commented Apr 26, 2022

Also, can you edit the name of this PR to better reflect which algorithms it's adding? Is it tabular CE-Q and/or Nash-Q?

@rezunli96
Contributor Author

rezunli96 commented Apr 26, 2022

Thank you!

@lanctot The checks have passed. Yes, it adds a general joint-action Q-learner, including Nash-Q and Correlated-Q.

@rezunli96 rezunli96 changed the title Add tabular MARL algorithm Add tabular MARL algorithm including Nash-Q and Correlated-Q Apr 28, 2022
@rezunli96 rezunli96 changed the title Add tabular MARL algorithm including Nash-Q and Correlated-Q Add tabular MARL algorithms including Nash-Q and Correlated-Q Apr 28, 2022
Collaborator

@lanctot lanctot left a comment


Apologies for the delay! I left some initial comments, I'll check again later.

open_spiel/python/algorithms/tabular_maqlearner.py (outdated, resolved)
from open_spiel.python import rl_agent
from open_spiel.python import rl_tools
from open_spiel.python.algorithms.jpsro import _mgce, _mgcce
from open_spiel.python.examples.matrix_nash import lemke_howson_solve
Collaborator

We should not be importing from examples. Can you refactor the example to move lemke_howson_solve to its own file? (Maybe python/algorithms/lemke_howson.py)

Contributor Author

Sure! I created algorithms/matrix_nash and renamed the original examples/matrix_nash to an _example file, so lemke_howson_solve can now be imported from other places.

def __call__(self, q_tables, action_dims):
  """
  Args:
    q_tables: a list of Q-tables, one for each agent
    action_dims: a list of action-space sizes, one per agent
  """
Collaborator

What is the type of the Q-table?

Any reason not to have these be a 3-dimensional (player, row, col) numpy array, where the indices correspond to indices in each player's legal-moves list? That's how we normally do it elsewhere; we would not have to pass in action_dims, and numpy arrays are generally easier to work with.

Contributor Author

Yes, that makes sense. I now pass action_dims as an explicit input argument to the MAQ agent (num_of_actions, which is a list), so a JointActionSolver no longer needs it.
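For illustration, the layout the reviewer suggests can be sketched like this (a hypothetical example, not code from the PR):

```python
import numpy as np

# One stage game's Q-values for two players with 3 legal moves each:
# axis 0 = player, axis 1 = row player's move index, axis 2 = column
# player's move index (indices into each player's legal-moves list).
num_players, rows, cols = 2, 3, 3
q_tables = np.zeros((num_players, rows, cols))

# Player 1's Q-value for the joint action (row move 0, column move 2):
q_tables[1, 0, 2] = 0.5

# The action-space sizes are recoverable from the array shape itself,
# so no separate action_dims argument would be needed:
action_dims = list(q_tables.shape[1:])
```

The upshot of this representation is that shapes, slicing, and vectorized operations (e.g. `q_tables[player].max()`) come for free.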

open_spiel/python/algorithms/tabular_maqlearner.py (outdated, resolved)
Collaborator

@lanctot lanctot left a comment


Thanks. Looking good! I added a few more comments.

Two other things:

  • Can you rename tabular_maqlearner -> tabular_joint_action_learner (or tabular_multiagent_qlearner, or tabular_markov_qlearner)

  • Can you add a small test (something similar to the example)? It can be a very simple pathfinding instance, like a 3x3 map, that just verifies the agents learn to reach their destinations after 100 or so episodes.

Should be OK to import after these two and the other comments are addressed.

open_spiel/python/algorithms/tabular_maqlearner.py (outdated, resolved)
open_spiel/python/examples/marl_example.py (outdated, resolved)
open_spiel/python/examples/marl_example.py (outdated, resolved)
open_spiel/python/examples/marl_example.py (resolved)
@lanctot
Collaborator

lanctot commented May 9, 2022

Can I ask for one more test on rock-paper-scissors, to check that it computes a mixed strategy and gets near the equilibrium value?

@rezunli96
Contributor Author

Sure! I just added an RPS test to check that the Nash-Q learners learn the correct equilibrium and values.
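For reference, the property such a test checks can be sketched in plain numpy (a hedged illustration, not the PR's actual test code): in rock-paper-scissors the uniform mixed strategy is the Nash equilibrium and the game value is 0, so a learned strategy should be close to uniform and earn a payoff near 0.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player (zero-sum game).
rps = np.array([[ 0, -1,  1],
                [ 1,  0, -1],
                [-1,  1,  0]], dtype=float)

uniform = np.ones(3) / 3.0

# Against the uniform strategy, every pure row action earns the same
# expected payoff (0), so uniform is a best response to itself: a Nash
# equilibrium with game value 0.
pure_payoffs = rps @ uniform
game_value = uniform @ rps @ uniform
```

A test would then assert that the learner's strategy is within some tolerance of `uniform` and its average return is within tolerance of `game_value`.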

@lanctot lanctot added the imported This PR has been imported and awaiting internal review. Please avoid any more local changes, thanks! label May 20, 2022
@OpenSpiel OpenSpiel merged commit b1468a1 into google-deepmind:master May 23, 2022