Hi,
I have been running reinforcement learning and multi-agent RL for one of my projects by implementing a custom env. One crucial requirement for my project is that agents have distinct action spaces, e.g. `[Discrete(2), Discrete(3), ...]`, where the action spaces are completely disjoint. I observed that the default behavior in MARLlib is that all agents' policies are constructed according to the single `action_space` attribute of the env, which makes it impossible to achieve the behavior I want. I know that actions can be masked via `action_mask`, but that feels like a nasty workaround when there is a clear structure to be exploited here.

On reading the documentation, I learned that MARLlib is built on Ray's RLlib, which already supports defining custom/distinct policies per agent. On digging through the code, I found the following to be relevant; it is where the `policies` are built before the Ray job is scheduled:

`MARLlib/marllib/marl/algos/run_cc.py`, line 137 (commit 368c617)
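For context, the default construction follows roughly the pattern sketched below. This is an illustration rather than MARLlib's exact code (`num_agents` and the placeholder spaces are mine): every policy spec reuses the env's single shared `action_space`, which is why all agents end up with identical action heads.

```python
from gym.spaces import Box, Discrete

# Placeholder env attributes, standing in for the custom env's definitions.
num_agents = 2
observation_space = Box(low=-1.0, high=1.0, shape=(4,))
action_space = Discrete(3)  # one shared action space for every agent

# Rough illustration of the default pattern (not MARLlib's exact code):
# every policy spec is built from the same env-level spaces, so all agents
# end up with identical action spaces.
policies = {
    "policy_{}".format(i): (None, observation_space, action_space, {})
    for i in range(num_agents)
}
```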
I was able to redefine the way policies are constructed so that individual agents get distinct action spaces. The hack is simply to read the respective action space for each agent; `env_info["space_act_test"]` is an attribute I defined as part of `env_info`. Although this did work, I would like to know whether modifying this bit of code could have adverse effects elsewhere.
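For concreteness, my change amounts to something like the sketch below. It is a minimal illustration rather than a verbatim diff: `space_obs` and the `(policy_class, obs_space, act_space, config)` tuple follow RLlib's multi-agent convention as I understand it, while `space_act_test` is the custom per-agent mapping I added to `env_info`.

```python
from gym.spaces import Box, Discrete


def build_policies(env_info, agent_ids):
    """Build one RLlib policy spec per agent, each with its own action space."""
    obs_space = env_info["space_obs"]        # shared observation space (assumption)
    act_spaces = env_info["space_act_test"]  # my custom per-agent mapping

    # RLlib's multi-agent format: {policy_id: (policy_cls, obs_space, act_space, config)};
    # policy_cls=None lets RLlib pick the algorithm's default policy class.
    policies = {
        "policy_{}".format(aid): (None, obs_space, act_spaces[aid], {})
        for aid in agent_ids
    }

    # Map each agent onto its dedicated policy so the action spaces stay disjoint.
    def policy_mapping_fn(agent_id, **kwargs):
        return "policy_{}".format(agent_id)

    return policies, policy_mapping_fn


# Example: two agents with completely disjoint discrete action spaces.
env_info = {
    "space_obs": Box(low=-1.0, high=1.0, shape=(4,)),
    "space_act_test": {"agent_0": Discrete(2), "agent_1": Discrete(3)},
}
policies, policy_mapping_fn = build_policies(env_info, ["agent_0", "agent_1"])
```

With a mapping function like this, each agent trains against a policy whose output head matches its own action space, which is the structure I wanted to exploit instead of masking.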