[Feature Request] discrete soft actor-critic #505
Comments
Hey. We would indeed be interested in having those algos! These less-known/newer algorithms should go to the contrib repository, to avoid bloating up / complicating this core code too much. If you or @ku2482 wants to implement discrete SAC, feel free to open a PR in that repo and we can discuss further :) Edit: If discrete-actions support for SAC is trivial (i.e. easy to add along with continuous actions) and there is a good, established paper that details an implementation, we may consider adding it here to the main repository, but I feel it would change code all around quite a bit.
Hi, thanks for the reply, Miffyli! I think there is indeed such a paper detailing the structure. It seems there are no big changes from the original SAC except discrete action support. But after tinkering with the code a little myself, I don't have a clear idea whether it would be trivial to integrate into SB3. I do think it would make it much easier to implement recently popular algorithms, such as all of the algorithms in this paper.
Hello,
Closing this one as it belongs to SB3 contrib (please open one there).
🚀 Feature
A discrete version of Soft actor-critic.
Motivation
I have been using SB3 quite heavily recently and found that there is no discrete off-policy actor-critic algorithm (correct me if I am mistaken) that could serve as a prototype for implementing various algorithms of research interest.
I have been researching entropy-regularized RL algorithms, and I believe that with such a prototype many more algorithms could be developed by simply changing the Shannon entropy to something else (e.g. KL divergence or alpha divergence). For example, all of the algorithms (more than 10) in this paper could be implemented based on such a prototype.
I also found that this repo has a good implementation, and the author has mentioned the possibility of contributing it (#157). I have also talked with its author, @ku2482, who still thinks it is important and is willing to contribute.
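To make the "swap the regularizer" idea concrete, here is a minimal sketch in plain Python. It assumes one common formulation of discrete SAC in which the soft state value is computed as an exact sum over actions, V(s) = Σ_a π(a|s) Q(s,a) + bonus(π(·|s)), with the Shannon bonus being α·H(π). The names `shannon_bonus`, `make_kl_bonus`, and `soft_state_value` are hypothetical and are not part of SB3 or of any existing implementation.

```python
import math
from typing import Callable, Sequence

# A regularizer maps (action probabilities, temperature alpha) to a scalar bonus.
Bonus = Callable[[Sequence[float], float], float]


def shannon_bonus(probs: Sequence[float], alpha: float) -> float:
    """Shannon entropy bonus: alpha * H(pi) = -alpha * sum_a pi(a) * log pi(a)."""
    return -alpha * sum(p * math.log(p) for p in probs if p > 0.0)


def make_kl_bonus(reference: Sequence[float]) -> Bonus:
    """Build a bonus -alpha * KL(pi || reference) for a fixed reference policy.

    Assumption: `reference` assigns non-zero probability to every action.
    """
    def kl_bonus(probs: Sequence[float], alpha: float) -> float:
        return -alpha * sum(
            p * math.log(p / r) for p, r in zip(probs, reference) if p > 0.0
        )
    return kl_bonus


def soft_state_value(
    probs: Sequence[float],
    q_values: Sequence[float],
    alpha: float,
    bonus: Bonus = shannon_bonus,
) -> float:
    """Regularized state value: expected Q under pi plus the chosen bonus."""
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    return expected_q + bonus(probs, alpha)


# Usage: same policy and Q-values, two different regularizers.
pi = [0.5, 0.5]
q = [1.0, 3.0]
v_shannon = soft_state_value(pi, q, alpha=0.2)
v_kl = soft_state_value(pi, q, alpha=0.2, bonus=make_kl_bonus([0.5, 0.5]))
```

Because the action set is finite, the expectation is an exact sum rather than a sampled estimate, which is what makes the regularizer easy to replace in this sketch.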
Pitch
A discrete version of soft actor-critic could serve as a prototype for implementing various algorithms of research interest, especially policy-iteration-style algorithms.
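As an illustration of the policy-iteration view, the sketch below shows the improvement step for the Shannon-entropy case only: for a fixed state, the distribution that maximizes Σ_a π(a) Q(a) + α·H(π) is a softmax of Q/α. This is plain Python with a hypothetical function name; an actual discrete SAC would typically train a policy network toward this target rather than compute it in closed form.

```python
import math
from typing import List, Sequence


def soft_greedy_policy(q_values: Sequence[float], alpha: float) -> List[float]:
    """Return pi(a) proportional to exp(Q(a) / alpha) for a single state.

    The maximum Q-value is subtracted before exponentiating for numerical
    stability; this does not change the resulting distribution.
    """
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: a lower temperature alpha gives a greedier policy.
warm = soft_greedy_policy([0.0, 1.0], alpha=1.0)
cold = soft_greedy_policy([0.0, 1.0], alpha=0.1)
```

With a different regularizer (for example a KL term against a reference policy), the closed form of this step changes, which is where algorithm variants would diverge.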
Alternatives
I have thought about several alternatives, but currently I have no better idea for circumventing the need for such a prototype entropy-regularized discrete off-policy actor-critic algorithm.
### Checklist