[Feature Request] discrete soft actor-critic #505

Closed
1 task done
lingweizhu opened this issue Jul 7, 2021 · 4 comments
Labels
duplicate This issue or pull request already exists enhancement New feature or request

Comments

@lingweizhu

lingweizhu commented Jul 7, 2021

🚀 Feature

A discrete version of Soft actor-critic.

Motivation

I have been using SB3 quite heavily recently and found that there is no discrete off-policy actor-critic algorithm (correct me if I am mistaken). Such an algorithm could serve as a prototype for implementing various algorithms of research interest.

I have been researching entropy-regularized RL algorithms, and I believe that with such a prototype many more algorithms could be developed simply by replacing the Shannon entropy with something else (e.g. KL divergence or alpha divergence). For example, all of the algorithms (more than 10) in this paper could be implemented based on such a prototype.
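To make the "swap the regularizer" idea concrete, here is a minimal sketch in plain Python. It is not SB3 code, and every function name in it is hypothetical. It assumes a finite action set, so the regularized state value can be computed as an exact sum over actions, V(s) = Σ_a π(a|s) Q(s,a) + α Ω(π), where Ω is the Shannon entropy or, as one possible replacement, the negative KL divergence to a reference policy with full support.

```python
import math
from typing import Callable, Sequence

Regularizer = Callable[[Sequence[float]], float]


def shannon_entropy(pi: Sequence[float]) -> float:
    """Shannon entropy of a discrete policy (natural log)."""
    return -sum(p * math.log(p) for p in pi if p > 0.0)


def make_neg_kl(reference: Sequence[float]) -> Regularizer:
    """Build a regularizer -KL(pi || reference).

    Assumption: `reference` assigns non-zero probability to every action.
    """
    def neg_kl(pi: Sequence[float]) -> float:
        return -sum(p * math.log(p / q) for p, q in zip(pi, reference) if p > 0.0)
    return neg_kl


def regularized_value(
    pi: Sequence[float],
    q_values: Sequence[float],
    alpha: float,
    regularizer: Regularizer,
) -> float:
    """Expected Q under pi plus alpha times the chosen regularizer."""
    expected_q = sum(p * q for p, q in zip(pi, q_values))
    return expected_q + alpha * regularizer(pi)


# Same policy and Q-values, two different regularizers.
pi = [0.5, 0.5]
q_values = [1.0, 3.0]
v_entropy = regularized_value(pi, q_values, 0.5, shannon_entropy)
v_kl = regularized_value(pi, q_values, 0.5, make_neg_kl([0.5, 0.5]))
```

Only the `regularizer` argument changes between the two calls; the rest of the value computation stays the same, which is the kind of reuse the prototype is meant to enable.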

I also found that this repo has a good implementation, and its author has mentioned the possibility of contributing it (#157). I have also talked with the author, @ku2482, who still thinks it is important and is willing to contribute.

Pitch

A discrete version of soft actor-critic could serve as a prototype for implementing various algorithms of research interest, especially policy-iteration-style algorithms.

Alternatives

I have considered several alternatives, but I currently have no better idea for avoiding the need for such a prototype: an entropy-regularized, discrete, off-policy actor-critic algorithm.

Checklist

  • I have checked that there is no similar issue in the repo (required)
@lingweizhu lingweizhu added the enhancement New feature or request label Jul 7, 2021
@Miffyli
Collaborator

Miffyli commented Jul 7, 2021

Hey. We would indeed be interested in having those algos! These less-known/newer algorithms should go to the contrib repository, to avoid bloating up / complicating this core code too much. If you or @ku2482 wants to implement discrete SAC, feel free to open a PR in that repo and we can discuss further :)

Edit: If discrete-actions support for SAC is trivial (i.e. easy to add along with continuous actions) and there is a good, established paper that details an implementation, we may consider adding it here to the main repository, but I feel it would change code all around quite a bit.

@lingweizhu
Author

lingweizhu commented Jul 7, 2021

Hi, thanks for the reply Miffyli!

I think there is indeed such a paper detailing the structure. It seems there are no big changes from the original SAC apart from discrete action support. But after having tinkered with the code a little myself, I don't have a clear idea whether it would be trivial to integrate into SB3.
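For readers unfamiliar with what "discrete action support" changes, the following is a rough sketch of one common discrete formulation, written in plain Python for a single transition. It is an assumption about the general approach, not the linked paper's exact method and not SB3 code; all names are hypothetical. The main difference from continuous SAC shown here is that the actor outputs a categorical distribution, so expectations over actions are computed exactly instead of being estimated from a sampled action.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax turning actor logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state_value(
    probs: Sequence[float], q1: Sequence[float], q2: Sequence[float], alpha: float
) -> float:
    """Exact expectation over actions of min(Q1, Q2) - alpha * log pi."""
    return sum(
        p * (min(a, b) - alpha * math.log(p))
        for p, a, b in zip(probs, q1, q2)
        if p > 0.0
    )


def critic_target(
    reward: float,
    done: bool,
    gamma: float,
    next_probs: Sequence[float],
    next_q1: Sequence[float],
    next_q2: Sequence[float],
    alpha: float,
) -> float:
    """Soft Bellman target; the Q inputs are assumed to come from target critics."""
    v_next = soft_state_value(next_probs, next_q1, next_q2, alpha)
    return reward + (0.0 if done else gamma * v_next)


def actor_loss(
    probs: Sequence[float], q1: Sequence[float], q2: Sequence[float], alpha: float
) -> float:
    """Policy objective to minimize: E_pi[alpha * log pi - min(Q1, Q2)]."""
    return -soft_state_value(probs, q1, q2, alpha)


# One illustrative transition with two actions.
probs = softmax([0.0, 0.0])
target = critic_target(1.0, False, 0.9, probs, [1.0, 2.0], [2.0, 1.0], 1.0)
loss = actor_loss(probs, [1.0, 2.0], [2.0, 1.0], 1.0)
```

A real implementation would operate on batched tensors, use an autodiff framework for the gradients, and typically tune `alpha` automatically; none of that is shown here.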

But I think it would definitely make it much easier to implement recently popular algorithms, such as all of the algorithms in this paper.

@araffin
Member

araffin commented Jul 7, 2021

Hello,
as mentioned already in #157, I would prefer it to be in contrib first.
Please open an issue there and make sure to read the contributing guide of SB3 contrib if you want to implement the algorithm and submit a PR ;)

@araffin araffin added the duplicate This issue or pull request already exists label Jul 7, 2021
@araffin
Member

araffin commented Jul 13, 2021

Closing this one as it belongs to SB3 contrib (please open one there).

@araffin araffin closed this as completed Jul 13, 2021
3 participants