[Feature Request] discrete soft actor-critic #505

lingweizhu · 2021-07-07T10:04:49Z

🚀 Feature

A discrete version of Soft actor-critic.

Motivation

I have been using SB3 quite heavily recently and found that there is no discrete off-policy actor-critic algorithm (correct me if I am mistaken). Such an algorithm could serve as a prototype for implementing various algorithms of research interest.

I have been researching entropy-regularized RL algorithms, and I believe that with such a prototype many more algorithms could be developed simply by replacing the Shannon entropy with another regularizer (e.g. KL divergence or alpha divergence). For example, all of the algorithms (more than 10) in this paper could be implemented on top of such a prototype.
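To illustrate what "changing the Shannon entropy to something else" could look like for discrete actions, here is a minimal sketch in plain Python. The function names (`shannon_entropy`, `negative_kl`, `regularized_value`) are hypothetical and are not part of SB3; the sketch only assumes that the regularized objective has the form "expected Q-value plus a weighted regularizer of the policy".

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(pi) of a discrete policy (hypothetical helper)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def negative_kl(probs, reference):
    """Negative KL divergence -KL(pi || reference) (hypothetical helper)."""
    return -sum(p * math.log(p / r) for p, r in zip(probs, reference) if p > 0.0)

def regularized_value(q_values, probs, alpha, regularizer):
    """Expected Q-value under the policy plus alpha times a regularizer.

    Assumption: the regularizer is any function of the action probabilities,
    so swapping it changes the algorithm without touching the rest.
    """
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    return expected_q + alpha * regularizer(probs)

# Example inputs (made up for illustration only).
probs = [0.7, 0.2, 0.1]
q_values = [1.0, 0.5, -0.2]
uniform = [1.0 / 3.0] * 3

v_entropy = regularized_value(q_values, probs, 0.2, shannon_entropy)
v_kl = regularized_value(q_values, probs, 0.2, lambda p: negative_kl(p, uniform))
```

With a uniform reference policy the two regularizers differ only by the constant log(number of actions), so the two values above differ by `0.2 * log(3)`; a non-uniform reference policy would give a genuinely different objective.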

I also found that this repo has a good implementation, and its author has mentioned the possibility of contributing it in #157. I have also talked with the author, @ku2482, who still thinks it is important and is willing to contribute.

Pitch

A discrete version of soft actor-critic could serve as a prototype for implementing various algorithms of research interest, especially policy-iteration-style algorithms.

Alternatives

I have considered several alternatives, but I currently have no better idea for circumventing the need for such a prototype: an entropy-regularized, discrete, off-policy actor-critic algorithm.

Checklist

I have checked that there is no similar issue in the repo (required)

Miffyli · 2021-07-07T10:44:26Z

Hey. We would indeed be interested in having those algos! These less-known/newer algorithms should go to the contrib repository, to avoid bloating up / complicating this core code too much. If you or @ku2482 wants to implement discrete SAC, feel free to open a PR in that repo and we can discuss further :)

Edit: If discrete-actions support for SAC is trivial (i.e. easy to add along with continuous actions) and there is a good, established paper that details an implementation, we may consider adding it here to the main repository, but I feel it would change code all around quite a bit.

lingweizhu · 2021-07-07T11:02:46Z

Hi, thanks for the reply Miffyli!

I think there is indeed such a paper detailing the structure. It seems there are no big changes from the original SAC apart from discrete action support. But after tinkering with the code a little myself, I don't have a clear idea whether it would be trivial to integrate into SB3.
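As a rough illustration of what discrete action support changes, here is a minimal sketch in plain Python rather than SB3 code. It assumes the commonly described discrete formulation in which the critic outputs one Q-value per action and the policy outputs action probabilities, so the expectation over actions can be computed exactly instead of being sampled. All function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_state_value(q_values, probs, alpha):
    """Exact soft value: sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))."""
    return sum(
        p * (q - alpha * math.log(p))
        for p, q in zip(probs, q_values)
        if p > 0.0
    )

def actor_loss(q_values, probs, alpha):
    """Policy loss for a single state: the negated soft state value."""
    return -soft_state_value(q_values, probs, alpha)

# Example with made-up Q-values for a two-action state.
q_values = [1.0, 3.0]
alpha = 1.0
uniform_value = soft_state_value(q_values, [0.5, 0.5], alpha)

# The policy proportional to exp(Q / alpha) maximizes the soft value.
greedy_soft_probs = softmax([q / alpha for q in q_values])
best_value = soft_state_value(q_values, greedy_soft_probs, alpha)
```

In this sketch the sum over actions replaces the reparameterized sample used for continuous actions. How cleanly that maps onto the existing SAC policy and critic classes in SB3 is a separate question that the sketch does not answer.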

But I think it would definitely make it much easier to implement recently popular algorithms, such as all of the algorithms in this paper.

araffin · 2021-07-07T12:00:16Z

Hello,
as mentioned already in #157, I would prefer it to be in contrib first.
Please open an issue there and make sure to read the contributing guide of SB3 contrib if you want to implement the algorithm and submit a PR ;)

araffin · 2021-07-13T08:58:06Z

Closing this one as it belongs to SB3 contrib (please open one there).

lingweizhu added the "enhancement" (New feature or request) label Jul 7, 2021

araffin added the "duplicate" (This issue or pull request already exists) label Jul 7, 2021

araffin closed this as completed Jul 13, 2021
