textbook 10 #166

xiaowuhu · 2022-06-13T08:27:48Z

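Taking Figure 10.1.1 as an example: it has no a, s, r symbols, so readers cannot match it to the explanation that follows.
The $G_t$ formula in 10.1.2 has an expanded form; writing it out would make it easier to understand.
As I recall, $\gamma$ lies in (0,1], not [0,1).
For formulas such as $\pi (a|s)=p(a_t=a|s_t=s)$, the standard notation (as I understand it) is $\pi (a|s)=p(A_t=a|S_t=s)$.
The reference list in 10.1 is too long and out of proportion to the main text.
10.1.2 may need a diagram of three overlapping circles to show how the three relate.
Please put 10.1.1 and 10.1.2 in a single md file.
Why does 10.2.2 have figures with four-level index numbers? For example, Figure 10.2.2.1 could be renumbered 10.2.1.
Consider merging 10.2.2 and 10.2.3 into 10.3, so that its length and content match 10.2.1 (which would become 10.2).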
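
For reference, the expanded form of the return mentioned above could look like the following. This is a sketch that assumes the Sutton and Barto indexing, where $R_{t+1}$ is the reward received after acting at time $t$; the textbook chapter may index rewards differently.

$$
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
$$

Written this way, the recursive relation $G_t = R_{t+1} + \gamma G_{t+1}$ follows directly by factoring $\gamma$ out of every term after the first.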

xuehui1991 · 2022-07-13T03:47:27Z

Thanks for your comments.
As for the range of gamma, I checked the book [1], which defines gamma in [0, 1].
However, setting gamma to 1 makes the model hard to converge.
You can refer to this discussion.
Therefore, I will follow the book [1] and revise this part.

[1] Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. MIT press, 2018.
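
One possible way to see why gamma = 1 can be troublesome is to compute the discounted return of a long reward sequence. The sketch below is only an illustration, not code from the textbook: the helper name `discounted_return` and the constant reward of 1.0 per step are assumptions made for the example.

```python
def discounted_return(rewards, gamma):
    """Compute G_0 = sum_k gamma**k * rewards[k] by backward accumulation."""
    g = 0.0
    for r in reversed(rewards):
        # Recursive form of the return: G_t = R_{t+1} + gamma * G_{t+1}
        g = r + gamma * g
    return g


# Hypothetical setting: a reward of 1.0 at every step.
short_episode = [1.0] * 100
long_episode = [1.0] * 1000

# With gamma < 1 the return stays bounded by 1 / (1 - gamma) as the horizon grows.
bounded_short = discounted_return(short_episode, 0.9)
bounded_long = discounted_return(long_episode, 0.9)

# With gamma = 1 the return is the plain sum of rewards and grows with the horizon.
unbounded_short = discounted_return(short_episode, 1.0)
unbounded_long = discounted_return(long_episode, 1.0)

print(bounded_short, bounded_long)
print(unbounded_short, unbounded_long)
```

With gamma = 0.9 the two returns are nearly identical, while with gamma = 1 the return scales with the episode length, so value estimates have no finite limit in a continuing task. This is one plausible reading of the convergence concern, not necessarily the argument made in the linked discussion.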

YanjieGao assigned xuehui1991 Jun 13, 2022
