From 3811330d8036fba0f7752a602d911f95c0741135 Mon Sep 17 00:00:00 2001
From: Jan Koch
Date: Fri, 16 Sep 2022 08:58:48 +0200
Subject: [PATCH 1/2] Fixes wrong reference

the link actually references `https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html#id16` and not `spinningup/extra_pg_proof1.html`
---
 docs/spinningup/rl_intro3.rst | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/spinningup/rl_intro3.rst b/docs/spinningup/rl_intro3.rst
index 34e4d5d57..e2774b572 100644
--- a/docs/spinningup/rl_intro3.rst
+++ b/docs/spinningup/rl_intro3.rst
@@ -338,9 +338,8 @@ is called the **reward-to-go** from that point, and this policy gradient express
 
 **But how is this better?** A key problem with policy gradients is how many sample trajectories are needed to get a low-variance sample estimate for them. The formula we started with included terms for reinforcing actions proportional to past rewards, all of which had zero mean, but nonzero variance: as a result, they would just add noise to sample estimates of the policy gradient. By removing them, we reduce the number of sample trajectories needed.
 
-An (optional) proof of this claim can be found `here`_, and it ultimately depends on the EGLP lemma.
+An (optional) proof of this claim can be found `here <../spinningup/extra_pg_proof1.html>`_, and it ultimately depends on the EGLP lemma.
 
-.. _`here`: ../spinningup/extra_pg_proof1.html
 
 Implementing Reward-to-Go Policy Gradient
 =========================================
@@ -474,4 +473,4 @@ In this chapter, we described the basic theory of policy gradient methods and co
 .. _`advantage of an action`: ../spinningup/rl_intro.html#advantage-functions
 .. _`this page`: ../spinningup/extra_pg_proof2.html
 .. _`Generalized Advantage Estimation`: https://arxiv.org/abs/1506.02438
-.. _`Vanilla Policy Gradient`: ../algorithms/vpg.html
\ No newline at end of file
+.. _`Vanilla Policy Gradient`: ../algorithms/vpg.html

From dd7a8e5acad67f88e10815a8ba8c2147b59d2ac1 Mon Sep 17 00:00:00 2001
From: Jan Koch
Date: Fri, 16 Sep 2022 09:04:21 +0200
Subject: [PATCH 2/2] Update rl_intro3.rst