From 3811330d8036fba0f7752a602d911f95c0741135 Mon Sep 17 00:00:00 2001
From: Jan Koch <Jan.Koch@tu-dortmund.de>
Date: Fri, 16 Sep 2022 08:58:48 +0200
Subject: [PATCH 1/2] Fixes wrong reference

The link actually resolves to `https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html#id16` rather than to `spinningup/extra_pg_proof1.html`.
---
 docs/spinningup/rl_intro3.rst | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/spinningup/rl_intro3.rst b/docs/spinningup/rl_intro3.rst
index 34e4d5d57..e2774b572 100644
--- a/docs/spinningup/rl_intro3.rst
+++ b/docs/spinningup/rl_intro3.rst
@@ -338,9 +338,8 @@ is called the **reward-to-go** from that point, and this policy gradient express
 
     **But how is this better?** A key problem with policy gradients is how many sample trajectories are needed to get a low-variance sample estimate for them. The formula we started with included terms for reinforcing actions proportional to past rewards, all of which had zero mean, but nonzero variance: as a result, they would just add noise to sample estimates of the policy gradient. By removing them, we reduce the number of sample trajectories needed.
 
-An (optional) proof of this claim can be found `here`_, and it ultimately depends on the EGLP lemma.
+An (optional) proof of this claim can be found `here <../spinningup/extra_pg_proof1.html>`_, and it ultimately depends on the EGLP lemma.
 
-.. _`here`: ../spinningup/extra_pg_proof1.html
 
 Implementing Reward-to-Go Policy Gradient
 =========================================
@@ -474,4 +473,4 @@ In this chapter, we described the basic theory of policy gradient methods and co
 .. _`advantage of an action`: ../spinningup/rl_intro.html#advantage-functions
 .. _`this page`: ../spinningup/extra_pg_proof2.html
 .. _`Generalized Advantage Estimation`: https://arxiv.org/abs/1506.02438
-.. _`Vanilla Policy Gradient`: ../algorithms/vpg.html
\ No newline at end of file
+.. _`Vanilla Policy Gradient`: ../algorithms/vpg.html

From dd7a8e5acad67f88e10815a8ba8c2147b59d2ac1 Mon Sep 17 00:00:00 2001
From: Jan Koch <Jan.Koch@tu-dortmund.de>
Date: Fri, 16 Sep 2022 09:04:21 +0200
Subject: [PATCH 2/2] Update rl_intro3.rst