diff --git a/doc/LectureNotes/_build/.doctrees/environment.pickle b/doc/LectureNotes/_build/.doctrees/environment.pickle index 1e6427201..6c07b31ff 100644 Binary files a/doc/LectureNotes/_build/.doctrees/environment.pickle and b/doc/LectureNotes/_build/.doctrees/environment.pickle differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek42.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek42.doctree index f0399fe60..13e7a21a5 100644 Binary files a/doc/LectureNotes/_build/.doctrees/exercisesweek42.doctree and b/doc/LectureNotes/_build/.doctrees/exercisesweek42.doctree differ diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek42.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek42.ipynb index c6dd8e5a0..c24e52ac4 100644 --- a/doc/LectureNotes/_build/html/_sources/exercisesweek42.ipynb +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek42.ipynb @@ -3,9 +3,7 @@ { "cell_type": "markdown", "id": "4b4c06bc", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "\n", @@ -15,13 +13,11 @@ { "cell_type": "markdown", "id": "bcb25e64", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "# Exercises week 42\n", "\n", - "**October 11-18, 2024**\n", + "**October 14-18, 2024**\n", "\n", "Date: **Deadline is Friday October 18 at midnight**\n" ] @@ -29,17 +25,17 @@ { "cell_type": "markdown", "id": "bb01f126", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "# Overarching aims of the exercises this week\n", "\n", - "The aim of the exercises this week is to get started with implementing a neural network. There are a lot of technical and finicky parts of implementing a neutal network, so take your time.\n", + "This week, you will implement the entire feed-forward pass of a neural network! Next week you will compute the gradient of the network by implementing back-propagation manually, and by using autograd which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method! However, there is an optional exercise this week to get started on training the network and getting good results!\n", "\n", - "This week, you will implement only the feed-forward pass and updating the network parameters with simple gradient descent, the gradient will be computed using autograd using code we provide. Next week, you will implement backpropagation. 
We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.\n", "\n", - "If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1OCQm1tlTWB6hZSf9I7gGUgW9M8SbVeQu#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.\n" + "If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, here are some functions you are going to need, don't change this cell. If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.\n" ] }, { @@ -365,9 +361,9 @@ "metadata": {}, "outputs": [], "source": [ - "def feed_forward(input, layers, activations):\n", + "def feed_forward(input, layers, activation_funcs):\n", " a = input\n", - " for (W, b), activation in zip(layers, activations):\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", " z = ...\n", " a = ...\n", " return a" @@ -378,9 +374,9 @@ "id": "8f7df363", "metadata": {}, "source": [ - "**b)** Make a list with three activation functions(don't call them yet! you can make a list with function names as elements, and then call these elements of the list later), two ReLU and one sigmoid. (If you add other functions than the ones defined at the start of the notebook, make sure everything is defined using autograd's numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)\n", + "**b)** You are now given a list with three activation functions, two ReLU and one sigmoid. (Don't call them yet! you can make a list with function names as elements, and then call these elements of the list later. 
If you add other functions than the ones defined at the start of the notebook, make sure everything is defined using autograd's numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)\n", "\n", - "Then evaluate a network with three layers and these activation functions.\n" + "Evaluate a network with three layers and these activation functions.\n" ] }, { @@ -392,11 +388,19 @@ "source": [ "network_input_size = ...\n", "layer_output_sizes = [...]\n", - "activations = [...]\n", + "activation_funcs = [ReLU, ReLU, sigmoid]\n", "layers = ...\n", "\n", "x = np.random.randn(network_input_size)\n", - "feed_forward(x, layers, activations)" + "feed_forward(x, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "9c914fd0", + "metadata": {}, + "source": [ + "**c)** How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?\n" ] }, { @@ -463,9 +467,9 @@ "inputs = np.random.rand(1000, 4)\n", "\n", "\n", - "def feed_forward_batch(inputs, layers, activations):\n", + "def feed_forward_batch(inputs, layers, activation_funcs):\n", " a = inputs\n", - " for (W, b), activation in zip(layers, activations):\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", " z = ...\n", " a = ...\n", " return a" @@ -488,11 +492,11 @@ "source": [ "network_input_size = ...\n", "layer_output_sizes = [...]\n", - "activations = [...]\n", + "activation_funcs = [...]\n", "layers = create_layers_batch(network_input_size, layer_output_sizes)\n", "\n", "x = np.random.randn(network_input_size)\n", - "feed_forward_batch(inputs, layers, activations)" + "feed_forward_batch(inputs, layers, activation_funcs)" ] }, { @@ -567,7 +571,7 @@ "id": "0362c4a9", "metadata": {}, "source": [ - "**a)** What should the input size for the network be with this dataset? What should the output shape of the last layer be?\n" + "**a)** What should the input size for the network be with this dataset? What should the output size of the last layer be?\n" ] }, { @@ -604,7 +608,7 @@ "metadata": {}, "outputs": [], "source": [ - "predictions = feed_forward_batch(inputs, layers, activations)" + "predictions = feed_forward_batch(inputs, layers, activation_funcs)" ] }, { @@ -630,7 +634,7 @@ "id": "334560b6", "metadata": {}, "source": [ - "# Exercise 6 - Training on real data\n", + "# Exercise 7 - Training on real data (Optional)\n", "\n", "To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like ADAM if you finish everything.\n" ] @@ -640,31 +644,58 @@ "id": "700cabe4", "metadata": {}, "source": [ - "The cross-entropy loss function can evaluate performance on classification tasks. It sees if your prediction is \"most certain\" on the correct target.\n" + "Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function, which can evaluate performance on classification tasks. 
It sees if your prediction is \"most certain\" on the correct target.\n" ] }, { "cell_type": "code", "execution_count": null, - "id": "56bef776", + "id": "f30e6e2c", "metadata": {}, "outputs": [], "source": [ - "from autograd import grad\n", - "\n", + "def cross_entropy(predict, target):\n", + " return np.sum(-target * np.log(predict))\n", "\n", - "def cost(input, layers, activations, target):\n", - " predict = feed_forward_batch(input, layers, activations)\n", - " return cross_entropy(predict, target)\n", "\n", + "def cost(input, layers, activation_funcs, target):\n", + " predict = feed_forward_batch(input, layers, activation_funcs)\n", + " return cross_entropy(predict, target)" + ] + }, + { + "cell_type": "markdown", + "id": "7ea9c1a4", + "metadata": {}, + "source": [ + "To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these\n", "\n", - "def cross_entropy(predict, target):\n", - " return np.sum(-target * np.log(predict))\n", + "$$\n", + "\\frac{\\partial C}{\\partial W}, \\frac{\\partial C}{\\partial b}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6c753e3b", + "metadata": {}, + "source": [ + "Now we need to compute these gradients. This is pretty hard to do for a neural network, we will use most of next week to do this, but we can also use autograd to just do it for us, which is what we always do in practice. With the code cell below, we create a function which takes all of these gradients for us.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56bef776", + "metadata": {}, + "outputs": [], + "source": [ + "from autograd import grad\n", "\n", "\n", "gradient_func = grad(\n", - " cross_entropy, 1\n", - ") # Taking the gradient wrt. the second input to the cost function" + " cost, 1\n", + ") # Taking the gradient wrt. the second input to the cost function, i.e. the layers" ] }, { @@ -684,7 +715,9 @@ "metadata": {}, "outputs": [], "source": [ - "layers_grad = gradient_func(inputs, layers, activations, targets) # Don't change this" + "layers_grad = gradient_func(\n", + " inputs, layers, activation_funcs, targets\n", + ") # Don't change this" ] }, { @@ -703,10 +736,10 @@ "outputs": [], "source": [ "def train_network(\n", - " inputs, layers, activations, targets, learning_rate=0.001, epochs=100\n", + " inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100\n", "):\n", " for i in range(epochs):\n", - " layers_grad = gradient_func(inputs, layers, activations, targets)\n", + " layers_grad = gradient_func(inputs, layers, activation_funcs, targets)\n", " for (W, b), (W_g, b_g) in zip(layers, layers_grad):\n", " W -= ...\n", " b -= ..." 
@@ -749,7 +782,7 @@ ], "metadata": { "kernelspec": { - "display_name": ".venv", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -763,7 +796,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.7" + "version": "3.9.15" } }, "nbformat": 4, diff --git a/doc/LectureNotes/_build/html/exercisesweek42.html b/doc/LectureNotes/_build/html/exercisesweek42.html index f4d0fc025..a4004e731 100644 --- a/doc/LectureNotes/_build/html/exercisesweek42.html +++ b/doc/LectureNotes/_build/html/exercisesweek42.html @@ -51,6 +51,8 @@ const thebe_selector_output = ".output, .cell_output" + + @@ -460,8 +462,8 @@
October 11-18, 2024
+October 14-18, 2024
Date: Deadline is Friday October 18 at midnight
The aim of the exercises this week is to get started with implementing a neural network. There are a lot of technical and finicky parts of implementing a neural network, so take your time.
-This week, you will implement only the feed-forward pass and updating the network parameters with simple gradient descent, the gradient will be computed using autograd using code we provide. Next week, you will implement backpropagation. We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.
-If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1OCQm1tlTWB6hZSf9I7gGUgW9M8SbVeQu#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.
+This week, you will implement the entire feed-forward pass of a neural network! Next week, you will compute the gradient of the network both by implementing back-propagation manually and by using autograd, which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method. However, there is an optional exercise this week to get started on training the network and getting good results!
+We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.
+If you have trouble running a notebook, you can run this notebook in Google Colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true); an updated link will be provided on the course Discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble). However, we recommend that you set up VSCode and your Python environment to run code like this locally.
+First, here are some functions you are going to need; don’t change this cell. If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.
import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later
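If autograd is not available, one possible way to follow the advice above is a guarded import. This is a sketch and not part of the notebook; without autograd, the automatic differentiation used in the final optional exercise will not work.
# Fall back to plain numpy when autograd is missing (assumption: only the optional
# training exercise actually needs automatic differentiation).
try:
    import autograd.numpy as np
except ImportError:
    import numpy as np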
@@ -762,9 +765,9 @@ Exercise 4 - Custom activation for each layerfeed_forward function which accepts a list of activation functions as an argument, and which evaluates these activation functions at each layer.
-def feed_forward(input, layers, activations):
+def feed_forward(input, layers, activation_funcs):
a = input
- for (W, b), activation in zip(layers, activations):
+ for (W, b), activation_func in zip(layers, activation_funcs):
z = ...
a = ...
return a
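One possible way to fill in the blanks above, as a sketch: it assumes each weight matrix W has shape (n_out, n_in) and each bias b has shape (n_out,); adjust the matrix product if your layers use the transposed convention.
def feed_forward(input, layers, activation_funcs):
    a = input
    for (W, b), activation_func in zip(layers, activation_funcs):
        z = W @ a + b           # affine transformation of the previous activation
        a = activation_func(z)  # elementwise non-linearity
    return a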
@@ -772,21 +775,22 @@ Exercise 4 - Custom activation for each layer
network_input_size = ...
layer_output_sizes = [...]
-activations = [...]
+activation_funcs = [ReLU, ReLU, sigmoid]
layers = ...
x = np.random.randn(network_input_size)
-feed_forward(x, layers, activations)
+feed_forward(x, layers, activation_funcs)
+c) How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?
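A sketch of how the cell above could be filled in, using hypothetical sizes and assuming the layer-creation helper from earlier in the notebook is called create_layers (adjust the name if it differs), followed by the swapped activation order asked about in c):
network_input_size = 8                   # hypothetical input size
layer_output_sizes = [10, 16, 1]         # hypothetical layer sizes
activation_funcs = [ReLU, ReLU, sigmoid]
layers = create_layers(network_input_size, layer_output_sizes)

x = np.random.randn(network_input_size)
feed_forward(x, layers, activation_funcs)

# For c): sigmoid in the hidden layers and ReLU in the output layer.
activation_funcs_alt = [sigmoid, sigmoid, ReLU]
feed_forward(x, layers, activation_funcs_alt)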
Exercise 5 - Processing multiple inputs at once¶
@@ -816,9 +820,9 @@ Exercise 5 - Processing multiple inputs at onceinputs = np.random.rand(1000, 4)
-def feed_forward_batch(inputs, layers, activations):
+def feed_forward_batch(inputs, layers, activation_funcs):
a = inputs
- for (W, b), activation in zip(layers, activations):
+ for (W, b), activation_func in zip(layers, activation_funcs):
z = ...
a = ...
return a
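A possible completion of the batched version, as a sketch: it assumes create_layers_batch stores each W with shape (n_in, n_out) and each b with shape (n_out,), so that a whole batch of row vectors can be pushed through in one matrix product.
def feed_forward_batch(inputs, layers, activation_funcs):
    a = inputs
    for (W, b), activation_func in zip(layers, activation_funcs):
        z = a @ W + b           # (batch, n_in) @ (n_in, n_out) -> (batch, n_out)
        a = activation_func(z)  # applied elementwise, so the batch shape is kept
    return a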
@@ -831,11 +835,11 @@ Exercise 5 - Processing multiple inputs at once
network_input_size = ...
layer_output_sizes = [...]
-activations = [...]
+activation_funcs = [...]
layers = create_layers_batch(network_input_size, layer_output_sizes)
x = np.random.randn(network_input_size)
-feed_forward_batch(inputs, layers, activations)
+feed_forward_batch(inputs, layers, activation_funcs)
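A sketch of how this cell could be filled in; the hidden and output sizes are arbitrary illustrative choices, while the input size must match the 4 columns of inputs.
network_input_size = 4                   # inputs has shape (1000, 4)
layer_output_sizes = [8, 8, 3]           # hypothetical hidden and output sizes
activation_funcs = [ReLU, ReLU, sigmoid]
layers = create_layers_batch(network_input_size, layer_output_sizes)

predictions = feed_forward_batch(inputs, layers, activation_funcs)
print(predictions.shape)                 # expected (1000, 3) with these choices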
@@ -881,7 +885,7 @@ Exercise 6 - Predicting on real data
@@ -894,7 +898,7 @@ Exercise 6 - Predicting on real data
-predictions = feed_forward_batch(inputs, layers, activations)
+predictions = feed_forward_batch(inputs, layers, activation_funcs)
@@ -908,27 +912,37 @@ Exercise 6 - Predicting on real data
-Exercise 6 - Training on real data¶
+
+Exercise 7 - Training on real data (Optional)¶
To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like ADAM if you finish everything.
-The cross-entropy loss function can evaluate performance on classification tasks. It sees if your prediction is “most certain” on the correct target.
+Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function to evaluate performance. The loss is small when your prediction is “most certain” on the correct target, that is, when most of the predicted probability is placed on the correct class.
-from autograd import grad
+def cross_entropy(predict, target):
+ return np.sum(-target * np.log(predict))
-def cost(input, layers, activations, target):
- predict = feed_forward_batch(input, layers, activations)
+def cost(input, layers, activation_funcs, target):
+ predict = feed_forward_batch(input, layers, activation_funcs)
return cross_entropy(predict, target)
-
-
-def cross_entropy(predict, target):
- return np.sum(-target * np.log(predict))
+
+
+
+
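As a small numerical illustration (not part of the exercise): with a one-hot target, the cross entropy reduces to minus the log of the probability the network assigns to the correct class.
target_example = np.array([0.0, 1.0, 0.0])      # one-hot target, class 1 is correct
predict_example = np.array([0.1, 0.8, 0.1])     # e.g. the output of a softmax layer
cross_entropy(predict_example, target_example)  # = -log(0.8), roughly 0.22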
+To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these
+
+\[
+\frac{\partial C}{\partial W}, \frac{\partial C}{\partial b}
+\]
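With plain gradient descent and learning rate η (one possible choice; any gradient-based optimizer works), each step would then update the parameters as
\[
W \leftarrow W - \eta \frac{\partial C}{\partial W}, \qquad b \leftarrow b - \eta \frac{\partial C}{\partial b}
\]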
+Now we need to compute these gradients. Doing this by hand for a neural network is pretty hard, and we will use most of next week to do it, but we can also use autograd to do it for us, which is what we always do in practice. With the code cell below, we create a function which computes all of these gradients for us.
+
+
+from autograd import grad
gradient_func = grad(
- cross_entropy, 1
-) # Taking the gradient wrt. the second input to the cost function
+ cost, 1
+) # Taking the gradient wrt. the second input to the cost function, i.e. the layers
@@ -937,7 +951,9 @@ Exercise 6 - Training on real data
Use the gradient_func function to take the gradient of the cross entropy wrt. the weights and biases of the network. Check the shapes of what's inside. What does the grad func from autograd actually do?
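For reference, a minimal sketch of the update loop that the training exercise asks for, assuming plain gradient descent with the gradients returned by gradient_func; the learning rate and epoch count are the defaults from the notebook's skeleton.
def train_network(inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100):
    for i in range(epochs):
        layers_grad = gradient_func(inputs, layers, activation_funcs, targets)
        for (W, b), (W_g, b_g) in zip(layers, layers_grad):
            W -= learning_rate * W_g  # in-place update keeps the arrays inside `layers`
            b -= learning_rate * b_g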