Showing 8 changed files with 324 additions and 189 deletions.
@@ -3,9 +3,7 @@
 {
 "cell_type": "markdown",
 "id": "4b4c06bc",
-"metadata": {
-"editable": true
-},
+"metadata": {},
 "source": [
 "<!-- HTML file automatically generated from DocOnce source (https://github.com/doconce/doconce/)\n",
 "doconce format html exercisesweek41.do.txt -->\n",
@@ -15,31 +13,29 @@
 {
 "cell_type": "markdown",
 "id": "bcb25e64",
-"metadata": {
-"editable": true
-},
+"metadata": {},
 "source": [
 "# Exercises week 42\n",
 "\n",
-"**October 11-18, 2024**\n",
+"**October 14-18, 2024**\n",
 "\n",
 "Date: **Deadline is Friday October 18 at midnight**\n"
 ]
 },
 {
 "cell_type": "markdown",
 "id": "bb01f126",
-"metadata": {
-"editable": true
-},
+"metadata": {},
 "source": [
 "# Overarching aims of the exercises this week\n",
 "\n",
-"The aim of the exercises this week is to get started with implementing a neural network. There are a lot of technical and finicky parts of implementing a neural network, so take your time.\n",
+"This week, you will implement the entire feed-forward pass of a neural network! Next week, you will compute the gradient of the network by implementing back-propagation manually, and by using autograd, which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method! However, there is an optional exercise this week to get started on training the network and getting good results!\n",
 "\n",
-"This week, you will implement only the feed-forward pass and update the network parameters with simple gradient descent; the gradient will be computed with autograd using code we provide. Next week, you will implement backpropagation. We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.\n",
+"We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.\n",
 "\n",
-"If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1OCQm1tlTWB6hZSf9I7gGUgW9M8SbVeQu#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to [email protected] if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.\n"
+"If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to [email protected] if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.\n",
+"\n",
+"First, here are some functions you are going to need; don't change this cell. If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.\n"
 ]
 },
 {
@@ -365,9 +361,9 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"def feed_forward(input, layers, activations):\n",
+"def feed_forward(input, layers, activation_funcs):\n",
 " a = input\n",
-" for (W, b), activation in zip(layers, activations):\n",
+" for (W, b), activation_func in zip(layers, activation_funcs):\n",
 " z = ...\n",
 " a = ...\n",
 " return a"
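For context when reading this hunk: the `z = ...` and `a = ...` placeholders are intentionally left for students to fill in. Below is a minimal sketch of what a completed single-sample pass could look like, assuming each layer is a (W, b) tuple with W of shape (output_size, input_size) so that the affine step is W @ a + b; the notebook's actual layer-construction cell is not shown in this diff, so the weight orientation is an assumption.

import autograd.numpy as np  # autograd's numpy wrapper, so the pass stays differentiable

def feed_forward(input, layers, activation_funcs):
    # Propagate one input vector through each (W, b) layer in turn.
    a = input
    for (W, b), activation_func in zip(layers, activation_funcs):
        z = W @ a + b            # affine transformation (assumed weight orientation)
        a = activation_func(z)   # elementwise nonlinearity
    return a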
@@ -378,9 +374,9 @@
 "id": "8f7df363",
 "metadata": {},
 "source": [
-"**b)** Make a list with three activation functions (don't call them yet! You can make a list with function names as elements, and then call these elements of the list later), two ReLU and one sigmoid. (If you add other functions than the ones defined at the start of the notebook, make sure everything is defined using autograd's numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)\n",
+"**b)** You are now given a list with three activation functions, two ReLU and one sigmoid. (Don't call them yet! You can make a list with function names as elements, and then call these elements of the list later. If you add other functions than the ones defined at the start of the notebook, make sure everything is defined using autograd's numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)\n",
 "\n",
-"Then evaluate a network with three layers and these activation functions.\n"
+"Evaluate a network with three layers and these activation functions.\n"
 ]
 },
 {
@@ -392,11 +388,19 @@
 "source": [
 "network_input_size = ...\n",
 "layer_output_sizes = [...]\n",
-"activations = [...]\n",
+"activation_funcs = [ReLU, ReLU, sigmoid]\n",
 "layers = ...\n",
 "\n",
 "x = np.random.randn(network_input_size)\n",
-"feed_forward(x, layers, activations)"
+"feed_forward(x, layers, activation_funcs)"
 ]
 },
 {
+"cell_type": "markdown",
+"id": "9c914fd0",
+"metadata": {},
+"source": [
+"**c)** How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?\n"
+]
+},
+{
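A hypothetical way the new cell could be filled in, purely for illustration: the sizes and the create_layers constructor below are assumptions (the notebook defines its own versions earlier, outside this diff), and feed_forward is the sketch above. For the new c) question, note that sigmoid squashes hidden activations into (0, 1), while a ReLU output layer clips negatives to exactly 0 and is unbounded above, so the range of the network output changes accordingly.

import autograd.numpy as np

def ReLU(z):
    return np.where(z > 0, z, 0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def create_layers(network_input_size, layer_output_sizes):
    # Assumed constructor: one (W, b) pair per layer, W shaped (n_out, n_in).
    layers = []
    i_size = network_input_size
    for output_size in layer_output_sizes:
        W = np.random.randn(output_size, i_size)
        b = np.random.randn(output_size)
        layers.append((W, b))
        i_size = output_size
    return layers

network_input_size = 4           # hypothetical; the exercise leaves this as ...
layer_output_sizes = [10, 8, 3]  # hypothetical layer widths
activation_funcs = [ReLU, ReLU, sigmoid]
layers = create_layers(network_input_size, layer_output_sizes)

x = np.random.randn(network_input_size)
print(feed_forward(x, layers, activation_funcs))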
@@ -463,9 +467,9 @@
 "inputs = np.random.rand(1000, 4)\n",
 "\n",
 "\n",
-"def feed_forward_batch(inputs, layers, activations):\n",
+"def feed_forward_batch(inputs, layers, activation_funcs):\n",
 " a = inputs\n",
-" for (W, b), activation in zip(layers, activations):\n",
+" for (W, b), activation_func in zip(layers, activation_funcs):\n",
 " z = ...\n",
 " a = ...\n",
 " return a"
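A sketch of the batched completion under the same assumed weight orientation as above: with W of shape (output_size, input_size), every row of the (n_samples, n_features) batch goes through the affine map in one matrix product.

def feed_forward_batch(inputs, layers, activation_funcs):
    # Same pass as feed_forward, but applied row-wise to a whole batch at once.
    a = inputs
    for (W, b), activation_func in zip(layers, activation_funcs):
        z = a @ W.T + b          # shape (n_samples, n_out); use a @ W + b if W is stored transposed
        a = activation_func(z)
    return a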
@@ -488,11 +492,11 @@
 "source": [
 "network_input_size = ...\n",
 "layer_output_sizes = [...]\n",
-"activations = [...]\n",
+"activation_funcs = [...]\n",
 "layers = create_layers_batch(network_input_size, layer_output_sizes)\n",
 "\n",
 "x = np.random.randn(network_input_size)\n",
-"feed_forward_batch(inputs, layers, activations)"
+"feed_forward_batch(inputs, layers, activation_funcs)"
 ]
 },
 {
@@ -567,7 +571,7 @@
 "id": "0362c4a9",
 "metadata": {},
 "source": [
-"**a)** What should the input size for the network be with this dataset? What should the output shape of the last layer be?\n"
+"**a)** What should the input size for the network be with this dataset? What should the output size of the last layer be?\n"
 ]
 },
 {
@@ -604,7 +608,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"predictions = feed_forward_batch(inputs, layers, activations)"
+"predictions = feed_forward_batch(inputs, layers, activation_funcs)"
 ]
 },
 {
@@ -630,7 +634,7 @@
 "id": "334560b6",
 "metadata": {},
 "source": [
-"# Exercise 6 - Training on real data\n",
+"# Exercise 7 - Training on real data (Optional)\n",
 "\n",
 "To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like ADAM if you finish everything.\n"
 ]
@@ -640,31 +644,58 @@
 "id": "700cabe4",
 "metadata": {},
 "source": [
-"The cross-entropy loss function can evaluate performance on classification tasks. It sees if your prediction is "most certain" on the correct target.\n"
+"Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function, which can evaluate performance on classification tasks. It sees if your prediction is "most certain" on the correct target.\n"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "56bef776",
+"id": "f30e6e2c",
 "metadata": {},
 "outputs": [],
 "source": [
-"from autograd import grad\n",
-"\n",
 "def cross_entropy(predict, target):\n",
 " return np.sum(-target * np.log(predict))\n",
 "\n",
-"def cost(input, layers, activations, target):\n",
-" predict = feed_forward_batch(input, layers, activations)\n",
-" return cross_entropy(predict, target)\n",
-"\n",
-"\n",
-"gradient_func = grad(\n",
-" cross_entropy, 1\n",
-") # Taking the gradient wrt. the second input to the cost function"
+"def cost(input, layers, activation_funcs, target):\n",
+" predict = feed_forward_batch(input, layers, activation_funcs)\n",
+" return cross_entropy(predict, target)"
 ]
 },
 {
+"cell_type": "markdown",
+"id": "7ea9c1a4",
+"metadata": {},
+"source": [
+"To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these:\n",
+"\n",
+"$$\n",
+"\\frac{\\partial C}{\\partial W}, \\frac{\\partial C}{\\partial b}\n",
+"$$\n"
+]
+},
+{
+"cell_type": "markdown",
+"id": "6c753e3b",
+"metadata": {},
+"source": [
+"Now we need to compute these gradients. This is pretty hard to do for a neural network; we will use most of next week to do this, but we can also use autograd to just do it for us, which is what we always do in practice. With the code cell below, we create a function which takes all of these gradients for us.\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "56bef776",
+"metadata": {},
+"outputs": [],
+"source": [
+"from autograd import grad\n",
+"\n",
+"\n",
+"gradient_func = grad(\n",
+" cost, 1\n",
+") # Taking the gradient wrt. the second input to the cost function, i.e. the layers"
+]
+},
+{
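The new gradient_func = grad(cost, 1) differentiates cost with respect to its second positional argument, the layers. A self-contained illustration of autograd's argnum mechanism, using a toy function that is not from the notebook:

import autograd.numpy as np
from autograd import grad

def f(x, w):
    return np.sum((w * x) ** 2)

df_dw = grad(f, 1)  # gradient wrt the second argument, w
print(df_dw(np.array([1.0, 2.0]), np.array([3.0, 4.0])))  # -> [ 6. 32.], i.e. 2 * w * x**2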
@@ -684,7 +715,9 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"layers_grad = gradient_func(inputs, layers, activations, targets) # Don't change this"
+"layers_grad = gradient_func(\n",
+" inputs, layers, activation_funcs, targets\n",
+") # Don't change this"
 ]
 },
 {
@@ -703,10 +736,10 @@
 "outputs": [],
 "source": [
 "def train_network(\n",
-" inputs, layers, activations, targets, learning_rate=0.001, epochs=100\n",
+" inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100\n",
 "):\n",
 " for i in range(epochs):\n",
-" layers_grad = gradient_func(inputs, layers, activations, targets)\n",
+" layers_grad = gradient_func(inputs, layers, activation_funcs, targets)\n",
 " for (W, b), (W_g, b_g) in zip(layers, layers_grad):\n",
 " W -= ...\n",
 " b -= ..."
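The W -= ... and b -= ... placeholders are again left for students. A plain gradient-descent completion, assuming layers_grad mirrors the (W, b) nesting of layers (which is how autograd returns it) and gradient_func is the function from the cell above:

def train_network(
    inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100
):
    for i in range(epochs):
        layers_grad = gradient_func(inputs, layers, activation_funcs, targets)
        for (W, b), (W_g, b_g) in zip(layers, layers_grad):
            # Step each parameter against its gradient, scaled by the learning rate.
            W -= learning_rate * W_g
            b -= learning_rate * b_g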
@@ -749,7 +782,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": ".venv",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },
@@ -763,7 +796,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.12.7"
+"version": "3.9.15"
 }
 },
 "nbformat": 4,