diff --git a/00-intro.ipynb b/00-intro.ipynb new file mode 100644 index 0000000..5708c46 --- /dev/null +++ b/00-intro.ipynb @@ -0,0 +1,169 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction\n", + "\n", + "Material for this tutorial is here: !!LINK\n", + "\n", + "**Note:**\n", + "If you use PyTorch on a daily basis, you will most probably not learn a lot during this tutorial.\n", + "\n", + "**Goals:**\n", + "- understand PyTorch concepts (e.g. autograd, broadcasting, ...) and understand what it can and cannot do\n", + "- be aware of some handy tools/libraries\n", + "- be able to create simple neural networks\n", + "- learn some of the tools that will help you code more complicated models in the future" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](img/the_real_reason.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# PyTorch Overview\n", + "\n", + "\n", + "> \"PyTorch - Tensors and Dynamic neural networks in Python\n", + "with strong GPU acceleration.\n", + "PyTorch is a deep learning framework for fast, flexible experimentation.\"\n", + ">\n", + "> -- https://pytorch.org/\n", + "\n", + "This was the tagline prior to PyTorch 1.0.\n", + "Now it's:\n", + "\n", + "> \"PyTorch - From Research To Production\n", + "> \n", + "> An open source deep learning platform that provides a seamless path from research prototyping to production deployment.\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## \"Build by run\" - what is that and why do I care?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](img/dynamic_graph.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A very practical reason to use PyTorch:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "from IPython.core.debugger import set_trace\n", + "\n", + "def f(x):\n", + " res = x + x\n", + " # set_trace() # <-- :o\n", + " return res\n", + "\n", + "x = torch.randn(1, 10)\n", + "f(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## TensorFlow and PyTorch\n", + "- static vs dynamic\n", + "- production vs prototyping " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Other neural network toolkits you might want to check out\n", + "- TensorFlow\n", + "- MXNet\n", + "- Keras\n", + "- CNTK\n", + "- Chainer\n", + "- caffe\n", + "- caffe2\n", + "- dynet\n", + "- many many more\n", + "\n", + "All of them are good!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Useful Links\n", + "\n", + "- Twitter: https://twitter.com/PyTorch\n", + "- Forum: https://discuss.pytorch.org/\n", + "- Tutorials: https://pytorch.org/tutorials/\n", + "- Examples: https://github.com/pytorch/examples\n", + "- API Reference: https://pytorch.org/docs/stable/index.html\n", + "- Torchvision: https://pytorch.org/docs/stable/torchvision/index.html\n", + "- PyTorch Text: https://github.com/pytorch/text\n", + "- PyTorch Audio: https://github.com/pytorch/audio\n", + "\n", + "## Tutorials I based this on...\n", + "\n", + "- https://github.com/sotte/pytorch_tutorial\n", + "- https://github.com/erickrf/pytorch-lecture" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Some Jupyter Notebook shortcuts that might be useful:\n", + "\n", + "- `A` and `B` create a cell above and below, 
respectively\n", + "- `Shift+Enter` runs the cell and jumps to the next one/creates one below\n", + "- `D`, `D` deletes the cell" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/PyTorch Basics.ipynb b/01-pytorch-basics.ipynb similarity index 63% rename from PyTorch Basics.ipynb rename to 01-pytorch-basics.ipynb index e84e3ba..816c001 100644 --- a/PyTorch Basics.ipynb +++ b/01-pytorch-basics.ipynb @@ -14,15 +14,7 @@ "* Activation functions (tanh, relu, sigmoid, etc.)\n", "* Gradient computation\n", "* Optimizer (adam, adagrad, RMSprop, SGD, etc.)\n", - "* Implementations speed gains in GPU\n", - "\n", - "## Alternatives\n", - "\n", - "Other platforms for deep learning in Python exist, with different focuses: Tensorflow, Caffe, MXNet,...\n", - "\n", - "* Pytorch is comparetively simple to use \n", - "* ... and also the only one besides Tensorflow I have experience with 🙂\n", - "* Feel free to try the others!" + "* Implementations speed gains in GPU" ] }, { @@ -40,9 +32,24 @@ "metadata": {}, "outputs": [], "source": [ + "import numpy as np\n", "import torch" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "v1 = np.arange(10)\n", + "v2 = np.arange(10, 20)\n", + "\n", + "print(\"v1: %s\\n\" % v1)\n", + "print(\"v2: %s\\n\" % v2)\n", + "print(\"Dot product: %d\" % v1.dot(v2))" + ] + }, { "cell_type": "code", "execution_count": null, @@ -64,6 +71,19 @@ "#### Setting values manually or randomly:" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "v3 = np.array([2, 4, 6, 8])\n", + "v4 = np.random.random(10)\n", + "\n", + "print(\"v3: %s\\n\" % v3)\n", + "print(\"v4: %s\\n\" % v4)" + ] + }, { "cell_type": "code", "execution_count": null, @@ -140,7 +160,63 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Elementwise operations" + "## Converting" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "A = torch.eye(3)\n", + "A" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# torch --> numpy\n", + "B = A.numpy()\n", + "B" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# numpy --> torch\n", + "torch.from_numpy(np.eye(3))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Elementwise operations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "v1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "v2" ] }, { @@ -183,8 +259,8 @@ "metadata": {}, "outputs": [], "source": [ - "x = torch.tensor(v1, dtype=torch.float)\n", - "y = torch.tensor(v2, dtype=torch.float)\n", + "x = v1.to(torch.float)\n", + "y = v2.to(torch.float)\n", "x / y" ] }, @@ -195,6 +271,15 @@ "#### Operations with constants" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, { "cell_type": "code", 
"execution_count": null, @@ -249,14 +334,23 @@ }, "outputs": [], "source": [ - "print(m1.matmul(m2))" + "print(m1.mm(m2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(m1 @ m2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Higher order tensors" + "What if I have batched data? It's better to use `.bmm()`! This is a common source of errors." ] }, { @@ -265,8 +359,23 @@ "metadata": {}, "outputs": [], "source": [ - "t = torch.rand(3, 4, 5)\n", - "t" + "print(m1.bmm(m2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`@` will work as `.bmm()`!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(m1 @ m2)" ] }, { @@ -299,8 +408,15 @@ "print(\"m:\", m)\n", "print()\n", "print(\"v:\", v)\n", - "print()\n", - "\n", + "print()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ "m_plus_v = m + v\n", "print(\"m + v:\", m_plus_v)" ] @@ -348,7 +464,7 @@ "metadata": {}, "outputs": [], "source": [ - "v = v.view([2, 2])\n", + "v = v.view(2, 2)\n", "v" ] }, @@ -358,7 +474,7 @@ "metadata": {}, "outputs": [], "source": [ - "v = v.view([4, 1])\n", + "v = v.view(4, 1)\n", "v" ] }, @@ -391,7 +507,7 @@ "metadata": {}, "outputs": [], "source": [ - "v = v.view([1, 4])\n", + "v = v.view(1, 4)\n", "m + v" ] }, @@ -412,6 +528,24 @@ "u + v" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "u" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "v" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -512,7 +646,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can implement softmax. Since it's a function of the whole array, the plot has a slightly different meaning (notice that the y-axis only goes until 0.1)" + "# But what about the GPU?\n", + "How do I use the GPU?\n", + "\n", + "If you have a GPU make sure that the right pytorch is installed\n", + "(check https://pytorch.org/ for details)." ] }, { @@ -521,17 +659,18 @@ "metadata": {}, "outputs": [], "source": [ - "exps = torch.exp(x)\n", - "z = exps.sum()\n", - "softmax = exps / z\n", - "pl.plot(x.numpy(), softmax.numpy())" + "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n", + "device" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Anyway, torch also provides an implementation of softmax:" + "If you have a GPU you should get something like: \n", + "`device(type='cuda', index=0)`\n", + "\n", + "You can move data to the GPU by doing `.to(device)`." ] }, { @@ -540,15 +679,151 @@ "metadata": {}, "outputs": [], "source": [ - "y = torch.softmax(x, dim=0)\n", - "pl.plot(x.numpy(), y.numpy())" + "data = torch.eye(3)\n", + "data.to(device)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now the computation happens on the GPU." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "res = data + data\n", + "res" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "res.device" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's see the softmax with another x:" + "# Automatic differentiation with `autograd`\n", + "\n", + "Ref:\n", + "- https://pytorch.org/docs/stable/autograd.html\n", + "- https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = torch.tensor(2.)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = torch.tensor(2., requires_grad=True)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(x.requires_grad)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(x.grad)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = x ** 2\n", + "\n", + "print(\"Grad of x:\", x.grad)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = x ** 2\n", + "y.backward()\n", + "\n", + "print(\"Grad of x:\", x.grad)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# What is going to happen here?\n", + "x = torch.tensor(2.)\n", + "x.backward()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Don't record the gradient\n", + "# Useful for inference\n", + "\n", + "x = torch.tensor(2.)\n", + "params = torch.tensor(2., requires_grad=True)\n", + "\n", + "with torch.no_grad():\n", + " y = x * x\n", + " print(x.grad)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`nn.Module` and `nn.Parameter` keep track of gradients for you." 
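+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of what that means (a made-up one-layer model, not part of the tutorial code): parameters created inside a module come with `requires_grad=True`, so a backward pass fills in their `.grad` fields automatically.\n",
+ "\n",
+ "```python\n",
+ "lin = torch.nn.Linear(2, 1)\n",
+ "out = lin(torch.randn(5, 2)).sum()  # scalar, so we can call backward()\n",
+ "out.backward()\n",
+ "\n",
+ "print(lin.weight.requires_grad)  # True\n",
+ "print(lin.weight.grad.shape)     # torch.Size([1, 2])\n",
+ "```"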
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lin = torch.nn.Linear(2, 1, bias=True)\n", + "lin.weight" ] }, { @@ -557,9 +832,7 @@ "metadata": {}, "outputs": [], "source": [ - "x = torch.randn([100])\n", - "y = torch.softmax(x, dim=0)\n", - "pl.plot(x.numpy(), y.numpy(), '.')" + "type(lin.weight)" ] } ], @@ -579,7 +852,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.7.4" } }, "nbformat": 4, diff --git a/02-linear-regression.ipynb b/02-linear-regression.ipynb new file mode 100644 index 0000000..98f7dab --- /dev/null +++ b/02-linear-regression.ipynb @@ -0,0 +1,196 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Linear Regression and Gradient Descent" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "import torchvision\n", + "\n", + "DEVICE = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from pprint import pprint\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from IPython.core.debugger import set_trace" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# The Problem" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import make_regression\n", + "\n", + "\n", + "n_features = 1\n", + "n_samples = 100\n", + "\n", + "X, y = make_regression(\n", + " n_samples=n_samples,\n", + " n_features=n_features,\n", + " noise=20,\n", + " random_state=42,\n", + ")\n", + "\n", + "fix, ax = plt.subplots()\n", + "ax.plot(X, y, \".\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# The Solution" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "X = torch.from_numpy(X).float()\n", + "y = torch.from_numpy(y.reshape((n_samples, n_features))).float()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class LinReg(nn.Module):\n", + " def __init__(self, input_dim):\n", + " super().__init__()\n", + " self.beta = nn.Linear(input_dim, 1)\n", + " \n", + " def forward(self, X):\n", + " return self.beta(X)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = LinReg(n_features).to(DEVICE) # <-- here\n", + "loss_fn = nn.MSELoss()\n", + "optimizer = optim.SGD(model.parameters(), lr=0.1)\n", + "\n", + "\n", + "X, y = X.to(DEVICE), y.to(DEVICE) # <-- here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Train step\n", + "model.train() # <-- here\n", + "optimizer.zero_grad()\n", + "\n", + "y_ = model(X)\n", + "loss = loss_fn(y_, y)\n", + "\n", + "loss.backward()\n", + "optimizer.step()\n", + "\n", + "# Eval\n", + "model.eval() # <-- here\n", + "with torch.no_grad():\n", + " y_ = model(X) \n", + "\n", + "# Vis\n", + "fig, ax = plt.subplots()\n", + 
"ax.plot(X.cpu().numpy(), y_.cpu().numpy(), \".\", label=\"pred\")\n", + "ax.plot(X.cpu().numpy(), y.cpu().numpy(), \".\", label=\"data\")\n", + "ax.set_title(f\"MSE: {loss.item():0.1f}\")\n", + "ax.legend();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note: I did gradient descent with all the data. I did not split the data into `train` and `valid` which should be done!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Write a proper training loop." + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/03-modules-and-mlps.ipynb b/03-modules-and-mlps.ipynb new file mode 100644 index 0000000..743a8db --- /dev/null +++ b/03-modules-and-mlps.ipynb @@ -0,0 +1,863 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read the data\n", + "\n", + "We will download the MNIST dataset for training a classifier. Torch provides a convenient function for that.\n", + "\n", + "The MNIST dataset is composed of images of digits that must be classified with labels from 0 to 9. The inputs are 28x28 matrices containing the grayscale intensity in each pixel." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "import torchvision\n", + "from torchvision import datasets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from pprint import pprint\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from IPython.core.debugger import set_trace" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Dataset\n", + "It's easy to create your `Dataset`,\n", + "but PyTorch comes with some\n", + "[built-in datasets](https://pytorch.org/docs/stable/torchvision/datasets.html):\n", + "\n", + "- MNIST\n", + "- Fashion-MNIST\n", + "- KMNIST\n", + "- EMNIST\n", + "- FakeData\n", + "- COCO\n", + " - Captions\n", + " - Detection\n", + "- LSUN\n", + "- ImageFolder\n", + "- DatasetFolder\n", + "- Imagenet-12\n", + "- CIFAR\n", + "- STL10\n", + "- SVHN\n", + "- PhotoTour\n", + "- SBU\n", + "- Flickr\n", + "- VOC\n", + "- Cityscapes\n", + "\n", + "`Dataset` gives you information about the number of samples (implement `__len__`) and gives you the sample at a given index (implement `__getitem__`.\n", + "It's a nice and simple abstraction to work with data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from torch.utils.data import Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```python\n", + "class Dataset(object):\n", + " def __getitem__(self, index):\n", + " raise NotImplementedError\n", + "\n", + " def __len__(self):\n", + " raise NotImplementedError\n", + "\n", + " def __add__(self, other):\n", + " return ConcatDataset([self, other])\n", + "```\n", + "\n", + "For now, let's use MNIST. You'll have an example on how to use `Dataset` in your next homework." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "train_data = datasets.MNIST('../data', train=True, download=True)\n", + "test_data = datasets.MNIST('../data', train=False)\n", + "\n", + "train_x = train_data.data\n", + "train_y = train_data.targets\n", + "test_x = test_data.data\n", + "test_y = test_data.targets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "n_train_examples = train_x.shape[0]\n", + "n_test_examples = test_x.shape[0]\n", + "\n", + "print('%d training instances and %d test instances' % (n_train_examples, n_test_examples))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Check the shape of our training data to see how many input features there are:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(train_x.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And what the image looks like:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.imshow(train_x[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Formatting" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each sample is a 28x28 matrix. But we want to represent them as vectors, since our model doesn't take any advantage of the 2-d nature of the data.\n", + "\n", + "So, we reshape the data:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "num_features = 28 * 28\n", + "new_shape = [n_train_examples, num_features]\n", + "train_x_vectors = train_x.reshape(new_shape)\n", + "print(train_x_vectors.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When we reshape an array (or torch tensor, for that matter), we don't need to specify all dimensions. We can leave one as -1, and it will be automatically determined from the size of the data. This is useful when we don't know a priori the shape of some array." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "train_x_vectors = train_x.view(n_train_examples, -1)\n", + "test_x_vectors = test_x.view(n_test_examples, -1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Also, the values are integers in the range $[0, 255]$. It is better to work with float values in a smaller interval, such as $[0, 1]$ or $[-1, 1]$. There are some more elaborate normalization techniques, but for now let's just normalize it to $[0, 1]$." 
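+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For reference, one of those more elaborate techniques is standardization to zero mean and unit variance; a quick sketch on a stand-in random batch (not the MNIST tensors):\n",
+ "\n",
+ "```python\n",
+ "x = torch.rand(100, 784)          # stand-in for a batch of flattened images\n",
+ "x_std = (x - x.mean()) / x.std()  # zero mean, unit variance\n",
+ "print(x_std.mean().item(), x_std.std().item())\n",
+ "```"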
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "train_x_norm = train_x_vectors / 255\n", + "test_x_norm = test_x_vectors / 255\n", + "print(train_x_norm[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Oops! Notice that the arrays had integer values, but the result of the division would be floats. One way to change the `dtype` of a torch tensor is using `.to()`.\n", + "\n", + "Keep in mind that data type is a common source of errors!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "train_x_vectors = train_x_vectors.to(torch.float)\n", + "test_x_vectors = test_x_vectors.to(torch.float)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try again:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "train_x_norm = train_x_vectors / 255\n", + "test_x_norm = test_x_vectors / 255\n", + "print(train_x_norm[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, check the labels:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(torch.unique(train_y))\n", + "num_classes = len(torch.unique(train_y))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "train_x.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "train_y.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Modules and MLPs\n", + "\n", + "We've seen how the internals of simple linear classifier work. However, we still had to set a lot of things manually. It's much better to have a higher-level API that encapsulates the classifier.\n", + "\n", + "We are going to see that now, with pytorch Module objects. Then, it will allow us to build more complex models, like a multilayer perceptron." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We begin by loading the data again:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "import numpy as np\n", + "from torchvision import datasets\n", + "from matplotlib import pyplot as pl\n", + "\n", + "train_dataset = datasets.MNIST('../data', train=True, download=True, transform=torchvision.transforms.ToTensor())\n", + "test_dataset = datasets.MNIST('../data', train=False, transform=torchvision.transforms.ToTensor())\n", + "\n", + "train_x = train_dataset.data\n", + "train_y = train_dataset.targets\n", + "test_x = test_dataset.data\n", + "test_y = test_dataset.targets\n", + "\n", + "num_features = 28 * 28\n", + "num_classes = len(np.unique(train_y))\n", + "new_shape = [-1, num_features]\n", + "train_x_vectors = train_x.reshape(new_shape)\n", + "test_x_vectors = test_x.reshape(new_shape)\n", + "\n", + "# shorten the names\n", + "train_x = train_x_vectors.to(torch.float) / 255\n", + "test_x = test_x_vectors.to(torch.float) / 255" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Using Modules\n", + "\n", + "Let's create a linear model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class LinearModel(nn.Module):\n", + " def __init__(self, n_features, n_classes):\n", + " super().__init__()\n", + " self.linear_layer = nn.Linear(n_features, n_classes)\n", + " \n", + " def forward(self, X):\n", + " return self.linear_layer(X)\n", + "\n", + "linear_model = LinearModel(num_features, num_classes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The model can be called as function to compute an output. Let's see how it works:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "batch = train_x[:2]\n", + "\n", + "answers = linear_model(batch)\n", + "answers" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Same as doing the forward method $$w^T x + b$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "batch @ linear_model.linear_layer.weight.t() + linear_model.linear_layer.bias" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Loss" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loss_function = nn.CrossEntropyLoss()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Optimizer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "learning_rate = 0.1\n", + "\n", + "# the optimizer needs to be told which are the parameters to optimize\n", + "optimizer = torch.optim.SGD(linear_model.parameters(), lr=learning_rate, momentum=0.9)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Batching\n", + "\n", + "Batching can be boring to code. `DataLoader` helps!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from torch.utils.data import DataLoader\n", + "\n", + "train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Training loop\n", + "\n", + "Now we write the main training loop. This is the basic skeleton for training pytorch models." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def train_model(model, train_dataloader, num_epochs, optimizer):\n", + " losses = []\n", + "\n", + " for epoch in range(1, num_epochs+1):\n", + " print('Starting epoch %d' % epoch)\n", + " total_loss = 0\n", + " hits = 0\n", + "\n", + " for batch_x, batch_y in train_dataloader:\n", + " optimizer.zero_grad()\n", + " # flatten the images; ToTensor already scaled them to floats in [0, 1]\n", + " batch_x = batch_x.reshape(batch_x.shape[0], -1)\n", + "\n", + " # forward pass\n", + " logits = model(batch_x)\n", + "\n", + " # compute the loss\n", + " loss = loss_function(logits, batch_y)\n", + " loss_value = loss.item()\n", + " total_loss += loss_value\n", + " losses.append(loss_value)\n", + "\n", + " y_pred = logits.argmax(dim=1)\n", + "\n", + " hits += torch.sum(y_pred == batch_y).item()\n", + "\n", + " loss.backward()\n", + " # after determining the gradients, take a step toward their direction\n", + " optimizer.step()\n", + "\n", + " # total_loss accumulates per-batch means, so average over batches\n", + " avg_loss = total_loss / len(train_dataloader)\n", + " print('Epoch loss: %.4f' % avg_loss)\n", + " acc = hits / len(train_dataloader.dataset)\n", + " print('Epoch accuracy: %.4f' % acc)\n", + " \n", + " return np.array(losses)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "linear_losses = train_model(linear_model, train_dataloader, 10, optimizer)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Graphics are good to understand the performance of a model. Let's plot the loss curve by batch:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots()\n", + "ax.plot(linear_losses, \".\", label=\"linear\")\n", + "ax.legend()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Multilayer Perceptron\n", + "\n", + "We can now proceed to a more sophisticated classifier: a multilayer perceptron. Let's build one using the Sequential API." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hidden_size = 200\n", + "\n", + "class MLP(nn.Module):\n", + " def __init__(self, n_features, hidden_size, n_classes):\n", + " super().__init__()\n", + " linear_layer1 = nn.Linear(n_features, hidden_size)\n", + " linear_layer2 = nn.Linear(hidden_size, n_classes)\n", + " self.feedforward = nn.Sequential(linear_layer1, \n", + " nn.Tanh(), \n", + " linear_layer2)\n", + "\n", + " def forward(self, X):\n", + " return self.feedforward(X)\n", + "\n", + "mlp = MLP(num_features, hidden_size, num_classes)\n", + "loss_function = nn.CrossEntropyLoss()\n", + "optimizer = torch.optim.SGD(mlp.parameters(), lr=0.1, momentum=0.9)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's train the model. How do the loss and accuracy compare with the linear model?\n", + "\n", + "You probably also noticed a difference in running time!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "mlp_losses = train_model(mlp, train_dataloader, 10, optimizer)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice the different concentration of dots in the MLP and Linear graphics!"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots()\n", + "ax.plot(linear_losses, \".\", label=\"linear\")\n", + "ax.plot(mlp_losses, \".\", label=\"mlp\")\n", + "ax.legend()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Validation data\n", + "\n", + "Evaluating the performance on training data is important to understand if the model is actually learning, but if we want to know if our model is actually useful, we should evaluate its performance on validation or test data.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def evaluate_model(model, test_x, test_y):\n", + " with torch.no_grad():\n", + " loss_function = torch.nn.CrossEntropyLoss()\n", + " logits = model(test_x)\n", + " loss = loss_function(logits, test_y)\n", + " \n", + " y_pred = logits.argmax(dim=1)\n", + " hits = torch.sum(y_pred == test_y).item()\n", + " # CrossEntropyLoss already averages over the batch\n", + " return loss.item(), hits / len(test_x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "evaluate_model(mlp, train_x, train_y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "evaluate_model(mlp, test_x, test_y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "evaluate_model(linear_model, train_x, train_y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "evaluate_model(linear_model, test_x, test_y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "How can we make our model better? There are three things to be done:\n", + "\n", + "1. **Hyperparameter search**. Do a grid search or random search on the hyperparameters (hidden size, learning rate, batch size, activation function, type of optimizer, ...)\n", + "2. **Generalize better**. This includes either finding a better feature representation or regularizing, i.e., adding some kind of penalty to the model weights that encourages it to find a more general solution. Examples: L2-norm weight regularization, dropout.\n", + "3. **Early stopping**. Evaluate the model on validation data after each epoch or some number of batches; only save it when validation performance increases. This means detecting when the model achieved its performance peak." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Dropout\n", + "\n", + "We could try dropout. It effectively deactivates some neural connections at random, forcing the network to avoid depending on specific inputs."
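+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A small sketch of that behaviour: dropout is only active in training mode, which is one more reason the `model.train()`/`model.eval()` calls matter.\n",
+ "\n",
+ "```python\n",
+ "drop = nn.Dropout(p=0.5)\n",
+ "x = torch.ones(8)\n",
+ "\n",
+ "drop.train()\n",
+ "print(drop(x))  # about half the entries zeroed, the rest scaled by 1/(1-p)\n",
+ "\n",
+ "drop.eval()\n",
+ "print(drop(x))  # identity at evaluation time\n",
+ "```"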
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class MLPDropout(nn.Module):\n", + " def __init__(self, n_features, hidden_size, n_classes, p_dropout):\n", + " super().__init__()\n", + " linear_layer1 = nn.Linear(n_features, hidden_size)\n", + " linear_layer2 = nn.Linear(hidden_size, n_classes)\n", + " self.feedforward = nn.Sequential(\n", + " linear_layer1,\n", + " nn.Tanh(),\n", + " nn.Dropout(p_dropout),\n", + " linear_layer2)\n", + "\n", + " def forward(self, X):\n", + " return self.feedforward(X)\n", + "\n", + "hidden_size = 200\n", + "p_dropout = 0.5\n", + "mlp_dropout = MLPDropout(num_features, hidden_size, num_classes, p_dropout)\n", + "loss_function = nn.CrossEntropyLoss()\n", + "optimizer = torch.optim.SGD(mlp_dropout.parameters(), lr=0.1, momentum=0.9) # weight_decay" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "losses = train_model(mlp_dropout, train_dataloader, 10, optimizer)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Training loss is a bit worse, as expected. After all, we are obstructing some connections.\n", + "\n", + "Now let's check validation performance:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "evaluate_model(mlp, test_x, test_y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "evaluate_model(mlp_dropout, test_x, test_y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "No improvement. Ideally, we should retrain our model with different hyperparamters (learning rates, layer sizes, number of layers, dropout rate) as well as some changes in the structure (different optimizers, activation functions, losses).\n", + "\n", + "However, data representation plays a key role. Do you think representing the input as independent pixels is a good idea for recognizing digits?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Saving\n", + "\n", + "Persisting the model after training is obviously important to reuse it later.\n", + "\n", + "In Pytorch, we can save the model calling `save()` and passing the model's `state_dict`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "torch.save(mlp.state_dict(), 'mlp.model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Later, recreate the model and load the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "mlp2 = MLP(num_features, hidden_size, num_classes)\n", + "mlp2.load_state_dict(torch.load('mlp.model'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's check the performance to see if it's the same!" 
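+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Presumably the reloaded `mlp2` is the model to look at here; evaluating both should print identical numbers, since `load_state_dict` copies the exact trained weights. A sketch:\n",
+ "\n",
+ "```python\n",
+ "print(evaluate_model(mlp, test_x, test_y))\n",
+ "print(evaluate_model(mlp2, test_x, test_y))  # should match exactly\n",
+ "```"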
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "evaluate_model(mlp, test_x, test_y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# The End\n", + "\n", + "![https://twitter.com/karpathy/status/1013244313327681536](img/common_mistakes.png)\n", + "https://twitter.com/karpathy/status/1013244313327681536" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/Computational Efficiency.ipynb b/Computational Efficiency.ipynb deleted file mode 100644 index 89d18e5..0000000 --- a/Computational Efficiency.ipynb +++ /dev/null @@ -1,258 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Numpy and Computation Efficiency\n", - "\n", - "This notebooks illustrates the computational efficiency of running linear algebra with the proper tools - such as numpy.\n", - "\n", - "Let's compute an array dot product in Python:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def array_dot_product(v1, v2):\n", - " dot_product = 0\n", - " \n", - " for v1_i, v2_i in zip(v1, v2):\n", - " dot_product += v1_i * v2_i\n", - " \n", - " return dot_product\n", - "\n", - "v1 = list(range(100))\n", - "v2 = list(range(100, 200))\n", - "\n", - "print(\"v1 = %s\\n\" % v1)\n", - "print(\"v2 = %s\\n\" % v2)\n", - "\n", - "result = array_dot_product(v1, v2)\n", - "print(\"v1 dot v2 = %d\" % result)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Okay, it works, but how long does it take?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%timeit array_dot_product(v1, v2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's try with numpy -- it uses data structures like in C, optimized for mathematical operations, without the Python overhead." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "v1_np = np.arange(100)\n", - "v2_np = np.arange(100, 200)\n", - "print(\"v1: %s\\n\" % v1_np)\n", - "print(\"v2: %s\\n\" % v2_np)\n", - "\n", - "result = v1_np.dot(v2_np)\n", - "print(\"v1 dot v2 = %d\" % result)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Nice, aligned formatting. Now let's check the running time." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%timeit v1_np.dot(v2_np)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What about matrices?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def matrix_dot_product(m1, m2_t):\n", - " num_rows = len(m1)\n", - " num_columns = len(m2_t)\n", - " internal_dim = len(m1[0])\n", - " result = []\n", - " \n", - " for i in range(num_rows):\n", - " new_row = []\n", - " for j in range(num_columns):\n", - " total = 0\n", - " for k in range(internal_dim):\n", - " total += m1[i][k] * m2_t[j][k]\n", - " new_row.append(total)\n", - " result.append(new_row)\n", - " \n", - " return result" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "m1 = np.random.rand(100, 200)\n", - "m2 = np.random.rand(200, 300)\n", - "\n", - "m2_t = m2.T\n", - "m1_list = m1.tolist()\n", - "m2_t_list = m2_t.tolist()\n", - "result_list = matrix_dot_product(m1_list, m2_t_list)\n", - "result_numpy = m1.dot(m2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Checking the results..." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "result_list = np.array(result_list)\n", - "result_list == result_numpy" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Different? How much?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "result_list - result_numpy" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "np.abs(result_list - result_numpy).sum()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Okay. Now lets time it again." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%timeit matrix_dot_product(m1_list, m2_t_list)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%timeit m1.dot(m2)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "time1 = 747-3\n", - "time2 = 212e-6\n", - "time1 / time2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "200e-6 / 700e-9" - ] - } - ], - "metadata": { - "anaconda-cloud": {}, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/Linear Classifier Basics.ipynb b/Linear Classifier Basics.ipynb deleted file mode 100644 index 39755e0..0000000 --- a/Linear Classifier Basics.ipynb +++ /dev/null @@ -1,521 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Read the data\n", - "\n", - "We will download the MNIST dataset for training a classifier. Torch provides a convenient function for that.\n", - "\n", - "The MNIST dataset is composed of images of digits that must be classified with labels from 0 to 9. The inputs are 28x28 matrices containing the grayscale intensity in each pixel." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import mnist\n", - "\n", - "train_x = mnist.train_images()\n", - "train_y = mnist.train_labels()\n", - "test_x = mnist.test_images()\n", - "test_y = mnist.test_labels()\n", - "\n", - "print('%d training instances and %d test instances' % (len(train_x), len(test_x)))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Check the shape of our training data to see how many input features there are:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(train_x.shape)\n", - "print(train_x[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Formatting" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Each sample is a 28x28 matrix. But we want to represent them as vectors, since our model doesn't take any advantage of the 2-d nature of the data.\n", - "\n", - "So, we reshape the data:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "num_features = 28 * 28\n", - "new_shape = [60000, num_features]\n", - "train_x_vectors = train_x.reshape(new_shape)\n", - "print(train_x_vectors.shape)\n", - "print(train_x_vectors[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When we reshape an array (or torch tensor, for that matter), we don't need to specify all dimensions. We can leave one as -1, and it will be automatically determined from the size of the data. This is useful when we don't know a priori the shape of some array." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "train_x_vectors = train_x.reshape([-1, num_features])\n", - "test_x_vectors = test_x.reshape([-1, num_features])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Also, the values are integers in the range $[0, 255]$. It is better to work with float values in a smaller interval, such as $[0, 1]$ or $[-1, 1]$. There are some more elaborate normalization techniques, but for now let's just normalize it to $[0, 1]$." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "train_x_vectors /= 255\n", - "test_x_vectors /= 255" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Oops! Notice that the arrays had integer values, but the result of the division would be floats. The `dtype` of the arrays cannot be changed by arithmetic operators; we need instead to create new arrays.\n", - "\n", - "Keep in mind that data type are a common source of errors!" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "train_x_norm = train_x_vectors / 255\n", - "test_x_norm = test_x_vectors / 255\n", - "print(train_x_norm[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, check the labels:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(np.unique(train_y))\n", - "num_classes = len(np.unique(train_y))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "train_x.shape" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "train_y.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Creating a simple linear classifier\n", - "\n", - "Our input has 748 dimensions (each one is a pixel), and the output has 10 possible classes. We will create a weight matrix $w$ and a bias vector $b$.\n", - "\n", - "The parameter `requires_grad` tells pytorch that their values are adjustable through gradient backpropagation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "w = torch.randn([num_features, num_classes], requires_grad=True)\n", - "b = torch.randn([num_classes], requires_grad=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For illustration purposes, let's take the first row of the data and create a pytorch tensor with it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "x0 = torch.tensor(train_x_norm[0])\n", - "torch.matmul(x0, w) + b" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Again, take care with data types! The inputs were double precision floats (64 bit) and the weights are normal floats (32 bits). Let's explicitly create the batch tensor with normal floats." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "x0 = torch.tensor(train_x_norm[0], dtype=torch.float)\n", - "logits = torch.matmul(x0, w) + b" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is how the logits look like. Think of them as the scores for each instance/class combination." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "logits" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We want to take the highest scoring class for each instance, i.e., the argmax:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "answer = torch.argmax(logits)\n", - "answer" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What are the correct classes for those? Most of them must be wrong, we just initialized weights randomly." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "label = torch.tensor(train_y[0], dtype=torch.long)\n", - "label" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Loss\n", - "\n", - "We can compute the loss as the mean cross-entropy, as usual for classification problems. 
Remember that the cross-entropy between the true label distribution $p$ and the predicted $q$ is computed as:\n", - "\n", - "\\begin{align}\n", - "loss(x, y, \\theta) = -\\sum_c p(y=c|x) \\log q(y=c|x, \\theta)\n", - "\\end{align}\n", - "\n", - "for every label $c$.\n", - "\n", - "The true distribution $p$ is one for the correct label and 0 elsewhere; the predicted $q$ can be computed as the softmax over the logits." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "p = torch.zeros([num_classes])\n", - "p[label] = 1\n", - "p" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "q = torch.softmax(logits, dim=0)\n", - "q" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "q.sum()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cross_entropy = -torch.sum(p * torch.log(q))\n", - "cross_entropy" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also use pytorch's own function for that! \n", - "\n", - "We just need to reshape our logits to be a 1x10 tensor and the label to be a 1-dim tensor. This is because usually we process lots of inputs at once." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "logits = logits.reshape([1, -1])\n", - "label = label.reshape([1])\n", - "\n", - "loss = torch.nn.functional.cross_entropy(logits, label)\n", - "loss" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Gradients\n", - "\n", - "Now we have to compute the gradients to ajust weights. If we take the partial derivative of the cross-entropy (formula above!) with respect to weights $w_c$, we eventually end up with:\n", - "\n", - "\\begin{align}\n", - "\\frac{\\partial loss(x, y, \\theta)}{\\partial w_c} = x \\cdot (\\mathbb{1}(c = y) - q(c|x, \\theta))\n", - "\\end{align}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's compute the gradient for $w_0$, i.e., the weights for the label 0:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "c = 0\n", - "q0 = q[0]\n", - "gradient0 = x0 * (int(label == answer) - q0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Of course, we can compute the gradient with pytorch as well. Once we call the method `backward()` in a tensor, all tensors that are used to compute it get an attribute `grad`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "loss.backward()\n", - "w.grad" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "w.grad.nonzero().shape" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "x0.nonzero().shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's check if pytorch gradients match ours. Again, we use the mean squared error instead of the simple `==` operator because of possible differences in precision." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "gradient0_pytorch = w.grad[:, 0]\n", - "torch.sum((gradient0 - gradient0_pytorch) ** 2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "That's great! Now we have to effectively change the weights in the direction of the gradients. While we are at it, let's also compute the gradient with respect to the bias." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "w.data.sub_(w.grad.data)\n", - "b.data.sub_(b.grad.data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run the forward pass again" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "logits = torch.matmul(x0, w) + b\n", - "logits = logits.view([1, -1])\n", - "torch.softmax(logits, dim=1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And the loss:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "torch.nn.functional.cross_entropy(logits.view([1, -1]), label.view([1]))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We zeroed the loss! So is it that simple to get 100% accuracy classifiers?" - ] - } - ], - "metadata": { - "anaconda-cloud": {}, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 1 -} diff --git a/Modules and MLPs.ipynb b/Modules and MLPs.ipynb deleted file mode 100644 index 876f48d..0000000 --- a/Modules and MLPs.ipynb +++ /dev/null @@ -1,492 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Modules and MLPs\n", - "\n", - "We've seen how the internals of simple linear classifier work. However, we still had to set a lot of things manually. It's much better to have a higher-level API that encapsulates the classifier.\n", - "\n", - "We are going to see that now, with pytorch Module objects. Then, it will allow us to build more complex models, like a multilayer perceptron." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We begin by loading the data again:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "import numpy as np\n", - "import mnist\n", - "from matplotlib import pyplot as pl\n", - "\n", - "train_x = mnist.train_images()\n", - "train_y = mnist.train_labels()\n", - "test_x = mnist.test_images()\n", - "test_y = mnist.test_labels()\n", - "\n", - "num_features = 28 * 28\n", - "num_classes = len(np.unique(train_y))\n", - "new_shape = [-1, num_features]\n", - "train_x_vectors = train_x.reshape(new_shape)\n", - "test_x_vectors = test_x.reshape(new_shape)\n", - "\n", - "# shorten the names\n", - "train_x = train_x_vectors / 255\n", - "test_x = test_x_vectors / 255" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Sequential\n", - "\n", - "Let's create a model similar to the one in the previous notebook, but now with a more organized structure." 
- ],
- "metadata": {
-  "anaconda-cloud": {},
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.6"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}
diff --git a/Modules and MLPs.ipynb b/Modules and MLPs.ipynb
deleted file mode 100644
index 876f48d..0000000
--- a/Modules and MLPs.ipynb
+++ /dev/null
@@ -1,492 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Modules and MLPs\n",
-    "\n",
-    "We've seen how the internals of a simple linear classifier work. However, we still had to set a lot of things manually. It's much better to have a higher-level API that encapsulates the classifier.\n",
-    "\n",
-    "We are going to see that now, with pytorch `Module` objects. This will allow us to build more complex models, like a multilayer perceptron."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We begin by loading the data again:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import torch\n",
-    "import numpy as np\n",
-    "import mnist\n",
-    "from matplotlib import pyplot as pl\n",
-    "\n",
-    "train_x = mnist.train_images()\n",
-    "train_y = mnist.train_labels()\n",
-    "test_x = mnist.test_images()\n",
-    "test_y = mnist.test_labels()\n",
-    "\n",
-    "num_features = 28 * 28\n",
-    "num_classes = len(np.unique(train_y))\n",
-    "new_shape = [-1, num_features]\n",
-    "train_x_vectors = train_x.reshape(new_shape)\n",
-    "test_x_vectors = test_x.reshape(new_shape)\n",
-    "\n",
-    "# shorten the names\n",
-    "train_x = train_x_vectors / 255\n",
-    "test_x = test_x_vectors / 255"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Sequential\n",
-    "\n",
-    "Let's create a model similar to the one in the previous notebook, but now with a more organized structure."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "linear_layer = torch.nn.Linear(num_features, num_classes)\n",
-    "linear_model = torch.nn.Sequential(linear_layer)\n",
-    "loss_function = torch.nn.CrossEntropyLoss()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The model can be called as a function to compute an output. Let's see how it works:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "batch_size = 8\n",
-    "batch = torch.tensor(train_x[:batch_size], dtype=torch.float)\n",
-    "labels = torch.tensor(train_y[:batch_size], dtype=torch.long)\n",
-    "\n",
-    "answers = linear_model(batch)\n",
-    "answers"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Optimizer\n",
-    "\n",
-    "The outputs and the loss are computed in pretty much the same way as in our last notebook. Now let's define an optimizer, which will take care of updating the weights."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "learning_rate = 1e-2\n",
-    "\n",
-    "# the optimizer needs to be told which are the parameters to optimize\n",
-    "optimizer = torch.optim.SGD(linear_model.parameters(), lr=learning_rate)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Training loop\n",
-    "\n",
-    "Now we write the main training loop. This is the basic skeleton for training pytorch models."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def train_model(model, train_x, train_y, num_epochs, batch_size, optimizer):\n",
-    "    losses = []\n",
-    "\n",
-    "    for epoch in range(num_epochs):\n",
-    "        print('Starting epoch %d' % epoch)\n",
-    "        batch_index = 0\n",
-    "        num_batches = 0\n",
-    "        total_loss = 0\n",
-    "        hits = 0\n",
-    "\n",
-    "        while batch_index < len(train_x):\n",
-    "            # get the data for this batch\n",
-    "            next_index = batch_index + batch_size\n",
-    "            batch_x = torch.tensor(train_x[batch_index:next_index], dtype=torch.float)\n",
-    "            batch_y = torch.tensor(train_y[batch_index:next_index], dtype=torch.long)\n",
-    "            batch_index = next_index\n",
-    "            num_batches += 1\n",
-    "\n",
-    "            # forward pass\n",
-    "            logits = model(batch_x)\n",
-    "\n",
-    "            # compute the loss\n",
-    "            loss = loss_function(logits, batch_y)\n",
-    "            loss_value = loss.item()\n",
-    "            total_loss += loss_value\n",
-    "            losses.append(loss_value)\n",
-    "\n",
-    "            y_pred = logits.argmax(dim=1)\n",
-    "            hits += torch.sum(y_pred == batch_y).item()\n",
-    "\n",
-    "            # important: zero the gradients before recomputing them again\n",
-    "            model.zero_grad()\n",
-    "            loss.backward()\n",
-    "\n",
-    "            # after computing the gradients, take a step against their direction\n",
-    "            optimizer.step()\n",
-    "\n",
-    "        # average loss per batch (total_loss accumulates batch means)\n",
-    "        avg_loss = total_loss / num_batches\n",
-    "        print('Epoch loss: %.4f' % avg_loss)\n",
-    "        acc = hits / len(train_x)\n",
-    "        print('Epoch accuracy: %.4f' % acc)\n",
-    "    \n",
-    "    return np.array(losses)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "losses = train_model(linear_model, train_x, train_y, 5, 8, optimizer)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Knowing that the loss decreases is good, but in classification problems we usually also want other metrics, such as accuracy or F1. The training loop above already reports accuracy on the training data.\n",
-    "\n",
-    "**Exercise:** report F1 as well! (A sketch for evaluating held-out data follows below.)"
-   ]
-  },
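-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "For reference, here is one way to compute accuracy on held-out data. This is a sketch, not the only solution: it assumes the `test_x`/`test_y` arrays loaded above and works for any model in this notebook."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def compute_accuracy(model, x, y):\n",
-    "    x = torch.tensor(x, dtype=torch.float)\n",
-    "    y = torch.tensor(y, dtype=torch.long)\n",
-    "    # no gradients needed for evaluation\n",
-    "    with torch.no_grad():\n",
-    "        logits = model(x)\n",
-    "    y_pred = logits.argmax(dim=1)\n",
-    "    return (y_pred == y).float().mean().item()\n",
-    "\n",
-    "compute_accuracy(linear_model, test_x, test_y)"
-   ]
-  },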
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Plots are a good way to understand how a model is doing. Let's plot the loss curve by batch:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%matplotlib inline\n",
-    "pl.rcParams['figure.figsize'] = [10, 5]\n",
-    "pl.plot(losses)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "That might be too dense, although we can already see that the loss doesn't decrease smoothly. Let's downsample the array, picking only every 10th value, remove the connecting lines and try again."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "pl.plot(losses[::10], 'b.')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Now it is easier to see that the bulk of the batches have a lower loss. Interestingly, some patterns of hard-to-classify examples repeat every epoch."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Multilayer Perceptron\n",
-    "\n",
-    "We can now proceed to a more sophisticated classifier: a multilayer perceptron. Let's build one using the Sequential API."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "hidden_size = 200\n",
-    "learning_rate = 1e-2\n",
-    "\n",
-    "linear_layer1 = torch.nn.Linear(num_features, hidden_size)\n",
-    "linear_layer2 = torch.nn.Linear(hidden_size, num_classes)\n",
-    "mlp = torch.nn.Sequential(linear_layer1, \n",
-    "                          torch.nn.ReLU(), \n",
-    "                          linear_layer2)\n",
-    "\n",
-    "optimizer = torch.optim.SGD(mlp.parameters(), lr=learning_rate)"
-   ]
-  },
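-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Before training it, note that the MLP has many more parameters than the linear model, which is the main reason for the difference in running time you are about to see. A quick count (a sketch using `numel()` over the models' `parameters()`):"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# total number of trainable parameters in each model\n",
-    "n_linear = sum(p.numel() for p in linear_model.parameters())\n",
-    "n_mlp = sum(p.numel() for p in mlp.parameters())\n",
-    "n_linear, n_mlp"
-   ]
-  },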
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Now let's train the model. How do the loss and accuracy compare with the linear model?\n",
-    "\n",
-    "You will probably also notice a difference in running time!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "losses = train_model(mlp, train_x, train_y, 3, 8, optimizer)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Notice the different concentration of dots in the MLP and linear model plots!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "pl.plot(losses[::10], 'b.')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Validation data\n",
-    "\n",
-    "Evaluating the performance on training data tells us whether the model is learning at all, but to know whether it is actually useful, we should evaluate it on validation or test data."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def evaluate_model(model, test_x, test_y):\n",
-    "    test_x = torch.tensor(test_x, dtype=torch.float)\n",
-    "    test_y = torch.tensor(test_y, dtype=torch.long)\n",
-    "    loss_function = torch.nn.CrossEntropyLoss()\n",
-    "\n",
-    "    # switch to evaluation mode (turns off dropout) and skip gradient tracking\n",
-    "    model.eval()\n",
-    "    with torch.no_grad():\n",
-    "        logits = model(test_x)\n",
-    "    model.train()  # back to training mode\n",
-    "\n",
-    "    return loss_function(logits, test_y)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "evaluate_model(mlp, test_x, test_y)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "evaluate_model(linear_model, test_x, test_y)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Validation loss is way higher than training loss: that's plain overfitting.\n",
-    "\n",
-    "How can we remedy that? There are two main options:\n",
-    "\n",
-    "1. **Generalize better**. This includes either finding a better feature representation or regularizing, i.e., adding some kind of penalty to the model weights that encourages a more general solution. Examples: L2-norm weight regularization, dropout.\n",
-    "1. **Early stopping**. Evaluate the model on validation data after each epoch or some number of batches, and only save it when validation performance improves. This amounts to detecting when the model reaches its performance peak."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Dropout\n",
-    "\n",
-    "We could try dropout. It effectively deactivates some connections at random during training, forcing the network to avoid depending on specific inputs."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "hidden_size = 200\n",
-    "learning_rate = 1e-2\n",
-    "\n",
-    "linear_layer1 = torch.nn.Linear(num_features, hidden_size)\n",
-    "linear_layer2 = torch.nn.Linear(hidden_size, num_classes)\n",
-    "mlp_dropout = torch.nn.Sequential(linear_layer1, \n",
-    "                                  torch.nn.ReLU(), \n",
-    "                                  torch.nn.Dropout(0.25),  # randomly zero 25% of the activations\n",
-    "                                  linear_layer2)\n",
-    "\n",
-    "optimizer_dropout = torch.optim.SGD(mlp_dropout.parameters(), lr=learning_rate)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "losses = train_model(mlp_dropout, train_x, train_y, 3, 8, optimizer_dropout)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Training loss is a bit worse, as expected. After all, we are deliberately obstructing some connections.\n",
-    "\n",
-    "Now let's check validation performance:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "evaluate_model(mlp_dropout, test_x, test_y)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "No improvement. Ideally, we should retrain our model with different hyperparameters (learning rates, layer sizes, number of layers, dropout rate) as well as some changes in the structure (different optimizers, activation functions, losses).\n",
-    "\n",
-    "However, data representation plays a key role. Do you think representing the input as independent pixels is a good idea for recognizing digits?"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Saving\n",
-    "\n",
-    "Persisting the model after training is obviously important if we want to reuse it later.\n",
-    "\n",
-    "In pytorch, we can save the model by calling `torch.save()` with the model's `state_dict`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "torch.save(mlp.state_dict(), 'mlp.model')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Later, we recreate the model and load the saved weights."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# recreate the architecture with fresh layers; load_state_dict()\n",
-    "# then fills them with the saved weights\n",
-    "mlp2 = torch.nn.Sequential(torch.nn.Linear(num_features, hidden_size), \n",
-    "                           torch.nn.ReLU(), \n",
-    "                           torch.nn.Linear(hidden_size, num_classes))\n",
-    "mlp2.load_state_dict(torch.load('mlp.model'))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Let's check the performance to see if it's the same!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "evaluate_model(mlp2, test_x, test_y)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "**Exercise:** implement early stopping, so that the model is saved whenever it reaches a new lowest validation loss. One possible skeleton follows below."
-   ]
-  },
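-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Here is one possible skeleton for the exercise, a sketch rather than the definitive solution: it reuses `train_model`, `evaluate_model` and the data from above, and the `patience` parameter and file name are our own choices."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def train_with_early_stopping(model, optimizer, max_epochs, patience=2):\n",
-    "    best_loss = float('inf')\n",
-    "    epochs_without_improvement = 0\n",
-    "\n",
-    "    for epoch in range(max_epochs):\n",
-    "        # train for a single epoch, then check validation loss\n",
-    "        train_model(model, train_x, train_y, 1, 8, optimizer)\n",
-    "        val_loss = evaluate_model(model, test_x, test_y).item()\n",
-    "        print('Validation loss: %.4f' % val_loss)\n",
-    "\n",
-    "        if val_loss < best_loss:\n",
-    "            # new best: remember it and save the weights\n",
-    "            best_loss = val_loss\n",
-    "            epochs_without_improvement = 0\n",
-    "            torch.save(model.state_dict(), 'best.model')\n",
-    "        else:\n",
-    "            epochs_without_improvement += 1\n",
-    "            if epochs_without_improvement >= patience:\n",
-    "                print('No improvement for %d epochs, stopping' % patience)\n",
-    "                break"
-   ]
-  }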
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.6"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/bonus-computational-efficiency.ipynb b/bonus-computational-efficiency.ipynb
new file mode 100644
index 0000000..43d0dec
--- /dev/null
+++ b/bonus-computational-efficiency.ipynb
@@ -0,0 +1,376 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Numpy and Computation Efficiency\n",
+    "\n",
+    "This notebook illustrates the computational efficiency of running linear algebra with the proper tools, such as numpy.\n",
+    "\n",
+    "Let's compute an array dot product in pure Python:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "v1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]\n",
+      "\n",
+      "v2 = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 
146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199]\n", + "\n", + "v1 dot v2 = 823350\n" + ] + } + ], + "source": [ + "def array_dot_product(v1, v2):\n", + " dot_product = 0\n", + " \n", + " for v1_i, v2_i in zip(v1, v2):\n", + " dot_product += v1_i * v2_i\n", + " \n", + " return dot_product\n", + "\n", + "v1 = list(range(100))\n", + "v2 = list(range(100, 200))\n", + "\n", + "print(\"v1 = %s\\n\" % v1)\n", + "print(\"v2 = %s\\n\" % v2)\n", + "\n", + "result = array_dot_product(v1, v2)\n", + "print(\"v1 dot v2 = %d\" % result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Okay, it works, but how long does it take?" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7.52 µs ± 80 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" + ] + } + ], + "source": [ + "%timeit array_dot_product(v1, v2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's try with numpy -- it uses data structures like in C, optimized for mathematical operations, without the Python overhead." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "v1: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23\n", + " 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47\n", + " 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71\n", + " 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95\n", + " 96 97 98 99]\n", + "\n", + "v2: [100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117\n", + " 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135\n", + " 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153\n", + " 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171\n", + " 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189\n", + " 190 191 192 193 194 195 196 197 198 199]\n", + "\n", + "v1 dot v2 = 823350\n" + ] + } + ], + "source": [ + "v1_np = np.arange(100)\n", + "v2_np = np.arange(100, 200)\n", + "print(\"v1: %s\\n\" % v1_np)\n", + "print(\"v2: %s\\n\" % v2_np)\n", + "\n", + "result = v1_np.dot(v2_np)\n", + "print(\"v1 dot v2 = %d\" % result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Nice, aligned formatting. Now let's check the running time." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "765 ns ± 3.69 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" + ] + } + ], + "source": [ + "%timeit v1_np.dot(v2_np)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What about matrices?" 
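+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A matrix product does far more work than a vector dot product: multiplying the 100x200 and 200x300 matrices used below takes six million scalar multiply-adds, so the Python-level overhead should hurt even more. A quick back-of-the-envelope count (the dimensions are the ones used in the next cells):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# scalar multiply-adds in a (100x200) @ (200x300) matrix product\n",
+    "100 * 200 * 300"
+   ]
+  },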
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def matrix_dot_product(m1, m2_t):\n",
+    "    # m2_t is the transpose of the right-hand matrix, so rows of m1\n",
+    "    # line up with rows of m2_t\n",
+    "    num_rows = len(m1)\n",
+    "    num_columns = len(m2_t)\n",
+    "    internal_dim = len(m1[0])\n",
+    "    result = []\n",
+    "    \n",
+    "    for i in range(num_rows):\n",
+    "        new_row = []\n",
+    "        for j in range(num_columns):\n",
+    "            total = 0\n",
+    "            for k in range(internal_dim):\n",
+    "                total += m1[i][k] * m2_t[j][k]\n",
+    "            new_row.append(total)\n",
+    "        result.append(new_row)\n",
+    "    \n",
+    "    return result"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "m1 = np.random.rand(100, 200)\n",
+    "m2 = np.random.rand(200, 300)\n",
+    "\n",
+    "m2_t = m2.T\n",
+    "m1_list = m1.tolist()\n",
+    "m2_t_list = m2_t.tolist()\n",
+    "result_list = matrix_dot_product(m1_list, m2_t_list)\n",
+    "result_numpy = m1.dot(m2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Checking the results..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([[ True,  True,  True, ...,  True,  True,  True],\n",
+       "       [ True,  True,  True, ..., False,  True, False],\n",
+       "       [False,  True,  True, ..., False,  True,  True],\n",
+       "       ...,\n",
+       "       [ True,  True,  True, ...,  True, False, False],\n",
+       "       [False, False, False, ...,  True,  True, False],\n",
+       "       [False, False, False, ..., False,  True,  True]])"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "result_list = np.array(result_list)\n",
+    "result_list == result_numpy"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Different? By how much?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([[ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,\n",
+       "         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],\n",
+       "       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,\n",
+       "        -7.10542736e-15,  0.00000000e+00,  7.10542736e-15],\n",
+       "       [-7.10542736e-15,  0.00000000e+00,  0.00000000e+00, ...,\n",
+       "        -7.10542736e-15,  0.00000000e+00,  0.00000000e+00],\n",
+       "       ...,\n",
+       "       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,\n",
+       "         0.00000000e+00,  7.10542736e-15, -7.10542736e-15],\n",
+       "       [-7.10542736e-15, -7.10542736e-15,  7.10542736e-15, ...,\n",
+       "         0.00000000e+00,  0.00000000e+00, -7.10542736e-15],\n",
+       "       [ 3.55271368e-14, -1.42108547e-14, -2.84217094e-14, ...,\n",
+       "        -1.42108547e-14,  0.00000000e+00,  0.00000000e+00]])"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "result_list - result_numpy"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "9.776357501323218e-11"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "np.abs(result_list - result_numpy).sum()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Okay, just floating point noise. Now let's time it again."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "706 ms ± 2.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+     ]
+    }
+   ],
+   "source": [
+    "%timeit matrix_dot_product(m1_list, m2_t_list)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "176 µs ± 7.43 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
+     ]
+    }
+   ],
+   "source": [
+    "%timeit m1.dot(m2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "numpy is ~4011x faster\n"
+     ]
+    }
+   ],
+   "source": [
+    "# speedup of numpy over the pure-Python loop, using the times measured above\n",
+    "time_python = 706e-3  # 706 ms\n",
+    "time_numpy = 176e-6   # 176 µs\n",
+    "print(\"numpy is ~%.0fx faster\" % (time_python / time_numpy))"
+   ]
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}