diff --git a/notebooks/12b_mnist_loglike.ipynb b/notebooks/12b_mnist_loglike.ipynb index 9e978cf..b043d4e 100644 --- a/notebooks/12b_mnist_loglike.ipynb +++ b/notebooks/12b_mnist_loglike.ipynb @@ -1 +1,620 @@ -{"cells":[{"cell_type":"markdown","metadata":{"id":"tbTYFRhJoaBu"},"source":["# Calculation of the cross entropy loss (NLL) for a classification tasks\n","\n","\n","**Goal:** In this notebook you will use Keras to set up a CNN for classification of MNIST images and calculate the cross entropy before the CNN was trained. You will use basic numpy functions to calculate the loss that is expected from random guessing and see that an untrained CNN is not better than guessing.\n","\n","**Usage:** The idea of the notebook is that you try to understand the provided code by running it, checking the output and playing with it by slightly changing the code and rerunning it.\n","\n","**Dataset:** You work with the MNIST dataset. You have 60'000 28x28 pixel greyscale images of digits (0-9).\n","\n","**Content:**\n","* load the original MNIST data\n","* define a CNN in Keras\n","* evaluation of the cross entropy loss function of the untrained CNN for all classes\n","* implement the loss function yourself using the predicted probabilities and numpy\n","\n","\n","| [open in colab](https://colab.research.google.com/github/tensorchiefs/dl_book/blob/master/chapter_04/nb_ch04_02.ipynb)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"oqKCMDTIL-A0"},"source":["#### Install correct TF version (colab only)"]},{"cell_type":"markdown","metadata":{"id":"PEIS4WvpsT5t"},"source":["#### Imports\n","\n","First you load all the required libraries."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"-GphWmaNLfGg"},"outputs":[],"source":["import tensorflow as tf\n","if (not tf.__version__.startswith('2')): #Checking if tf 2.0 is installed\n"," print('Please install tensorflow 2.0 to run this notebook')\n","print('Tensorflow version: ',tf.__version__)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Y6S_hQX5oaBw","scrolled":true},"outputs":[],"source":["# load required libraries:\n","import numpy as np\n","import matplotlib.pyplot as plt\n","%matplotlib inline\n","plt.style.use('default')\n","from sklearn.metrics import confusion_matrix\n","\n","import tensorflow\n","from tensorflow.keras.models import Sequential\n","from tensorflow.keras.layers import Dense, Convolution2D, MaxPooling2D, Flatten , Activation\n","from tensorflow.keras.utils import to_categorical\n","from tensorflow.keras import optimizers\n"]},{"cell_type":"markdown","metadata":{"id":"4h_3TS0CtJJb"},"source":["\n","\n","\n","\n","#### Loading and preparing the MNIST data\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"4sZ8lqFfoaB2"},"outputs":[],"source":["from tensorflow.keras.datasets import mnist\n","(x_train, y_train), (x_test, y_test) = mnist.load_data()\n","\n","X_train=x_train / 255 #divide by 255 so that they are in range 0 to 1\n","X_train=np.reshape(X_train, (X_train.shape[0],28,28,1))\n","Y_train=tensorflow.keras.utils.to_categorical(y_train,10) # one-hot encoding\n","\n","\n","\n","Y_train.shape, X_train.shape"]},{"cell_type":"markdown","metadata":{"id":"ZaRFUEP8HJkq"},"source":["## CNN model\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"JSfYQ4f1KYVp"},"outputs":[],"source":["# here you define hyperparameter of the CNN\n","batch_size = 128\n","nb_classes = 10\n","img_rows, img_cols = 28, 28\n","kernel_size = (3, 3)\n","input_shape = (img_rows, img_cols, 
1)\n","pool_size = (2, 2)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"f0q16a2uIBkg"},"outputs":[],"source":["\n","# define CNN with 2 convolution blocks and 2 fully connected layers\n","model = Sequential()\n","\n","model.add(Convolution2D(8,kernel_size,padding='same',input_shape=input_shape))\n","model.add(Activation('relu'))\n","model.add(Convolution2D(8, kernel_size,padding='same'))\n","model.add(Activation('relu'))\n","model.add(MaxPooling2D(pool_size=pool_size))\n","\n","model.add(Convolution2D(16, kernel_size,padding='same'))\n","model.add(Activation('relu'))\n","model.add(Convolution2D(16,kernel_size,padding='same'))\n","model.add(Activation('relu'))\n","model.add(MaxPooling2D(pool_size=pool_size))\n","\n","model.add(Flatten())\n","model.add(Dense(40))\n","model.add(Activation('relu'))\n","model.add(Dense(nb_classes))\n","model.add(Activation('softmax'))\n","\n","# compile model and intitialize weights\n","model.compile(loss='categorical_crossentropy',\n"," optimizer='adam',\n"," metrics=['accuracy'])"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"dWx9gqJ6IUpZ"},"outputs":[],"source":["# summarize model along with number of model weights\n","model.summary()"]},{"cell_type":"markdown","metadata":{"id":"3-mY30BBI4LJ"},"source":["Here you predict the probabilities for all images in the training data set. You did not train the network yet, therefore the probabilities will be around 10% for each class."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"430DSTDIHJlP"},"outputs":[],"source":["# Calculate the probailities for the training data\n","Pred_prob = model.predict(X_train)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"TTyxe7xMJUKC"},"outputs":[],"source":["Pred_prob[0:5]"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"XcqY_UbNYyP2"},"outputs":[],"source":["Pred_prob.shape, Y_train.shape"]},{"cell_type":"markdown","metadata":{"id":"9W46_Euob-ux"},"source":["### Exercise : Calculate the loss function using numpy\n"," \n","\n","*Exercise : Use numpy to calculate the value of the negative log-likelihood loss (=cross entropy) that you expect for the untrained CNN, which you have constructed above to discriminate between the 10 classes. Determine the cross entropy that results from the predicted probabilities (Pred_prob). To determine the cross entropy of the prediction, you can loop over each example and use its true label (Y_train) and the predicted probability for the true class. Do you get the cross entropy value that you have expected?*\n","\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"n-nSWXYadTft"},"outputs":[],"source":["# Write your code here"]},{"cell_type":"markdown","metadata":{"id":"Hv1IEA74dPF6"},"source":["Scroll down to see the solution.\n","\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
\n","
"]},{"cell_type":"markdown","metadata":{"id":"KM1EOk9WLkeh"},"source":["In the next cell you calculate the cross entropy loss of each single image, then you sum up all individual losses and divide the sum with the nr of training examples. You take the negative of this result to get the NLL, also known as categorical cross entropy."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"v9GkdLKcY5OU"},"outputs":[],"source":["loss=np.zeros(len(X_train))\n","Y=np.argmax(Y_train,axis=1)\n","for i in range(0,len(X_train)):\n"," loss[i]=np.log(Pred_prob[i][Y[i]])\n","-np.sum(loss)/len(X_train)"]},{"cell_type":"markdown","metadata":{"id":"CxOvZwJiMZDg"},"source":["If you have no idea about the training dataset, your guess for every image would be 1/nr_of_classes, in the case with 10 classes, you would predicit every image with a probability around 0.1. The corresponding NLL is calculated below:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"oWml2J8MKqwQ"},"outputs":[],"source":["nr_of_classes=10\n","-np.log(1/nr_of_classes)"]},{"cell_type":"markdown","metadata":{"id":"eJElA61ZMeZM"},"source":["You get more or less the same result as as you got with the model.evaluate function for the untrained CNN. "]},{"cell_type":"code","execution_count":null,"metadata":{"id":"n60Ql16SLZac"},"outputs":[],"source":["model.evaluate(X_train, Y_train,verbose=2)"]}],"metadata":{"accelerator":"GPU","colab":{"provenance":[],"toc_visible":true},"kernelspec":{"display_name":"Python 3 (ipykernel)","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.11"}},"nbformat":4,"nbformat_minor":0} \ No newline at end of file +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "tbTYFRhJoaBu" + }, + "source": [ + "# Calculation of the cross entropy loss (NLL) for a classification tasks\n", + "\n", + "\n", + "**Goal:** In this notebook you will use Keras to set up a CNN for classification of MNIST images and calculate the cross entropy before the CNN was trained. You will use basic numpy functions to calculate the loss that is expected from random guessing and see that an untrained CNN is not better than guessing.\n", + "\n", + "**Usage:** The idea of the notebook is that you try to understand the provided code by running it, checking the output and playing with it by slightly changing the code and rerunning it.\n", + "\n", + "**Dataset:** You work with the MNIST dataset. You have 60'000 28x28 pixel greyscale images of digits (0-9).\n", + "\n", + "**Content:**\n", + "* load the original MNIST data\n", + "* define a CNN in Keras\n", + "* evaluation of the cross entropy loss function of the untrained CNN for all classes\n", + "* implement the loss function yourself using the predicted probabilities and numpy\n", + "\n", + "\n", + "| [open in colab](https://colab.research.google.com/github/tensorchiefs/dl_book/blob/master/chapter_04/nb_ch04_02.ipynb)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PEIS4WvpsT5t" + }, + "source": [ + "#### Imports\n", + "\n", + "First you load all the required libraries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "id": "Y6S_hQX5oaBw", + "scrolled": true + }, + "outputs": [], + "source": [ + "# load required libraries:\n", + "import tensorflow as tf\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline\n", + "plt.style.use('default')\n", + "from sklearn.metrics import confusion_matrix\n", + "\n", + "from tensorflow.keras.models import Sequential\n", + "from tensorflow.keras.layers import Dense, Convolution2D, MaxPooling2D, Flatten, Activation\n", + "from tensorflow.keras.utils import to_categorical\n", + "from tensorflow.keras import optimizers" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Exercise: Likelihood if you have no clue\n", + " \n", + "If you have no idea about the training dataset, your best guess for every image is a probability of 1/nr_of_classes for each class. Calculate the NLL for that case." + ], + "metadata": { + "id": "ePknaVwI0YBB" + } + }, + { + "cell_type": "code", + "source": [ + "# @title Solution\n", + "nr_of_classes=10\n", + "-np.log(1/nr_of_classes)" + ], + "metadata": { + "cellView": "form", + "id": "ZtsW-gxA0xx4", + "outputId": "e90b019d-a514-447a-ce52-9e615de7e61f", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 27, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "2.3025850929940455" + ] + }, + "metadata": {}, + "execution_count": 27 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Likelihood of untrained CNN" + ], + "metadata": { + "id": "mPHUVTZV1YF7" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4h_3TS0CtJJb" + }, + "source": [ + "\n", + "\n", + "\n", + "\n", + "#### Loading and preparing the MNIST data\n" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "id": "4sZ8lqFfoaB2", + "outputId": "0093ba7e-1057-4521-e09a-f737a20d4c52", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "((60000, 10), (60000, 28, 28, 1))" + ] + }, + "metadata": {}, + "execution_count": 39 + } + ], + "source": [ + "from tensorflow.keras.datasets import mnist\n", + "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", + "\n", + "X_train=x_train / 255 # divide by 255 so that the pixel values are in the range 0 to 1\n", + "X_train=np.reshape(X_train, (X_train.shape[0],28,28,1))\n", + "Y_train=to_categorical(y_train,10) # one-hot encoding\n", + "\n", + "\n", + "Y_train.shape, X_train.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZaRFUEP8HJkq" + }, + "source": [ + "## CNN model\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "id": "JSfYQ4f1KYVp" + }, + "outputs": [], + "source": [ + "# here you define the hyperparameters of the CNN\n", + "batch_size = 128\n", + "nb_classes = 10\n", + "img_rows, img_cols = 28, 28\n", + "kernel_size = (3, 3)\n", + "input_shape = (img_rows, img_cols, 1)\n", + "pool_size = (2, 2)" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "id": "f0q16a2uIBkg" + }, + "outputs": [], + "source": [ + "# define CNN with 2 convolution blocks and 2 fully connected layers\n", + "model = Sequential()\n", + "\n", + "model.add(Convolution2D(8,kernel_size,padding='same',input_shape=input_shape))\n", + "model.add(Activation('relu'))\n", + "model.add(Convolution2D(8, kernel_size,padding='same'))\n", + 
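"# a ReLU follows each convolution; 2x2 max pooling at the end of the block halves the spatial resolution\n", + 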
"model.add(Activation('relu'))\n", + "model.add(MaxPooling2D(pool_size=pool_size))\n", + "\n", + "model.add(Convolution2D(16, kernel_size,padding='same'))\n", + "model.add(Activation('relu'))\n", + "model.add(Convolution2D(16,kernel_size,padding='same'))\n", + "model.add(Activation('relu'))\n", + "model.add(MaxPooling2D(pool_size=pool_size))\n", + "\n", + "model.add(Flatten())\n", + "model.add(Dense(40))\n", + "model.add(Activation('relu'))\n", + "model.add(Dense(nb_classes))\n", + "model.add(Activation('softmax'))\n", + "\n", + "# compile model and intitialize weights\n", + "model.compile(loss='categorical_crossentropy',\n", + " optimizer='adam',\n", + " metrics=['accuracy'])" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": { + "id": "dWx9gqJ6IUpZ", + "outputId": "4f3cc774-88a3-4bc5-95af-27d488d4579f", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Model: \"sequential_3\"\n", + "_________________________________________________________________\n", + " Layer (type) Output Shape Param # \n", + "=================================================================\n", + " conv2d_12 (Conv2D) (None, 28, 28, 8) 80 \n", + " \n", + " activation_18 (Activation) (None, 28, 28, 8) 0 \n", + " \n", + " conv2d_13 (Conv2D) (None, 28, 28, 8) 584 \n", + " \n", + " activation_19 (Activation) (None, 28, 28, 8) 0 \n", + " \n", + " max_pooling2d_6 (MaxPoolin (None, 14, 14, 8) 0 \n", + " g2D) \n", + " \n", + " conv2d_14 (Conv2D) (None, 14, 14, 16) 1168 \n", + " \n", + " activation_20 (Activation) (None, 14, 14, 16) 0 \n", + " \n", + " conv2d_15 (Conv2D) (None, 14, 14, 16) 2320 \n", + " \n", + " activation_21 (Activation) (None, 14, 14, 16) 0 \n", + " \n", + " max_pooling2d_7 (MaxPoolin (None, 7, 7, 16) 0 \n", + " g2D) \n", + " \n", + " flatten_3 (Flatten) (None, 784) 0 \n", + " \n", + " dense_6 (Dense) (None, 40) 31400 \n", + " \n", + " activation_22 (Activation) (None, 40) 0 \n", + " \n", + " dense_7 (Dense) (None, 10) 410 \n", + " \n", + " activation_23 (Activation) (None, 10) 0 \n", + " \n", + "=================================================================\n", + "Total params: 35962 (140.48 KB)\n", + "Trainable params: 35962 (140.48 KB)\n", + "Non-trainable params: 0 (0.00 Byte)\n", + "_________________________________________________________________\n" + ] + } + ], + "source": [ + "# summarize model along with number of model weights\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3-mY30BBI4LJ" + }, + "source": [ + "Here you predict the probabilities for all images in the training data set. You did not train the network yet, therefore the probabilities will be around 10% for each class." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "id": "430DSTDIHJlP", + "outputId": "d509f70f-2e2a-47ef-e55e-2266dc585b61", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "1875/1875 [==============================] - 4s 2ms/step\n" + ] + } + ], + "source": [ + "# Calculate the probabilities for the training data\n", + "Pred_prob = model.predict(X_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": { + "id": "TTyxe7xMJUKC", + "outputId": "9b224c9a-9ac0-4ee5-e45f-c7df310f1e21", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[1.3205596e-18, 5.2988291e-12, 2.5350449e-20, 1.2180353e-22,\n", + " 1.8144437e-16, 4.7060991e-21, 1.5764765e-08, 1.5844094e-17,\n", + " 5.9404232e-11, 1.0000000e+00],\n", + " [2.7667373e-24, 2.3641273e-11, 2.2651152e-25, 4.4870147e-20,\n", + " 3.5441853e-15, 9.4421481e-27, 9.9942786e-01, 2.2959579e-11,\n", + " 2.4655371e-09, 5.7209819e-04],\n", + " [1.3653172e-02, 8.3011842e-01, 6.5737753e-03, 1.5207104e-10,\n", + " 4.7123001e-14, 3.8285774e-20, 2.3730610e-05, 3.8681562e-12,\n", + " 3.6684156e-11, 1.4963086e-01],\n", + " [9.1495855e-14, 1.3210382e-09, 1.6606809e-12, 5.4113473e-16,\n", + " 2.6305097e-15, 7.8905446e-23, 9.9965596e-01, 4.8645860e-10,\n", + " 4.0532360e-09, 3.4400207e-04],\n", + " [3.3100608e-08, 4.6578123e-12, 3.2831602e-09, 4.1076941e-18,\n", + " 8.9987955e-19, 3.1901950e-16, 4.6687070e-03, 8.5858165e-10,\n", + " 1.3493828e-05, 9.9531770e-01]], dtype=float32)" + ] + }, + "metadata": {}, + "execution_count": 44 + } + ], + "source": [ + "Pred_prob[0:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": { + "id": "XcqY_UbNYyP2", + "outputId": "de17541c-4818-4eac-f3b2-d72f24bc952a", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "((60000, 10), (60000, 10))" + ] + }, + "metadata": {}, + "execution_count": 45 + } + ], + "source": [ + "Pred_prob.shape, Y_train.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9W46_Euob-ux" + }, + "source": [ + "### Exercise: Calculate the loss function using numpy\n", + " \n", + "\n", + "*Exercise: Use numpy to calculate the value of the negative log-likelihood loss (=cross entropy) that you expect for the untrained CNN, which you have constructed above to discriminate between the 10 classes. Determine the cross entropy that results from the predicted probabilities (Pred_prob): loop over each example and use its true label (Y_train) and the predicted probability for the true class. Do you get the cross entropy value that you expected?*\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": { + "id": "n-nSWXYadTft" + }, + "outputs": [], + "source": [ + "# Write your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Hv1IEA74dPF6" + }, + "source": [ + "Scroll down to see the solution.\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KM1EOk9WLkeh" + }, + "source": [ + "In the next cell you calculate the cross entropy loss of each single image, then you sum up all individual losses and divide the sum with the nr of training examples. You take the negative of this result to get the NLL, also known as categorical cross entropy." + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": { + "id": "v9GkdLKcY5OU", + "outputId": "c34d1f71-99d5-4528-a19a-965cbbd9597e", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "25.79395893874508" + ] + }, + "metadata": {}, + "execution_count": 47 + } + ], + "source": [ + "loss=np.zeros(len(X_train))\n", + "Y=np.argmax(Y_train,axis=1)\n", + "for i in range(0,len(X_train)):\n", + " loss[i]=np.log(Pred_prob[i][Y[i]])\n", + "-np.sum(loss)/len(X_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eJElA61ZMeZM" + }, + "source": [ + "You get more a similar result as as you got with the model.evaluate function for the untrained CNN. " + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": { + "id": "n60Ql16SLZac", + "outputId": "81a4c9c0-616a-43c2-c9b1-319bd222294c", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "1875/1875 - 4s - loss: 25.7940 - accuracy: 0.1081 - 4s/epoch - 2ms/step\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[25.793962478637695, 0.1080833300948143]" + ] + }, + "metadata": {}, + "execution_count": 48 + } + ], + "source": [ + "model.evaluate(X_train, Y_train,verbose=2)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Exercise: Change normalization\n", + " \n", + "\n", + "Load the data again, but this time do not scale the data. Repeat the analysis. What is the result? Why is it a good idea to check the loss for untrained networks." + ], + "metadata": { + "id": "kDiZFteB14_b" + } + }, + { + "cell_type": "code", + "source": [ + "# Scroll Down for solution\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "#" + ], + "metadata": { + "id": "FU565zi4yUS7" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "In the unscaled case, you do not necessarily get 1/10 for the probabilities, and thus the loss is usually larger than 2.3. So, the training starts worse than what you would get from pure guessing. This poor start not only leads to a higher initial loss but also increases training time. Effective initialization ensures a smoother and more efficient learning trajectory." 
+ ], + "metadata": { + "id": "E4sXJukR20iN" + } + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.11" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file