From 6fec5a5a3ecda85dc7ffbafc8eb85f49234d596f Mon Sep 17 00:00:00 2001 From: uu-sml Date: Tue, 17 Jan 2023 16:28:05 +0000 Subject: [PATCH] [github actions] deployed from uu-sml/course-sml --- exercises/SML-session_0.ipynb | 2 +- exercises/solutions/SML-session_0.html | 13 +++++++------ 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/exercises/SML-session_0.ipynb b/exercises/SML-session_0.ipynb index 5311156..6947e4d 100644 --- a/exercises/SML-session_0.ipynb +++ b/exercises/SML-session_0.ipynb @@ -1 +1 @@ -{"cells": [{"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "# L0: Introduction to Python\nIn this exercise session, we will review the fundamentals of three important Python libraries: Numpy, Pandas and Matplotlib. Throughout the course, you will need to get familiar with several Python libraries that provide convenient functionality for machine learning purposes. It is good to get into the habit of using the available documentation to your advantage. Some efficient ways of doing this are described below:"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "#### help() function\nIn Python, the *help()*-function can be used to display the documentation for a module, function, or object. When called with no arguments it opens an interactive help session. When called with a specific object as an argument, it displays the documentation for that object. For example, you can use *help(print)* to view the documentation for the built-in print function, or *help(str)* to view the documentation for the str class. Additionally, you can use the *dir()*-function to get all methods and properties of the object passed as an argument to it. It can be used to check all the attributes of a module or class. 
For example, *dir(str)* will give the methods and properties of the str class.\n\n\n#### SHIFT + TAB\n\nIn Jupyter notebook, *shift+tab* is a keyboard shortcut that can be used to access the documentation for the function or object that appears immediately before the cursor. When you press *shift+tab*, a small pop-up window will appear that contains information about the function or object, including its arguments and their types. Pressing *shift+tab* multiple times will cycle through different levels of documentation. If nothing is selected it will show the tip of the current cell. When running notebooks on Google Colab, you can trigger the documentation by clicking the function and then hovering over it with the cursor."}, {"cell_type": "markdown", "source": "Before getting started, we make sure that the libraries are properly imported in our current environment. Do this by running the cell below.", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "code", "execution_count": null, "outputs": [], "source": "import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\n", "metadata": {"collapsed": false, "pycharm": {"name": "#%%\n"}, "tags": []}}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## 0.1 Numpy Fundamentals\n### Basic data structures\n\n### (a)\n\nVectors, matrices and tensors can be represented as *numpy arrays*. Numpy arrays are often initialized from regular Python lists. For example, the Python list [1, 2, 3] can be converted into a 1D numpy array using the command np.array([1, 2, 3]). Create this numpy array in the cell below and print its shape. You can find the shape of a numpy array A using *np.shape(A)*.\n\nYou can create 1D arrays with n elements using for example np.zeros(n), np.ones(n), np.arange(a, b), np.random.rand(n), np.linspace(a, b, n)... 
Try this out and figure out what the different functions do.\n\n### (b)\n\nSimilarly, a 2D numpy array can be created from a nested Python list (a list of lists). Convert the nested lists [ [1, 2, 3] ] and [ [1], [2], [3] ] into numpy arrays and print their shapes. Which one represents a column vector, and which one represents a row vector?\n\n### (c)\n\nCreate a 2D numpy array to represent the matrix D and inspect its shape.\n$$\n\\textbf{D} = \\begin{bmatrix}\n 1 & 2 & 3 \\\\\n 4 & 5 & 6 \\\\\n 7 & 8 & 9 \\\\\n 10 & 11 & 12 \\\\\n\\end{bmatrix}\n$$\n\nWe can create higher dimensional *ndarrays* in a similar way. You can create ndarrays of shape (n, m) using for example np.zeros((n, m)), np.ones((n, m)), np.random.rand(n, m). You can also use np.eye(n) to create an identity matrix of shape (n,n), i.e. a diagonal matrix with ones along the diagonal. Try this out and make sure you understand what the different functions do.\n"}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### Slicing and Indexing\n### (d)\n\nWe can access elements and slice numpy arrays easily. The cell below defines a 1D array F and a 2D array G. Try the commands: F[0], F[-1], F[:-2], G[2,3], G[:,2], G[-1,:3] and figure out what they mean. Make sure you understand how to index and slice numpy arrays of different dimensions.\n\n### (e)\nIt is also easy to assign new values to specific elements or entire rows and columns of numpy arrays. For example, F[0] = 5 replaces the first value of F with 5. 
Figure out how to replace the last column of G with the array F."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# (d)\nF = np.arange(4)\nG = np.array([[1, 4, 5, 6, 1], [2, 5, 6, 6, 1], [2, 3, 1, 1, 1], [8, 12, 14, 20, 1]])\n\n# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### Reshaping and Stacking\n### (f)\nIf we have access to a 1D array H, there are multiple ways of converting this array into a 2D array, i.e. adding a dimension to the numpy array. This can be done using the function *np.reshape(H, (n, m))*, where (n, m) is the desired shape of the array. Alternatively, we can write *H.reshape(n, m)*. Convert the array H given below into 2D arrays of different shapes (n,m) and inspect the result. What is the requirement on n and m? Can you use the reshape-function to convert H into a column vector and a row vector?\n\n### (g)\nIf we want to reshape a 1D array into a column- or row vector and we do not know the size of the array, we can use -1 in place of the unknown size, i.e. *np.reshape(array, (-1, 1))*. Use this method to convert the array M into a column- and a row vector, and confirm the dimensions using np.shape().\n\nWe can also use *np.newaxis* to expand the dimension of an array M: M2 = M[*np.newaxis*, :]. Use this method to convert the array M into a column and a row vector as well. This can be important for example when a function requires a 2D array and you have access to your data in a 1D array.\n\n### (h)\nWe can also stack numpy arrays to create new arrays, using for example *np.vstack()* and *np.hstack()*. In the example below, F and G are stacked vertically and horizontally. Inspect the results to understand how to stack numpy arrays. Then, create a new array X and add a row and a column of ones using the appropriate functions. Let X be a 3x3 diagonal matrix with 4's along the diagonal. 
Confirm the shape of the resulting array. Remember that you can create a diagonal matrix with np.eye()."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# f)\nH = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])\n\n# enter your code here\n# g)\nM = np.linspace(1, 17, 100)\n\n# enter your code here\n# h)\nF = np.zeros(4)\nG = np.array([[2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5]])\nG_rowstack = np.vstack((G, F)) # adds a row of zeros to G at the bottom (vstack --> vertical addition)\nF = F.reshape(-1,1) # to add a column, we need to reshape F into the appropriate 2D array\nG_colstack = np.hstack((G, F)) # adds a column of zeros to G at the right (hstack --> horizontal addition)\nprint(f'Row extended G:\\n {G_rowstack}')\nprint(f'Column extended G:\\n {G_colstack}')\n\n# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### Aggregation and Linear Algebra\n### (i)\nThere is a sea of useful numpy functions that you may want to become familiar with. For example, you can find the minimum and maximum element of a numpy array z using *np.min(z)* and *np.max(z)*. If z is a matrix, you can find the minimum and maximum across rows and columns (i.e. across each *axis* of the 2D array) using *np.min(z, axis=0)* and *np.max(z, axis=1)*. You can also find the sum of an entire array, or the sum across columns or rows, using *np.sum()*. Find the minimum and maximum element of the matrix Z defined below, as well as the sum across the columns of the matrix.\n$$\n\\textbf{Z} = \\begin{bmatrix}\n 10 & 0 & 0 \\\\\n 1 & 11 & 1 \\\\\n 2 & 2 & 12 \\\\\n\\end{bmatrix}\n$$\n\n### (j)\nArithmetic operations on numpy arrays are straightforward. For example, you may add two arrays A and B of appropriate size simply through *A+B*, or *np.add(A, B)*. Many useful linear algebra operations are also available in numpy. 
For example, you can find the transpose of a matrix Z defined as a numpy array using *np.transpose(Z)* (or simply *Z.T*). You can find the inverse using *np.linalg.inv(Z)*. Matrix multiplication of two matrices A and B can be performed using *np.matmul(A, B)* (or simply A@B, where the @-operator implements np.matmul). Note that $A*B$ returns the elementwise multiplication of A and B.\n\nAnother useful function is the linear system solver. A linear system of the form $Z\\cdot x=b$ can be solved efficiently using np.linalg.solve($Z$, $b$). Solve the following linear system both using the matrix inverse and np.linalg.solve:\n\n$$\n\\begin{bmatrix}\n 10 & 0 & 0 \\\\\n 1 & 11 & 1 \\\\\n 2 & 2 & 12 \\\\\n\\end{bmatrix} x = \\begin{bmatrix}\n 2\\\\\n 1\\\\\n 10\\\\\n\\end{bmatrix}\n$$"}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "# 0.2 Pandas Fundamentals\n\nPandas dataframes can be used to store data tables, and contain functionality to analyze, explore and manipulate the data in these tables. Numpy arrays can be converted into dataframes, but in this course we will mostly load datasets from csv files to pandas dataframes.\n\n## (a)\n\nWe begin by exploring the auto-dataset. Run the cell below to load the dataset and store it in a pandas dataframe called 'Auto'. We also print the number of rows and columns in the dataset. The dataset contains information about a number of vehicles. The following features are observed:\n\n- `mpg`: miles per gallon\n- `cylinders`: Number of cylinders between 4 and 8\n- `displacement`: Engine displacement (cu. inches)\n- `horsepower`: Engine horsepower\n- `weight`: Vehicle weight (lbs.)\n- `acceleration`: Time to accelerate from 0 to 60 mph (sec.)\n- `year`: Model year (modulo 100)\n- `origin`: Origin of car (1. American, 2. 
European, 3. Japanese)\n- `name`: Vehicle name\n\nTo get an overview of the data, we can use the commands Auto.info(). Using Auto.describe() we get summaries of some important statistics for each column in the dataset. With Auto.head() we can take a look at the first five rows of the data. Use these functions and get an overview of the data. What information can we get from the dataset, and how many samples have we collected? Each entry in the dataframe is a sample (measurement point) that we can use to train our machine learning models."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "url = 'https://github.com/uu-sml/course-sml-public/raw/master/data/auto.csv'\nAuto = pd.read_csv(url)\nprint(f'Auto.shape: {Auto.shape}')\n\n# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## (b)\n\nIf we only need a subset of the dataframe, we can create a new dataframe containing this subset. For example, we can create a dataframe X containing only the weight and acceleration features by running the cell below. Explore the new dataframe, and check the shape using X.shape."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "X = Auto[[\"weight\", \"acceleration\"]]\n\n# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## (c)\n\nWe can also slice the dataframe using index. For example, we can pick out the last column of the dataframe by running the cell below. Explore the new dataframe X2 as you did in (b). 
Create a new dataframe containing multiple columns using index."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "X2 = Auto.iloc[:, -1]\n\n# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## (d)\nIn the course, we will often divide a dataset randomly into a train and a test set. This means that we want a random subset of the entries to go in each sliced dataset. In the cell below, we use numpy's random number generator to generate a numpy array containing indices of 80% of the entries in the Auto dataset, chosen randomly. The total number of samples is N, and 80% then corresponds to n samples that should go in our train set. We use np.random.choice() to pick out n out of N random indices, which is returned in a numpy array. Then, we use Auto.index.isin() on this array. This function returns a boolean array with element False if the index in Auto is not found among the random indices, and True if it is there.\n\nInspect both the array of random indices, random_index, and the boolean array, train_samples. Finally, we create a boolean array for the test set, which is True in each element where the train boolean array is False, and vice versa. 
Make sure you understand what is happening in every line of the code."}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": [], "pycharm": {"name": "#%%\n"}}, "outputs": [], "source": "N = Auto.shape[0] # total number of samples in the dataset\nn = round(0.8*N) # total number of samples in the train dataset\nrandom_index = np.random.choice(N, size = n, replace = False) # replace=False is needed so that the same index does not appear twice in the final list\ntrain_samples = Auto.index.isin(random_index) # boolean array containing True if the sample has been chosen or False otherwise\ntest_samples = ~train_samples # complementary boolean array"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## (e)\nCreate new dataframes for the train and test sets, each containing only the rows selected by the corresponding boolean array. This can be done by passing the boolean array to Auto.iloc. Inspect the train and test sets. Are the shapes as you expect?"}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "# 0.3 Matplotlib Fundamentals\n\n## (a)\n\nMatplotlib is an extensive Python library for data visualization. It is often used together with numpy to create visualizations of data in machine learning, such as line plots, scatter plots, bar plots, histograms, and 3D plots. These visualizations can be useful for understanding the behavior of the data and the performance of machine learning models. It is also used to visualize the performance of a model during the training process, as well as its predictions on the test data set.\n\nThere are tools for a variety of different kinds of plots. 
Check the documentation (**https://matplotlib.org/**) for information on design choices and different plotting options.\n\nIn the cell below you can find a simple example of how to plot data stored in numpy arrays."}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": [], "pycharm": {"name": "#%%\n"}}, "outputs": [], "source": "dinosaur_fossils = np.array([5, 15, 34, 9, 122, 420, 850])\nyear = np.array([1820, 1860, 1900, 1940, 1980, 2000, 2020])\nplt.figure(1)\nplt.plot(year, dinosaur_fossils, 'g-*', label='Fossil Count by Year')\nplt.legend()\nplt.title('Dinosaur Fossils Found over Time')\nplt.xlabel('Year')\nplt.ylabel('Fossil Count')\n#plt.savefig('dinosaur_fossil.png') # you can use this command to save a figure to the main project folder\nplt.show()"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## (b)\n\nIn the cell below, we have returned to the Auto dataframe. *Auto.groupby('year')* returns a Pandas DataFrameGroupBy-object containing information summarized by the model year. Auto.groupby('year').mean() takes the mean of each entry in the remaining feature columns, grouped by the model year. Inspect the resulting dataframe and plot the mean acceleration as a function of time. You can convert a dataframe A to a numpy array using *A.to_numpy()*."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "year_data = Auto.groupby('year').mean()\n\n# enter your code here\n"}, {"cell_type": "markdown", "source": "## (c) Subplot examples\n\nIn matplotlib, a figure can contain multiple subplots, which are organized in a grid-like pattern. You can create a new figure and add subplots to it using the *plt.figure()* and *plt.subplot()* functions. 
The *figure()* function creates a new figure, and the *subplot()* function is used to add subplots to the figure.\n\nIn the following cell, there is an example that creates a figure with 2 rows and 2 columns of subplots, and then plots a sine wave in each subplot. Inspect the code and, if you wish, play around with the plt.subplots-command, for example by plotting data from the Auto-dataframe, if you want further practice.", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": [], "pycharm": {"name": "#%%\n"}}, "outputs": [], "source": "# Create a new figure with 2x2 subplots\nfig, axs = plt.subplots(2, 2)\n\n# Generate data\nx = np.linspace(0, 2 * np.pi, 100)\ny = np.sin(x)\n\n# Plot a sine wave in each subplot\naxs[0, 0].plot(x, y)\naxs[0, 1].plot(x, y)\naxs[1, 0].plot(x, y)\naxs[1, 1].plot(x, y)\n\n# Add labels and titles\naxs[0, 0].set_title('Sine wave 1')\naxs[0, 1].set_title('Sine wave 2')\naxs[1, 0].set_title('Sine wave 3')\naxs[1, 1].set_title('Sine wave 4')\n\n# Show the figure\nplt.subplots_adjust(wspace= 0.5, hspace= 0.5) # function that allows us to adjust the spacing between subplots in a figure\nplt.show()"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "You can also use *plt.subplots(*nrows*, *ncols*, *sharex* = True, *sharey* = True)*, which creates a figure with *nrows* x *ncols* subplots in it, with sharing x and y axis. In the following example, we create a figure with 2 rows and 1 columns of subplots. The *sharex* and *sharey* arguments are set to True when creating the subplots with *plt.subplots()*. 
This means that the x-axis of the first subplot (ax1) will be shared with the x-axis of the second subplot (ax2), and the y-axis of the first subplot (ax1) will be shared with the y-axis of the second subplot (ax2).\n\nNote that this way of creating the subplots is useful when you want to compare two plots that share the same axis scales, as it ensures that the x and y axis will be consistent across the subplots, regardless of the data that is being plotted."}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": [], "pycharm": {"name": "#%%\n"}}, "outputs": [], "source": "# Create a new figure with 2x1 subplots\nfig, (ax1, ax2) = plt.subplots(2, 1, sharex = True, sharey =True)\n\n# Generate data\nx = np.linspace(0, 2 * np.pi, 100)\ny1 = np.sin(x)\ny2 = np.cos(x)\n\n# Plot the data\nax1.plot(x, y1, label='sin(x)')\nax2.plot(x, y2, label='cos(x)')\n\n# Add labels and titles\nax1.set_title('Sine wave')\nax2.set_title('Cosine wave')\n\n# Add a legend\nax1.legend()\nax2.legend()\n\n# Show the figure\nplt.show()"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### (d)\nCreate a bar chart that displays the population of four different cities: New York, Los Angeles, Chicago, and Houston. Use the function *plt.bar()*. Check the documentation if you need information on how this function is used. You have access to the following data:\n\n**New York:** 8.4 million\n\n**Los Angeles:** 4.0 million\n\n**Chicago:** 2.7 million\n\n**Houston:** 2.3 million\n\nMake sure to add appropriate titles and labels to your figure."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### (e)\nCreate a plot that shows the number of registered cars in Denmark, Norway, and Sweden from 2000 to 2020. 
Be sure to include axis labels and a legend. Use the following simulated data:\n\n**denmark_cars** = [390000, 390000, 410000, 425000, 430000, 450000, 450000, 450000,\n 450000, 450000, 460000, 470000, 480000, 490000, 490000, 500000,\n 510000, 520000, 530000, 550000, 560000]\n \n**norway_cars** = [200000, 200000, 200000, 210000, 220000, 230000, 240000, 250000,\n 260000, 270000, 270000, 290000, 300000, 370000, 320000, 330000,\n 340000, 350000, 360000, 370000, 380000]\n \n**sweden_cars** = [300000, 310000, 310000, 300000, 300000, 350000, 360000, 370000,\n 380000, 390000, 400000, 410000, 420000, 410000, 440000, 450000,\n 460000, 470000, 440000, 490000, 500000]"}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### (f) Advanced example\n\nThis problem is intended to show that much more advanced plots can be created with matplotlib, compared to the basic applications we have seen so far.\n\n**Example:** Create a 3D surface plot of the function $z = -(sin(x)\\cdot cos(y)\\cdot exp(|(1 - \\sqrt{(x^2 + y^2)}/\\pi)|))$ over the range $-5 \\leq x \\leq 5$ and $-5 \\leq y \\leq 5$. 
Use a color map to indicate the value of $z$ and include proper axis labels and a colorbar."}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": [], "scrolled": true, "pycharm": {"name": "#%%\n"}}, "outputs": [], "source": "from mpl_toolkits.mplot3d import Axes3D # library to create a 3D plot\n\n# Data\nx = np.linspace(-5, 5, 100)\ny = np.linspace(-5, 5, 100)\nx, y = np.meshgrid(x, y)\nz = -(np.sin(x) * np.cos(y) * np.exp(np.abs(1 - np.sqrt(x**2 + y**2)/np.pi)))\n\n# Plotting\nfig = plt.figure()\nax = fig.add_subplot(111, projection = '3d')\nsurf = ax.plot_surface(x, y, z, cmap = 'coolwarm')\nax.set_xlabel('$x_1$')\nax.set_ylabel('$x_2$')\ncbar = fig.colorbar(surf, shrink = 0.5, aspect = 5)\nplt.show()"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "In this graph, $x_1$ and $x_2$ could be the inputs to our machine learning model and $z$ (the vertical axis) could be the cost function we want to optimize. Later in the course, we will see exactly what the cost function is and we will also study the gradient descent method which will allow us to find a local minimum of a differentiable function."}, {"cell_type": "markdown", "source": "# 0.4 Conditional statements and for-loops\n\nThis section contains some examples on how to use if-statements, for-loops and function definitions in Python.", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "markdown", "source": "### (a)\n\nFor example, the maximum score of the SML exam is 50, and the limits for the grades 3, 4, and 5 are clearly specified. The following exam scores are listed for four students: 47, 33, 24 and 22. 
The following code assigns the correct grade to each student:", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "code", "execution_count": null, "outputs": [], "source": "def assign_grade(exam_score):\n    grade = 0\n    if exam_score >= 43:\n        grade = 5\n    elif exam_score >= 33:\n        grade = 4\n    elif exam_score >= 23:\n        grade = 3\n    else:\n        grade = 'U' # Failed!\n    return grade\n\n# We have 4 scores: 47, 33, 24, 22\nscore1 = 47\ngrade1 = assign_grade(score1)\nprint(f'Student 1: score {score1}, grade {grade1}')\n\nscore2 = 33\ngrade2 = assign_grade(score2)\nprint(f'Student 2: score {score2}, grade {grade2}')\n\nscore3 = 24\ngrade3 = assign_grade(score3)\nprint(f'Student 3: score {score3}, grade {grade3}')\n\nscore4 = 22\ngrade4 = assign_grade(score4)\nprint(f'Student 4: score {score4}, grade {grade4}')", "metadata": {"tags": [], "collapsed": false, "pycharm": {"name": "#%%\n"}}}, {"cell_type": "markdown", "source": "## (b)\n\nThe grades can also be assigned using a for-loop:", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "code", "execution_count": null, "outputs": [], "source": "# Alternative 1: Loop over list\nprint('Loop over list:')\nscores = [47, 33, 24, 22]\ncnt = 0\nfor score in scores:\n    grade = assign_grade(score)\n    print(f'Student {cnt+1}: score {score}, grade {grade}')\n    cnt+=1\n\n# Alternative 2: Loop using range\nprint('Loop using range:')\nnum_students = len(scores)\nfor i in range(num_students):\n    grade = assign_grade(scores[i])\n    print(f'Student {i+1}: score {scores[i]}, grade {grade}')", "metadata": {"tags": [], "collapsed": false, "pycharm": {"name": "#%%\n"}}}, {"cell_type": "markdown", "source": "## (c)\nIf we want to save the converted grades to a list, there is a short syntax (called **list comprehension**) to do that with one line:", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "code", "execution_count": null, 
"outputs": [], "source": "grades = [assign_grade(score) for score in scores]\n\nprint('List of scores: ', scores)\nprint('List of grades: ', grades)", "metadata": {"tags": [], "collapsed": false, "pycharm": {"name": "#%%\n"}}}], "metadata": {"@webio": {"lastCommId": null, "lastKernelId": null}, "celltoolbar": "Tags", "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12"}}, "nbformat": 4, "nbformat_minor": 2} \ No newline at end of file +{"cells": [{"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "# L0: Introduction to Python\nIn this exercise session, we will review the fundamentals of three important Python libraries: Numpy, Pandas and Matplotlib. Throughout the course, you will need to get familiar with several Python libraries that provide convenient functionality for machine learning purposes. It is good to get into the habit of using the available documentation to your advantage. Some efficient ways of doing this are described below:"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "#### help() function\nIn Python, the *help()*-function can be used to display the documentation for a module, function, or object. When called with no arguments it opens an interactive help session. When called with a specific object as an argument, it displays the documentation for that object. For example, you can use *help(print)* to view the documentation for the built-in print function, or *help(str)* to view the documentation for the str class. Additionally, you can use the *dir()*-function to get all methods and properties of the object passed as an argument to it. 
It can be used to check all the attributes of a module or class. For example, *dir(str)* will give the methods and properties of the str class.\n\n\n#### SHIFT + TAB\n\nIn Jupyter notebook, *shift+tab* is a keyboard shortcut that can be used to access the documentation for the function or object that appears immediately before the cursor. When you press *shift+tab*, a small pop-up window will appear that contains information about the function or object, including its arguments and their types. Pressing *shift+tab* multiple times will cycle through different levels of documentation. If nothing is selected it will show the tip of the current cell. When running notebooks on Google Colab, you can trigger the documentation by clicking the function and then hovering over it with the cursor."}, {"cell_type": "markdown", "source": "Before getting started, we make sure that the libraries are properly imported in our current environment. Do this by running the cell below.", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "code", "execution_count": null, "outputs": [], "source": "import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\n", "metadata": {"collapsed": false, "pycharm": {"name": "#%%\n"}, "tags": []}}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## 0.1 Numpy Fundamentals\n### Basic data structures\n\n### (a)\n\nVectors, matrices and tensors can be represented as *numpy arrays*. Numpy arrays are often initialized from regular Python lists. For example, the Python list [1, 2, 3] can be converted into a 1D numpy array using the command np.array([1, 2, 3]). Create this numpy array in the cell below and print its shape. You can find the shape of a numpy array A using *np.shape(A)*.\n\nYou can create 1D arrays with n elements using for example np.zeros(n), np.ones(n), np.arange(a, b), np.random.rand(n), np.linspace(a, b, n)... 
Try this out and figure out what the different functions do.\n\n### (b)\n\nSimilarly, a 2D numpy array can be created from a nested Python list (a list of lists). Convert the nested lists [ [1, 2, 3] ] and [ [1], [2], [3] ] into numpy arrays and print their shapes. Which one represents a column vector, and which one represents a row vector?\n\n### (c)\n\nCreate a 2D numpy array to represent the matrix D and inspect its shape.\n$$\n\\textbf{D} = \\begin{bmatrix}\n 1 & 2 & 3 \\\\\n 4 & 5 & 6 \\\\\n 7 & 8 & 9 \\\\\n 10 & 11 & 12 \\\\\n\\end{bmatrix}\n$$\n\nWe can create higher dimensional *ndarrays* in a similar way. You can create ndarrays of shape (n, m) using for example np.zeros((n, m)), np.ones((n, m)), np.random.rand(n, m). You can also use np.eye(n) to create an identity matrix of shape (n,n), i.e. a diagonal matrix with ones along the diagonal. Try this out and make sure you understand what the different functions do.\n"}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### Slicing and Indexing\n### (d)\n\nWe can access elements and slice numpy arrays easily. The cell below defines a 1D array F and a 2D array G. Try the commands: F[0], F[-1], F[:-2], G[2,3], G[:,2], G[-1,:3] and figure out what they mean. Make sure you understand how to index and slice numpy arrays of different dimensions.\n\n### (e)\nIt is also easy to assign new values to specific elements or entire rows and columns of numpy arrays. For example, F[0] = 5 replaces the first value of F with 5. 
Figure out how to replace the last column of G with the array F."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# (d)\nF = np.arange(4)\nG = np.array([[1, 4, 5, 6, 1], [2, 5, 6, 6, 1], [2, 3, 1, 1, 1], [8, 12, 14, 20, 1]])\n\n# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### Reshaping and Stacking\n### (f)\nIf we have access to a 1D array H, there are multiple ways of converting this array into a 2D array, i.e. adding a dimension to the numpy array. This can be done using the function *np.reshape(H, (n, m))*, where (n, m) is the desired shape of the array. Alternatively, we can write *H.reshape(n, m)*. Convert the array H given below into 2D arrays of different shapes (n,m) and inspect the result. What is the requirement on n and m? Can you use the reshape-function to convert H into a column vector and a row vector?\n\n### (g)\nIf we want to reshape a 1D array into a column- or row vector and we do not know the size of the array, we can use -1 in place of the unknown size, e.g. *np.reshape(array, (-1, 1))* for a column vector. Use this method to convert the array M into a column- and a row vector, and confirm the dimensions using np.shape().\n\nWe can also use *np.newaxis* to expand the dimension of an array M: M2 = M[*np.newaxis*, :]. Use this method to convert the array M into a column and a row vector as well. This can be important for example when a function requires a 2D array and you have access to your data in a 1D array.\n\n### (h)\nWe can also stack numpy arrays to create new arrays, using for example *np.vstack()* and *np.hstack()*. In the example below, F and G are stacked vertically and horizontally. Inspect the results to understand how to stack numpy arrays. Then, create a new array X and add a row and a column of ones using the appropriate functions. Let X be a 3x3 diagonal matrix with 4's along the diagonal. 
Confirm the shape of the resulting array. Remember that you can create a diagonal matrix with np.eye()."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# f)\nH = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])\n\n# enter your code here\n# g)\nM = np.linspace(1, 17, 100)\n\n# enter your code here\n# h)\nF = np.zeros(4)\nG = np.array([[2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5]])\nG_rowstack = np.vstack((G, F)) # adds a row of zeros to G at the bottom (vstack --> vertical addition)\nF = F.reshape(-1,1) # to add a column, we need to reshape F into the appropriate 2D array\nG_colstack = np.hstack((G, F)) # adds a column of zeros to G at the right (hstack --> horizontal addition)\nprint(f'Row extended G:\\n {G_rowstack}')\nprint(f'Column extended G:\\n {G_colstack}')\n\n# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### Aggregation and Linear Algebra\n### (i)\nThere is a sea of useful numpy functions that you may want to become familiar with. For example, you can find the minimum and maximum element of a numpy array z using *np.min(z)* and *np.max(z)*. If z is a matrix, you can find the minimum and maximum across rows and columns (i.e. across each *axis* of the 2D array) using *np.min(z, axis=0)* and *np.max(z, axis=1)*. You can also find the sum of an entire array, or the sum across columns or rows, using *np.sum()*. Find the minimum and maximum element of the matrix Z defined below, as well as the sum across the columns of the matrix.\n$$\n\\textbf{Z} = \\begin{bmatrix}\n 10 & 0 & 0 \\\\\n 1 & 11 & 1 \\\\\n 2 & 2 & 12 \\\\\n\\end{bmatrix}\n$$\n\n### (j)\nArithmetic operations on numpy arrays are straightforward. For example, you may add two arrays A and B of appropriate size simply through *A+B*, or *np.add(A, B)*. Many useful linear algebra operations are also available in numpy. 
For example, you can find the transpose of a matrix Z defined as a numpy array using *np.transpose(Z)* (or simply *Z.T*). You can find the inverse using *np.linalg.inv(Z)*. Matrix multiplication of two matrices A and B can be performed using *np.matmul(A, B)* (or simply A@B, where the @-operator implements np.matmul). Note that $A*B$ returns the elementwise multiplication of A and B.\n\nAnother useful function is the linear system solver. A linear system of the form $Z\\cdot x=b$ can be solved efficiently using np.linalg.solve($Z$, $b$). Solve the following linear system using both the matrix inverse and np.linalg.solve:\n\n$$\n\\begin{bmatrix}\n 10 & 0 & 0 \\\\\n 1 & 11 & 1 \\\\\n 2 & 2 & 12 \\\\\n\\end{bmatrix} x = \\begin{bmatrix}\n 2\\\\\n 1\\\\\n 10\\\\\n\\end{bmatrix}\n$$"}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "# 0.2 Pandas Fundamentals\n\nPandas dataframes can be used to store data tables, and provide functionality to analyze, explore and manipulate the data in these tables. Numpy arrays can be converted into dataframes, but in this course we will mostly load datasets from csv files to pandas dataframes.\n\n## (a)\n\nWe begin by exploring the auto-dataset. Run the cell below to load the dataset and store it in a pandas dataframe called 'Auto'. We also print the number of rows and columns in the dataset. The dataset contains information about a number of vehicles. The following features are observed:\n\n- `mpg`: miles per gallon\n- `cylinders`: Number of cylinders between 4 and 8\n- `displacement`: Engine displacement (cu. inches)\n- `horsepower`: Engine horsepower\n- `weight`: Vehicle weight (lbs.)\n- `acceleration`: Time to accelerate from 0 to 60 mph (sec.)\n- `year`: Model year (modulo 100)\n- `origin`: Origin of car (1. American, 2. 
European, 3. Japanese)\n- `name`: Vehicle name\n\nTo get an overview of the data, we can use the commands Auto.info(). Using Auto.describe() we get summaries of some important statistics for each column in the dataset. With Auto.head() we can take a look at the first five rows of the data. Use these functions and get an overview of the data. What information can we get from the dataset, and how many samples have we collected? Each entry in the dataframe is a sample (measurement point) that we can use to train our machine learning models."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "url = 'https://github.com/uu-sml/course-sml-public/raw/master/data/auto.csv'\nAuto = pd.read_csv(url)\nprint(f'Auto.shape: {Auto.shape}')\n\n# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## (b)\n\nIf we only need a subset of the dataframe, we can create a new dataframe containing this subset. For example, we can create a dataframe X containing only the weight and acceleration features by running the cell below. Explore the new dataframe, and check the shape using X.shape."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "X = Auto[[\"weight\", \"acceleration\"]]\n\n# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## (c)\n\nWe can also slice the dataframe using index. For example, we can pick out the last column of the dataframe by running the cell below. Explore the new dataframe X2 as you did in (b). 
Create a new dataframe containing multiple columns using index."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "X2 = Auto.iloc[:, -1]\n\n# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## (d)\nIn the course, we will often divide a dataset randomly into a train and a test set. This means that we want a random subset of the entries to go in each sliced dataset. In the cell below, we use numpy's random number generator to generate a numpy array containing indices of 80% of the entries in the Auto dataset, chosen randomly. The total number of samples is N, and 80% then corresponds to n samples that should go in our train set. We use np.random.choice() to pick out n out of N random indices, which is returned in a numpy array. Then, we use Auto.index.isin() on this array. This function returns a boolean array with element False if the index in Auto is not found among the random indices, and True if it is there.\n\nInspect both the array of random indices, random_index, and the boolean array, train_samples. Finally, we create a boolean array for the test set, which is True in each element where the train boolean array is False, and vice versa. 
Make sure you understand what is happening in every line of the code."}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": [], "pycharm": {"name": "#%%\n"}}, "outputs": [], "source": "N = Auto.shape[0] # total number of samples in the dataset\nn = round(0.8*N) # total number of samples in the train dataset\nrandom_index = np.random.choice(N, size = n, replace = False) # replace=False is needed so that the same index does not appear twice in the final list\ntrain_samples = Auto.index.isin(random_index) # boolean array containing True if the sample has been chosen or False otherwise\ntest_samples = ~train_samples # complementary boolean array"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## (e)\nCreate new dataframes containing only the samples at the randomly generated indices. This can be done by passing the boolean array corresponding to each sliced dataset to Auto.iloc. Inspect the train and test sets. Are the shapes as you expect?"}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "# 0.3 Matplotlib Fundamentals\n\n## (a)\n\nMatplotlib is an extensive Python library for data visualization. It is typically used together with numpy, and in machine learning it is often used to create visualizations of data, such as line plots, scatter plots, bar plots, histograms, and 3D plots. These visualizations can be useful for understanding the behavior of the data and the performance of machine learning models. It is also used to visualize the performance of a model during training, as well as its predictions on the test data set.\n\nThere are tools for a variety of different kinds of plots. 
Check the documentation (**https://matplotlib.org/**) for information on design choices and different plotting options.\n\nIn the cell below you can find a simple example on how to plot numpy arrays."}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": [], "pycharm": {"name": "#%%\n"}}, "outputs": [], "source": "dinosaur_fossils = np.array([5, 15, 34, 9, 122, 420, 850])\nyear = np.array([1820, 1860, 1900, 1940, 1980, 2000, 2020])\nplt.figure(1)\nplt.plot(year, dinosaur_fossils, 'g-*', label='Fossil Count by Year')\nplt.legend()\nplt.title('Dinosaur Fossils Found over Time')\nplt.xlabel('Year')\nplt.ylabel('Fossil Count')\n#plt.savefig('dinosaur_fossil.png') # you can use this command to save a figure to the main project folder\nplt.show()"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "## (b)\n\nIn the cell below, we have returned to the Auto dataframe. *Auto.groupby('year')* returns a Pandas DataFrameGroupBy-object containing information summarized by the model year. Auto.groupby('year').mean(numeric_only=True) takes the mean of each numeric feature column, grouped by the model year; non-numeric columns such as `name` are skipped. Inspect the resulting dataframe and plot the mean acceleration as a function of time. You can convert a dataframe A to a numpy array using *A.to_numpy()*."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "year_data = Auto.groupby('year').mean(numeric_only=True) # numeric_only=True skips non-numeric columns such as 'name'\n\n# enter your code here\n"}, {"cell_type": "markdown", "source": "## (c) Subplot examples\n\nIn matplotlib, a figure can contain multiple subplots, which are organized in a grid-like pattern. You can create a new figure and add subplots to it using the *plt.figure()* and *plt.subplot()* functions. 
The *figure()* function creates a new figure, and the *subplot()* function is used to add subplots to the figure.\n\nIn the following cell, there is an example that creates a figure with 2 rows and 2 columns of subplots, and then plots a sine wave in each subplot. Inspect the code and, if you want further practice, play around with the plt.subplots-command, for example by plotting data from the Auto-dataframe.", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": [], "pycharm": {"name": "#%%\n"}}, "outputs": [], "source": "# Create a new figure with 2x2 subplots\nfig, axs = plt.subplots(2, 2)\n\n# Generate data\nx = np.linspace(0, 2 * np.pi, 100)\ny = np.sin(x)\n\n# Plot a sine wave in each subplot\naxs[0, 0].plot(x, y)\naxs[0, 1].plot(x, y)\naxs[1, 0].plot(x, y)\naxs[1, 1].plot(x, y)\n\n# Add titles\naxs[0, 0].set_title('Sine wave 1')\naxs[0, 1].set_title('Sine wave 2')\naxs[1, 0].set_title('Sine wave 3')\naxs[1, 1].set_title('Sine wave 4')\n\n# Adjust the spacing and show the figure\nplt.subplots_adjust(wspace= 0.5, hspace= 0.5) # function that allows us to adjust the spacing between subplots in a figure\nplt.show()"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "You can also use *plt.subplots(*nrows*, *ncols*, *sharex* = True, *sharey* = True)*, which creates a figure with *nrows* x *ncols* subplots in it, with shared x and y axes. In the following example, we create a figure with 2 rows and 1 column of subplots. The *sharex* and *sharey* arguments are set to True when creating the subplots with *plt.subplots()*. 
This means that the x-axis of the first subplot (ax1) will be shared with the x-axis of the second subplot (ax2), and the y-axis of the first subplot (ax1) will be shared with the y-axis of the second subplot (ax2).\n\nNote that this way of creating the subplots is useful when you want to compare two plots that share the same axis scales, as it ensures that the x and y axis will be consistent across the subplots, regardless of the data that is being plotted."}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": [], "pycharm": {"name": "#%%\n"}}, "outputs": [], "source": "# Create a new figure with 2x1 subplots\nfig, (ax1, ax2) = plt.subplots(2, 1, sharex = True, sharey =True)\n\n# Generate data\nx = np.linspace(0, 2 * np.pi, 100)\ny1 = np.sin(x)\ny2 = np.cos(x)\n\n# Plot the data\nax1.plot(x, y1, label='sin(x)')\nax2.plot(x, y2, label='cos(x)')\n\n# Add labels and titles\nax1.set_title('Sine wave')\nax2.set_title('Cosine wave')\n\n# Add a legend\nax1.legend()\nax2.legend()\n\n# Show the figure\nplt.show()"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### (d)\nCreate a bar chart that displays the population of four different cities: New York, Los Angeles, Chicago, and Houston. Use the function *plt.bar()*. Check the documentation if you need information on how this function is used. You have access to the following data:\n\n**New York:** 8.4 million\n\n**Los Angeles:** 4.0 million\n\n**Chicago:** 2.7 million\n\n**Houston:** 2.3 million\n\nMake sure to add appropriate titles and labels to your figure."}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### (e)\nCreate a plot that shows the number of registered cars in Denmark, Norway, and Sweden from 2000 to 2020. 
Be sure to include axis labels and a legend. Use the following simulated data:\n\n**denmark_cars** = [390000, 390000, 410000, 425000, 430000, 450000, 450000, 450000,\n 450000, 450000, 460000, 470000, 480000, 490000, 490000, 500000,\n 510000, 520000, 530000, 550000, 560000]\n \n**norway_cars** = [200000, 200000, 200000, 210000, 220000, 230000, 240000, 250000,\n 260000, 270000, 270000, 290000, 300000, 370000, 320000, 330000,\n 340000, 350000, 360000, 370000, 380000]\n \n**sweden_cars** = [300000, 310000, 310000, 300000, 300000, 350000, 360000, 370000,\n 380000, 390000, 400000, 410000, 420000, 410000, 440000, 450000,\n 460000, 470000, 440000, 490000, 500000]"}, {"cell_type": "code", "execution_count": null, "metadata": {"pycharm": {"name": "#%%\n"}, "tags": []}, "outputs": [], "source": "# enter your code here\n"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "### (f) Advanced example\n\nThis problem is intended to show that much more advanced plots can be created with matplotlib, compared to the basic applications we have seen so far.\n\n**Example:** Create a 3D surface plot of the function $z = -(sin(x)\\cdot cos(y)\\cdot exp(|(1 - \\sqrt{(x^2 + y^2)}/\\pi)|))$ over the range $-5 \\leq x \\leq 5$ and $-5 \\leq y \\leq 5$. 
Use a color map to indicate the value of $z$ and include proper axis labels and a colorbar."}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": [], "scrolled": true, "pycharm": {"name": "#%%\n"}}, "outputs": [], "source": "from mpl_toolkits.mplot3d import Axes3D # library to create a 3D plot\n\n# Data\nx = np.linspace(-5, 5, 100)\ny = np.linspace(-5, 5, 100)\nx, y = np.meshgrid(x, y)\nz = -(np.sin(x) * np.cos(y) * np.exp(np.abs(1 - np.sqrt(x**2 + y**2)/np.pi)))\n\n# Plotting\nfig = plt.figure()\nax = fig.add_subplot(111, projection = '3d')\nsurf = ax.plot_surface(x, y, z, cmap = 'coolwarm')\nax.set_xlabel('$x_1$')\nax.set_ylabel('$x_2$')\ncbar = fig.colorbar(surf, shrink = 0.5, aspect = 5)\nplt.show()"}, {"cell_type": "markdown", "metadata": {"pycharm": {"name": "#%% md\n"}, "tags": []}, "source": "In this graph, $x_1$ and $x_2$ could be the parameters of our machine learning model and $z$ (the vertical axis) could be the cost function we want to optimize. Later in the course, we will see exactly what the cost function is and we will also study the gradient descent method which will allow us to find a local minimum of a differentiable function."}, {"cell_type": "markdown", "source": "# 0.4 Conditional statements and for-loops\n\nThis section contains some examples on how to use if-statements, for-loops and function definitions in Python.", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "markdown", "source": "### (a)\n\nFor example, the maximum score of the SML exam is 50, and the limits for the grades 3, 4, and 5 are clearly specified. The following exam scores are listed for four students: 47, 33, 24 and 22. 
The following code assigns the correct grade to each student:", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "code", "execution_count": null, "outputs": [], "source": "def assign_grade(exam_score):\n    grade = 0\n    if exam_score >= 43:\n        grade = 5\n    elif exam_score >= 33:\n        grade = 4\n    elif exam_score >= 23:\n        grade = 3\n    else:\n        grade = 'U' # Failed!\n    return grade\n\n# We have 4 scores: 47, 33, 24, 22\nscore1 = 47\ngrade1 = assign_grade(score1)\nprint(f'Student 1: score {score1}, grade {grade1}')\n\nscore2 = 33\ngrade2 = assign_grade(score2)\nprint(f'Student 2: score {score2}, grade {grade2}')\n\nscore3 = 24\ngrade3 = assign_grade(score3)\nprint(f'Student 3: score {score3}, grade {grade3}')\n\nscore4 = 22\ngrade4 = assign_grade(score4)\nprint(f'Student 4: score {score4}, grade {grade4}')", "metadata": {"tags": [], "collapsed": false, "pycharm": {"name": "#%%\n"}}}, {"cell_type": "markdown", "source": "## (b)\n\nThe grades can also be assigned using a for-loop:", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "code", "execution_count": null, "outputs": [], "source": "# Alternative 1: Loop over list\nprint('Loop over list:')\nscores = [47, 33, 24, 22]\ncnt = 0\nfor score in scores:\n    grade = assign_grade(score)\n    print(f'Student {cnt+1}: score {score}, grade {grade}')\n    cnt+=1\n\n# Alternative 2: Loop using range\nprint('Loop using range:')\nnum_students = len(scores)\nfor i in range(num_students):\n    grade = assign_grade(scores[i])\n    print(f'Student {i+1}: score {scores[i]}, grade {grade}')", "metadata": {"tags": [], "collapsed": false, "pycharm": {"name": "#%%\n"}}}, {"cell_type": "markdown", "source": "## (c)\nIf we want to save the converted grades to a list, there is a short syntax (called **list comprehension**) to do that with one line:", "metadata": {"collapsed": false, "pycharm": {"name": "#%% md\n"}, "tags": []}}, {"cell_type": "code", "execution_count": null, 
"outputs": [], "source": "grades = [assign_grade(score) for score in scores]\n\nprint('List of scores: ', scores)\nprint('List of grades: ', grades)", "metadata": {"tags": [], "collapsed": false, "pycharm": {"name": "#%%\n"}}}], "metadata": {"@webio": {"lastCommId": null, "lastKernelId": null}, "celltoolbar": "Tags", "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12"}}, "nbformat": 4, "nbformat_minor": 2} \ No newline at end of file diff --git a/exercises/solutions/SML-session_0.html b/exercises/solutions/SML-session_0.html index 9e22551..a01c861 100644 --- a/exercises/solutions/SML-session_0.html +++ b/exercises/solutions/SML-session_0.html @@ -13107,7 +13107,7 @@

SHIFT + TAB
-
In [2]:
+
In [3]:
import numpy as np
@@ -14081,7 +14081,7 @@ 

(e)

Create a new da

0.3 Matplotlib Fundamentals

(a)

Matplotlib is an extensive Python library for data visualization. It is typically used together with numpy, and in machine learning it is often used to create visualizations of data, such as line plots, scatter plots, bar plots, histograms, and 3D plots. These visualizations can be useful for understanding the behavior of the data and the performance of machine learning models. It is also used to visualize the performance of a model during training, as well as its predictions on the test data set.

There are tools for a variety of different kinds of plots. Check the documentation (https://matplotlib.org/) for information on design choices and different plotting options.

-

In the cell below you can find a simple example on how to plot 2D numpy arrays.

+

In the cell below you can find a simple example on how to plot numpy arrays.

@@ -14417,12 +14417,12 @@

(e)

Create a plot t

-
In [22]:
+
In [12]:
# enter your code here
 # Data
-years = list(range(2000, 2021))
+years = np.arange(2000, 2021)
 denmark_cars = [390000, 390000, 410000, 425000, 430000, 450000, 450000, 450000,
                 450000, 450000, 460000, 470000, 480000, 490000, 490000, 500000,
                 510000, 520000, 530000, 550000, 560000]
@@ -14437,6 +14437,7 @@ 

(e)

Create a plot t
 plt.plot(years, denmark_cars, label='Denmark')
 plt.plot(years, norway_cars, label='Norway')
 plt.plot(years, sweden_cars, label='Sweden')
+plt.xticks(range(years[0],years[-1]+1,5))
 plt.xlabel('Year')
 plt.ylabel('Number of registered cars')
 plt.legend()
@@ -14459,7 +14460,7 @@

(e)

Create a plot t

-
@@ -14530,7 +14531,7 @@

(f) Advanced example
-

In this graph, $x_1$ and $x_2$ could be the inputs to our machine learning model and $z$ (the vertical axis) could be the cost function we want to optimize. Later in the course, we will see exactly what the cost function is and we will also study the gradient descent method which will allow us to find a local minimum of a differentiable function.

+

In this graph, $x_1$ and $x_2$ could be the parameters of our machine learning model and $z$ (the vertical axis) could be the cost function we want to optimize. Later in the course, we will see exactly what the cost function is and we will also study the gradient descent method which will allow us to find a local minimum of a differentiable function.