
Deep Learning

Goal

  • Practical, not theoretical
  • Use R
  • Introduce assignment in first week
  • Should provide good basics
  • Should be focused on time-series predictions
  • Should be in Dutch
  • Should be fairly easy

Structure

Below is a proposal for the course structure and notebook content plan.

Each lecture

  • Overview & recap of last week, questions
  • ...
  • Summary -> Hand-over to notebooks
  • Notebooks practice what was covered
  • Homework

1. Introduction

  • Course organisation
  • Introduce the field of DL.
  • Refresher on R and tensors.
  • Neural Networks introduction.
  • The XOR problem.

Slides 1

  • What we will do

  • DL

    • What is it?
    • Why is it a thing now?
    • Cool examples
  • ML overview

    • Supervised, semi-supervised, unsupervised & reinforcement learning
    • Classification, multi-label classification, regression
  • Software we will be using

    • Keras + R
    • Eager vs. lazy evaluation
    • Tensorboard

Notebook 1

Soft refresher on R and tensors.
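
As a taste of what this notebook could cover, a minimal base-R sketch of tensors as multi-dimensional arrays (the values are arbitrary):

```r
# A tensor in R is just a multi-dimensional array.
v <- 1:24                        # rank-1 tensor (vector) of length 24
m <- matrix(v, nrow = 4)         # rank-2 tensor (4 x 6 matrix)
a <- array(v, dim = c(2, 3, 4))  # rank-3 tensor of shape 2 x 3 x 4

dim(a)       # 2 3 4
a[1, , ]     # slicing: a 3 x 4 matrix
a * 2 + 1    # element-wise arithmetic on the whole tensor
```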

Slides 2

  • Easy intro to NN
    • Nodes, layers and edges
    • In/hidden/out
    • Activations (sigmoid only)
    • Parameters

Notebook 2

Implement and experience XOR problem.
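
To illustrate why XOR needs a hidden layer, a minimal base-R sketch with hand-picked weights (all values are illustrative):

```r
# XOR truth table: not linearly separable.
x <- matrix(c(0, 0,
              0, 1,
              1, 0,
              1, 1), ncol = 2, byrow = TRUE)
y <- c(0, 1, 1, 0)

sigmoid <- function(z) 1 / (1 + exp(-z))

# A single sigmoid neuron can only draw one linear decision boundary,
# so no choice of w and b reproduces y.
w <- c(1, 1); b <- -0.5
round(sigmoid(x %*% w + b), 2)   # compare with y
```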

Slides 3

  • Recap/Summary
  • Homework?

2. ML & DL fundamentals

  • Define the learning optimisation task.
  • Solving the XOR problem using Keras.
  • Model evaluations and capacity.
  • Tuning a model.

Slides 1

  • Recap of basic NN components
  • Loss functions
  • Gradient descent (see the sketch after this list)
  • Backprop
  • Learning rate (hyperparameter)
  • Stochastic gradient descent
  • Pre-processing/formatting data
    • Cleaning (outliers, wrong values)
    • Real-valued features
    • Scaling
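
To make gradient descent and the learning rate concrete, a minimal base-R sketch on a one-parameter least-squares problem (the data and settings are made up for illustration):

```r
# Fit y = w * x by gradient descent on the mean squared error.
set.seed(1)
x <- runif(100)
y <- 3 * x + rnorm(100, sd = 0.1)

loss <- function(w) mean((y - w * x)^2)
grad <- function(w) mean(-2 * x * (y - w * x))

w  <- 0      # initial guess
lr <- 0.5    # learning rate (the hyperparameter from the slides)
for (step in 1:50) {
  w <- w - lr * grad(w)          # one gradient descent update
}
c(w = w, loss = loss(w))         # w should end up close to 3
```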

Notebook 1

  • Solving the XOR problem (see the Keras sketch below).
  • Training & Test split.
  • Epochs
  • Hint of overfitting
  • Showing underfitting and overfitting with a fancier XOR problem (point clouds, outliers)
    • Show that with a sufficient number of neurons/layers we can model everything
    • Bonus/at home: MNIST? May be difficult because they haven't had multiclass classification yet
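
A minimal Keras-in-R sketch of solving XOR, to accompany this notebook; the layer sizes, optimizer and number of epochs are illustrative choices:

```r
library(keras)

x <- matrix(c(0, 0,  0, 1,  1, 0,  1, 1), ncol = 2, byrow = TRUE)
y <- c(0, 1, 1, 0)

model <- keras_model_sequential() %>%
  layer_dense(units = 4, activation = "sigmoid", input_shape = 2) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(loss = "binary_crossentropy",
                  optimizer = "adam",
                  metrics = "accuracy")

model %>% fit(x, y, epochs = 500, verbose = 0)
model %>% predict(x)   # should approach 0, 1, 1, 0
```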

Slides 2

  • Performance evaluation, metrics
  • Train/test/validation split
  • Model complexity
  • Over- and underfitting
  • Hyperparameter tuning
  • Optimizers (briefly): momentum, per-weight learning rates (see the snippet below)
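
A small sketch of the optimizer options mentioned above, as exposed by the keras R package (the learning rates are illustrative; in older package versions the argument is called `lr`):

```r
library(keras)

# Plain SGD with momentum.
opt_sgd  <- optimizer_sgd(learning_rate = 0.01, momentum = 0.9)

# Optimizers with per-weight (adaptive) learning rates.
opt_rms  <- optimizer_rmsprop(learning_rate = 0.001)
opt_adam <- optimizer_adam(learning_rate = 0.001)

# Swapping optimizers only changes the compile step, e.g.:
# model %>% compile(loss = "mse", optimizer = opt_adam, metrics = "mae")
```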

Notebook 2

  • Fancy XOR problem
  • Show under- and overfitting using models of different complexity.
  • Hyperparameter tuning (LR); see the sketch below.
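
One possible shape for the learning-rate tuning exercise, assuming `x_train` and `y_train` hold the fancy-XOR data from the notebook; the helper `build_model` and all settings are illustrative:

```r
library(keras)

build_model <- function(lr) {
  model <- keras_model_sequential() %>%
    layer_dense(units = 8, activation = "relu", input_shape = 2) %>%
    layer_dense(units = 1, activation = "sigmoid")
  model %>% compile(loss = "binary_crossentropy",
                    optimizer = optimizer_sgd(learning_rate = lr),
                    metrics = "accuracy")
  model
}

# x_train, y_train: assumed to hold the fancy-XOR data from the notebook.
for (lr in c(1, 0.1, 0.01)) {
  history <- build_model(lr) %>%
    fit(x_train, y_train, epochs = 100, validation_split = 0.2, verbose = 0)
  cat("lr =", lr, "- final validation loss:",
      tail(history$metrics$val_loss, 1), "\n")
}
```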

Slides 3

Recap/Homework.

Notes

  • Training process: exploration & preprocessing, then training followed by evaluation
  • Most popular optimizers: how do they work?
  • Properly-tuned SGD outperforms all other optimizers (papers!)
  • Effect of network complexity on convergence
  • Effect of learning rate and momentum on convergence
  • How to choose the 'best' model by the validation/training loss curves
  • Mention cross-validation
  • Which metric to choose? - RMSE vs MSE, MAE, etc. Why one for loss, why the other as a metric?
  • How to interpret the network from the notebook?
  • What does convergence mean exactly? For higher learning rates we see a steep decline in loss (e.g. from 2500 down to 61 at its lowest), which is convergence, but not to a great result.
  • Effect of batch size and compute power on compute performance and quality of network results.
  • Emphasise complex interaction between optimizer (hyperparameters), batch size and network complexity.
  • Rules of thumb when choosing an optimizer and its hyperparameters
  • Overfitting and the capacity of a network: NNs of sufficient complexity will memorize your data set (batch shuffling)

3. Classification

First part:

  • Slides: controlling overfitting: dropout & regularization, maybe other stuff as well?
  • Notebook:

Second part:

  • Slides: other optimizers, vanishing gradient (sigmoids), the role of dense layers
  • Notebooks: Boston housing data set (regression), compared with an ordinary linear model, trying different optimizers, deep networks with sigmoids

Slides 1

  • One-hot encoding
  • Logits
  • Softmax as generalization of logistic regression
  • Cross entropy
  • Pre-processing/formatting data
    • Categorical (one-hot encoding)
    • Missing values?
    • Binning
  • Class weights
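
A small sketch of one-hot encoding and class weights with the keras R package (the labels and weights are illustrative):

```r
library(keras)

labels <- c(0, 2, 1, 2, 0)                     # integer class labels
y <- to_categorical(labels, num_classes = 3)   # one-hot matrix, 5 x 3

# Class weights counteract imbalance; keys are the class indices.
# model %>% fit(x, y, class_weight = list("0" = 1, "1" = 1, "2" = 5))
```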

Notebook 1

MNIST - multiclass classification
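
A minimal sketch of the MNIST multiclass setup in keras for R; the architecture and training settings are illustrative:

```r
library(keras)

mnist   <- dataset_mnist()
x_train <- array_reshape(mnist$train$x, c(60000, 28 * 28)) / 255
y_train <- to_categorical(mnist$train$y, 10)

model <- keras_model_sequential() %>%
  layer_dense(units = 128, activation = "relu", input_shape = 784) %>%
  layer_dense(units = 10, activation = "softmax")

model %>% compile(loss = "categorical_crossentropy",
                  optimizer = "rmsprop",
                  metrics = "accuracy")

model %>% fit(x_train, y_train,
              epochs = 5, batch_size = 128, validation_split = 0.2)
```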

Slides 2

  • Explain vanishing gradients
  • Introduce ReLU
  • Introduce dropout

Notebook 2

Implement ReLU and dropout, more if they're up for it. Show the effects of dropout and regularization on convergence.
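
One possible model for this notebook, with ReLU activations and dropout layers (the sizes and rates are illustrative):

```r
library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 128, activation = "relu", input_shape = 784) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = "softmax")

model %>% compile(loss = "categorical_crossentropy",
                  optimizer = "rmsprop",
                  metrics = "accuracy")

# Compare the training vs. validation curves with and without the dropout
# layers to see the effect on overfitting and convergence.
```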

4. RNNs, time-series predictions

  • Input can be a sequence, output can be a sequence, notation
  • RNNs share weights over time: the same parameters are used for every element in the sequence.
  • Backprop through time.
  • Different architectures
    • one-to-one
    • many-to-one
    • one-to-many
    • many-to-many (two types)
  • Backpropagating through this many steps is hard: many updates to the same weights for different inputs make the gradients explode or become very small.
    • Gradient clipping to stop exploding gradients (see the sketch below)
    • Vanishing gradients
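
A minimal many-to-one RNN sketch in keras for R, with gradient clipping as mentioned above (the shape of 20 time steps with 1 feature, and all other settings, are illustrative):

```r
library(keras)

# Many-to-one: a sequence of 20 time steps, 1 feature each, one output value.
model <- keras_model_sequential() %>%
  layer_simple_rnn(units = 16, input_shape = c(20, 1)) %>%
  layer_dense(units = 1)

# clipnorm caps the gradient norm, which counters exploding gradients.
model %>% compile(loss = "mse",
                  optimizer = optimizer_sgd(learning_rate = 0.01, clipnorm = 1),
                  metrics = "mae")
```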

5. GRUs, LSTMs & bidirectional RNNs

  • GRUs addressing vanishing gradient
    • Simpler and more recent (2014) than the LSTM
    • Scales better
    • Memory cell
    • Update gate
    • Relevance (reset) gate
  • LSTMs
    • Older (1997) and more complex than the GRU
    • Often the go-to model
    • Memory cell
    • Hidden state
    • Update gate
    • Forget gate
    • Output gate
  • Bidirectional RNNs (BRNNs)
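
A sketch of how the three layer types look in keras for R; the units and input shapes are illustrative, and the bidirectional model uses the functional API:

```r
library(keras)

# GRU and LSTM are drop-in replacements for each other here.
gru_model <- keras_model_sequential() %>%
  layer_gru(units = 32, input_shape = c(20, 1)) %>%
  layer_dense(units = 1)

lstm_model <- keras_model_sequential() %>%
  layer_lstm(units = 32, input_shape = c(20, 1)) %>%
  layer_dense(units = 1)

# A bidirectional RNN processes the sequence forwards and backwards.
inputs  <- layer_input(shape = c(20, 1))
outputs <- inputs %>%
  bidirectional(layer_lstm(units = 32)) %>%
  layer_dense(units = 1)
brnn_model <- keras_model(inputs, outputs)
```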

Time-series prediction notebook

  • From data to model
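
A base-R sketch of going "from data to model": turning a univariate series into the (samples, time steps, features) windows that recurrent layers expect. The helper `make_windows`, the sine series and the window length are all illustrative:

```r
make_windows <- function(series, window = 20) {
  n <- length(series) - window
  x <- array(NA_real_, dim = c(n, window, 1))
  y <- numeric(n)
  for (i in seq_len(n)) {
    x[i, , 1] <- series[i:(i + window - 1)]   # the past `window` values
    y[i]      <- series[i + window]           # the next value to predict
  }
  list(x = x, y = y)
}

series <- sin(seq(0, 20, by = 0.1))   # toy series
data   <- make_windows(series, window = 20)
dim(data$x)   # samples x 20 x 1, ready for layer_gru()/layer_lstm()
```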

6. Improving NNs

  • Data augmentation
  • Vanishing gradient, ReLU helps
  • Exploding gradient, batch normalization helps
  • Dead ReLU, lower learning rate
  • Apply dropout
  • Weight initialization (Gaussian)
  • Regularization (structural risk minimization)
    • L2, with a lambda hyperparameter (see the sketch below)
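
A sketch combining several of the techniques above in one keras-for-R layer stack (all sizes, rates and the lambda value are illustrative):

```r
library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = 20,
              kernel_initializer = initializer_random_normal(stddev = 0.05),
              kernel_regularizer = regularizer_l2(l = 0.01)) %>%  # L2, lambda = 0.01
  layer_batch_normalization() %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 1)
```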

Improve time-series notebook

  • Try using sigmoid
  • Apply dropout
  • Batch normalization
  • Transfer learning

7. CNNs?

Convs

  • Convs
    • Sharing weights
    • Padding
  • Pooling
  • Visualizing parameters
  • Visualizing areas of interest
  • Architectures
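
A minimal convolutional stack in keras for R illustrating the bullets above (filter counts and sizes are illustrative):

```r
library(keras)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                padding = "same", input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 10, activation = "softmax")

summary(model)   # weight sharing keeps the parameter count small
```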

CNN notebook

8. Review & Summary

  • Reinforcement learning?

Other material

Refreshers/Cheatsheets

https://seeing-theory.brown.edu/#firstPage
https://stanford.edu/~shervine/teaching/cs-229.html

VU (BA) machine learning course

Held by Peter Bloem.

UvA (MA) Deep Learning course

Very mathematical

Google ML crash course

Focuses on ML and is very practical

Coursera DL spec.

A great deep dive into DL.
https://www.coursera.org/specializations/deep-learning
https://karpathy.github.io/2015/05/21/rnn-effectiveness/

Udacity DL by Google

A brief introduction to DL.
https://eu.udacity.com/course/deep-learning--ud730

fast.ai

https://www.fast.ai/