Information

Machine Learning

  • Definition of Machine Learning:
    Finding a model and its parameters so that the resulting predictor performs well on unseen data

  • Probabilistic interpretation

    • Estimation:

Linear Regression

  • Cost function:

    $$J(\theta) = \frac{1}{2}\sum_{i=1}^{n}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \tag{1}$$

    where $\theta$ are the parameters, $x^{(i)}$ are the training examples, $y^{(i)}$ are the targets, and $h_\theta(x) = \theta^T x$ is the hypothesis. A quick numeric check follows below.
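    As a sketch, here is (1) evaluated on a tiny made-up dataset (the values of X, y, and theta are hypothetical, chosen only for illustration):

    ```python
    import numpy as np

    # Toy data: the first column is the intercept feature x_0 = 1.
    X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    y = np.array([1.0, 2.0, 3.0])
    theta = np.array([0.5, 0.5])

    predictions = X @ theta                   # h_theta(x^(i)) for every example
    J = 0.5 * np.sum((predictions - y) ** 2)  # cost function (1)
    print(J)                                  # 0.625
    ```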

  • Section 1: LMS algorithm

    • Gradient Descent: repeatedly update each parameter in the direction that decreases $J$:

      $$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

      For the cost function (1) this becomes

      $$\theta_j := \theta_j + \alpha \sum_{i=1}^{n}\left(y^{(i)} - h_\theta(x^{(i)})\right)x_j^{(i)} \quad \text{(for every } j\text{)}$$

      This is called batch gradient descent because every update uses the entire training set.
      On the other hand, if you update as follows, you are using stochastic gradient descent. In either case, all parameters must be updated simultaneously, i.e., you cannot update the first element of $\theta$ before updating the second. A code sketch of both variants follows after the pseudocode.

        Loop {
            for i = 1 to n {
                θ_j := θ_j + α (y^(i) − h_θ(x^(i))) x_j^(i)    (for every j)
            }
        }
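    Below is a minimal NumPy sketch of both update rules; it assumes X holds one example per row with the intercept feature x_0 = 1 prepended, and the function names are mine, not from the notes:

    ```python
    import numpy as np

    def batch_gradient_descent(X, y, alpha=0.01, iters=1000):
        # Batch: each step uses the gradient summed over the whole training set.
        theta = np.zeros(X.shape[1])
        for _ in range(iters):
            theta = theta + alpha * X.T @ (y - X @ theta)  # every theta_j updated at once
        return theta

    def stochastic_gradient_descent(X, y, alpha=0.01, epochs=100):
        # Stochastic: theta is updated after each single training example i.
        theta = np.zeros(X.shape[1])
        for _ in range(epochs):
            for i in range(X.shape[0]):
                theta = theta + alpha * (y[i] - X[i] @ theta) * X[i]  # all theta_j together
        return theta
    ```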
  • Section 2: The normal equations
    Using matrices, you can also rewrite (1) as

    $$J(\theta) = \frac{1}{2}(X\theta - \vec{y})^T(X\theta - \vec{y})$$

    Then you can take the gradient with respect to $\theta$ and set it to zero,

    $$\nabla_\theta J(\theta) = X^T X \theta - X^T \vec{y} = 0,$$

    which gives the normal equations $X^T X \theta = X^T \vec{y}$; their solution $\theta = (X^T X)^{-1} X^T \vec{y}$ minimizes the cost function. A code sketch follows below.
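    As a sketch, the closed-form solution in NumPy (solving the linear system with np.linalg.solve rather than forming the inverse explicitly, which is more numerically stable):

    ```python
    import numpy as np

    def normal_equation(X, y):
        # Solves X^T X theta = X^T y, i.e. theta = (X^T X)^{-1} X^T y.
        return np.linalg.solve(X.T @ X, X.T @ y)
    ```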

  • Section 3: Probabilistic Interpretation
    When approaching a regression problem, why use the least-squares cost function $J$ specifically?

    Let's redefine the relation between the inputs and target variables as follows: $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$, where $\epsilon^{(i)}$ is an error term that captures, e.g., random noise. We assume the errors follow a Normal (Gaussian) distribution, $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$.
    Then we can write

    $$p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right) \tag{2}$$

    Interpreting (2) as a function of $\theta$, we can instead call it the likelihood function:

    $$L(\theta) = \prod_{i=1}^{n} p\left(y^{(i)} \mid x^{(i)}; \theta\right)$$

    According to maximum likelihood, we should choose the $\theta$ that makes the data as probable as possible. For convenience of calculation, we use the log likelihood:

    $$\ell(\theta) = \log L(\theta) = n \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2}\sum_{i=1}^{n}\left(y^{(i)} - \theta^T x^{(i)}\right)^2$$

    To make $\ell(\theta)$ maximal, we need to minimize

    $$\frac{1}{2}\sum_{i=1}^{n}\left(y^{(i)} - \theta^T x^{(i)}\right)^2,$$

    which is exactly the least-squares cost function $J(\theta)$ from (1). A numeric sketch of this equivalence follows below.
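    Here is a small sketch with synthetic data: the least-squares $\theta$ attains a higher Gaussian log likelihood than a perturbed $\theta$ (the data-generating values and sigma below are made up for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(100), rng.normal(size=100)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=100)  # y = theta^T x + epsilon

    def log_likelihood(theta, sigma=0.5):
        # Gaussian log likelihood: n*log(1/(sqrt(2*pi)*sigma)) - SSE/(2*sigma^2)
        residuals = y - X @ theta
        return -len(y) * np.log(np.sqrt(2 * np.pi) * sigma) - residuals @ residuals / (2 * sigma**2)

    theta_ls = np.linalg.solve(X.T @ X, X.T @ y)          # least-squares solution
    print(log_likelihood(theta_ls))                       # maximal over all theta
    print(log_likelihood(theta_ls + np.array([0.1, 0])))  # strictly smaller
    ```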