- Bias Variance Decomposition
- Assumptions and intuition
- when are the assumptions less important? (when the goal is prediction rather than inference/analysis)
- geometrically, OLS is a projection of **y** onto the column space of **X**
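A compact statement of the two points above, assuming the standard setup $$y = f(x) + \epsilon$$ with $$E[\epsilon] = 0$$ and $$\mathrm{Var}(\epsilon) = \sigma^2$$ (notation assumed, not from the original notes):

$$ E\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(E[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathrm{Var}\big(\hat{f}(x)\big)}_{\text{variance}} + \sigma^2 $$

and the fitted values are the orthogonal projection of $$\mathbf{y}$$ onto the column space of $$\mathbf{X}$$:

$$ \hat{\mathbf{y}} = \mathbf{X}(\mathbf{X^T X})^{-1} \mathbf{X^T y} = \mathbf{H y}, \qquad \mathbf{H^2} = \mathbf{H} = \mathbf{H^T} $$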
- diagnostics and treatments (sketch after this list)
- fitted and residual plot
- heteroskedasticity
- collinearity
- regularization via Lasso
- PCA (matrix transformation)
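A minimal diagnostics sketch; the toy data, coefficients, and helper name `vif` are assumptions for illustration. It produces a fitted-vs-residual plot (a funnel shape suggests heteroskedasticity) and variance inflation factors to flag collinearity.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=200)

# OLS via the normal equations (intercept added as a column of ones)
Xd = np.column_stack([np.ones(len(X)), X])
beta = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)
fitted = Xd @ beta
resid = y - fitted

# Fitted vs. residual plot: a funnel shape suggests heteroskedasticity
plt.scatter(fitted, resid, s=10)
plt.axhline(0, color="red")
plt.xlabel("fitted"); plt.ylabel("residual")
plt.show()

# Variance inflation factor: VIF_j = 1 / (1 - R^2_j); large values flag collinearity
def vif(X, j):
    others = np.delete(X, j, axis=1)
    others = np.column_stack([np.ones(len(others)), others])
    b = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
    r2 = 1 - np.sum((X[:, j] - others @ b) ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 2) for j in range(X.shape[1])])
```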
- MLE and OLS
- compare the differences
- proof: MLE estimate <=> minimum-RSS estimate (derivation after this list)
- use the p.d.f. of $$\epsilon$$, conditioned on $$\beta$$
- Derive
- regressing y on x vs. x on y (note which variable's variance the errors are averaged over; the two fitted slopes differ)
- single variable: maximizing the likelihood = minimizing RSS = $$\sum_i (y_i - (\beta_1 x_i + \beta_0))^2$$
- multi-variable: $$\mathbf{w^*} = (\mathbf{X^T X})^{-1} \mathbf{X^T y}$$
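A sketch of the proof item above, assuming i.i.d. Gaussian noise $$\epsilon_i \sim N(0, \sigma^2)$$:

$$ \log L(\boldsymbol\beta) = \sum_i \log \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \mathbf{x_i^T}\boldsymbol\beta)^2}{2\sigma^2}\right) = \text{const} - \frac{1}{2\sigma^2} \sum_i (y_i - \mathbf{x_i^T}\boldsymbol\beta)^2 $$

so maximizing the log-likelihood is exactly minimizing the RSS; setting the gradient of the RSS to zero gives the normal equations $$\mathbf{X^T X w} = \mathbf{X^T y}$$ and hence the closed form above.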
- Loss Function
- proof: minimizing the loss gives the solution to the linear problem (MLE/OLS)
- BLUE (best linear unbiased estimator; Gauss-Markov theorem)
- Other loss functions
- Absolute Loss
- Huber Loss Function
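A minimal sketch of the two robust losses above; the `delta` value is an illustrative default, not from the notes.

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails (robust to outliers).
    `delta` is the transition point between the two regimes."""
    r = np.abs(residual)
    quadratic = 0.5 * residual ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)

def absolute_loss(residual):
    return np.abs(residual)

print(huber_loss(np.array([-3.0, -0.5, 0.0, 0.5, 3.0])))
```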
- Regularization
- Ridge, Lasso (sparsity)
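A minimal sketch, with assumed toy data and penalty strengths, contrasting the two: the L1 penalty (Lasso) drives some coefficients exactly to zero (sparsity), while the L2 penalty (Ridge) only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two features are informative
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge nonzero coefs:", np.sum(np.abs(ridge.coef_) > 1e-6))  # typically all 10
print("lasso nonzero coefs:", np.sum(np.abs(lasso.coef_) > 1e-6))  # typically just a few
```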
- Intuition: why do we have it
- different contribution of large/small data
- exponential family and penalization
- assumption - Bernoulli distribution
- odds, log odds
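The odds / log-odds relation in equations (standard logistic-regression notation assumed): the logit is modeled as a linear function of x, which inverts to the sigmoid.

$$ \text{odds} = \frac{p}{1-p}, \qquad \log\frac{p}{1-p} = \mathbf{w^T x} \iff p = \sigma(\mathbf{w^T x}) = \frac{1}{1 + e^{-\mathbf{w^T x}}} $$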
- Loss function: deriving the parameter estimates
- MLE: y follows a Bernoulli distribution (cross-entropy loss)
- implement gradient descent (sketch after this list)
- cross-entropy loss (Bernoulli MLE loss)
- Mean Square Loss
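A minimal gradient-descent sketch for logistic regression under the cross-entropy (Bernoulli MLE) loss; the toy data, learning rate, and iteration count are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)           # gradient of the mean cross-entropy loss
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X @ np.array([1.5, -2.0]) + 0.5 > 0).astype(float)
print(fit_logistic(X, y))
```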
- Confusion Matrix and Model Evaluation
- Precision, Recall
- Accuracy
- ROC Curve
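A minimal sketch of the evaluation metrics above, using made-up labels and scores:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             accuracy_score, roc_auc_score)

y_true  = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.6, 0.55])
y_pred  = (y_score >= 0.5).astype(int)

print(confusion_matrix(y_true, y_pred))                  # rows = true class, cols = predicted
print("precision:", precision_score(y_true, y_pred))     # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))        # TP / (TP + FN)
print("accuracy: ", accuracy_score(y_true, y_pred))
print("ROC AUC:  ", roc_auc_score(y_true, y_score))      # threshold-free ranking quality
```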
- how to do multi-class classification
- multiple binary classifications (one-vs-rest)
- softmax
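A minimal softmax sketch, numerically stabilized by subtracting the row maximum before exponentiating:

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis; the max-subtraction avoids overflow in exp."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))  # class probabilities, sums to 1
```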
- Algo
- ID3
- C4.5
- CART
- Hyperparameters
- max_depth, min_samples_split, min_samples_leaf, max_leaf_nodes
- Loss
- entropy
- Gini index (impurity sketch after this section)
- Advantages
- interaction handling
- insensitivity to outliers
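A minimal sketch of the two impurity criteria from the Loss items above:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a class-probability vector (0 log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini(p):
    """Gini index: expected misclassification rate under random labeling."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))   # maximal impurity for two classes
print(entropy([1.0, 0.0]), gini([1.0, 0.0]))   # pure node
```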
- Random Forest
- bagging
- bagging of data
- bagging of features
- rule of thumb: select sqrt(k) features per split (sketch after this list)
- feature importance calculation
- out-of-bag performance
- Advantages and Disadvantages
- naturally parallel (embarrassingly parallel algorithm)
- feature importance
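A minimal random-forest sketch (toy data assumed) showing the sqrt(k) feature rule, out-of-bag scoring, and impurity-based feature importances via scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=16, n_informative=4, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",   # rule of thumb: consider sqrt(k) features at each split
    oob_score=True,        # evaluate each tree on the samples it did not see (out-of-bag)
    random_state=0,
).fit(X, y)

print("OOB accuracy:", rf.oob_score_)
print("top features:", rf.feature_importances_.argsort()[::-1][:4])
```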
- Hinge Loss
- Kernel Trick
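A minimal hinge-loss sketch, assuming labels in {-1, +1}:

```python
import numpy as np

def hinge_loss(y, score):
    """Hinge loss: zero once the margin y * score exceeds 1, linear otherwise."""
    return np.maximum(0.0, 1.0 - y * score)

print(hinge_loss(np.array([1, 1, -1]), np.array([2.0, 0.3, 0.5])))  # -> [0.0, 0.7, 1.5]
```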
- No Training stage
- the higher K is, the more robust the model tends to be
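A minimal sketch (toy data assumed) of how increasing K changes KNN behavior: larger K smooths the decision boundary, which is typically more robust to noise at the cost of more bias.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=5, random_state=0)

# Compare cross-validated accuracy for a few illustrative values of k
for k in (1, 5, 25):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:2d}  cv accuracy={score:.3f}")
```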