Comparison of Model Selection Using PCR, PLSR, Best subsets, Ridge Regression and Lasso Regression on cystfibr dataset
The performance of five model selection methods, Principal Component regression (PCR), Partial Least Squares regression (PLSR), Best subsets, Ridge regression and Lasso regression, is compared using the ‘cystfibr’ dataset from the ‘ISwR’ library. Monte Carlo cross-validation with 100 random samples is used to determine the optimal model that regresses maximum expiratory pressure (pemax) on 9 predictors. First, a detailed analysis is performed for PCR and PLSR. Then, the five model selection methods are compared in terms of test MSE and predictor selection. PCR, PLSR and Best subsets selection had the lowest test MSE. A spectral analysis is used to determine which predictors make positive and which make negative contributions to ‘pemax’.
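As a rough illustration of the Monte Carlo cross-validation step, the sketch below estimates the test MSE of PCR and PLSR over 100 random train/test splits of ‘cystfibr’. It is not taken from GitHub_Proj6.R: the use of the ‘pls’ package, the 80/20 split, the seed, the fixed number of components and the helper name `mc_cv_mse` are all assumptions made for this example, and "sampling size of 100" is read here as 100 random splits.

```r
# Minimal sketch only; the settings below are assumptions, not the report's.
library(ISwR)  # provides the cystfibr data set
library(pls)   # provides pcr() and plsr()

data(cystfibr)
set.seed(1)          # assumed seed, for reproducibility
n_splits   <- 100    # number of random train/test splits
train_frac <- 0.8    # assumed training fraction
n_comp     <- 3      # assumed number of components

# Hypothetical helper: average test MSE over repeated random splits.
# fit_fun is a pls-style fitting function such as pcr or plsr.
mc_cv_mse <- function(fit_fun) {
  n   <- nrow(cystfibr)
  mse <- numeric(n_splits)
  for (i in seq_len(n_splits)) {
    train_idx <- sample(n, size = floor(train_frac * n))
    train <- cystfibr[train_idx, ]
    test  <- cystfibr[-train_idx, ]
    # Regress pemax on the 9 remaining variables, with scaled predictors
    fit  <- fit_fun(pemax ~ ., data = train, scale = TRUE, ncomp = n_comp)
    # predict() returns a 3-d array; drop() reduces it to a vector
    pred <- drop(predict(fit, newdata = test, ncomp = n_comp))
    mse[i] <- mean((test$pemax - pred)^2)
  }
  mean(mse)
}

results <- c(PCR = mc_cv_mse(pcr), PLSR = mc_cv_mse(plsr))
print(results)
```

In a fuller comparison, the number of components would itself be chosen by cross-validation rather than fixed, and the same loop could wrap the best subsets, ridge and lasso fits.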
GitHub_Proj6.pdf: Project report in PDF
GitHub_Proj6.R: R script
You can view the Project Report in HTML by clicking here.