More things
- Try excluding the least useful features and check whether validation performance improves
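One way to find candidates for exclusion is to drop each feature in turn, retrain, and compare validation error. A minimal sketch with synthetic data (the data, feature count, and RMSE metric here are illustrative assumptions, not part of the course project):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data: only the first two features actually matter
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_val = X[:400], X[400:]
y_train, y_val = y[:400], y[400:]


def val_rmse(X_tr, X_v):
    model = LinearRegression().fit(X_tr, y_train)
    return np.sqrt(mean_squared_error(y_val, model.predict(X_v)))


full = val_rmse(X_train, X_val)

# Drop each feature in turn: if RMSE barely changes without it,
# the feature is a candidate for exclusion
for i in range(X.shape[1]):
    cols = [j for j in range(X.shape[1]) if j != i]
    without = val_rmse(X_train[:, cols], X_val[:, cols])
    print(f"without feature {i}: rmse={without:.3f} (full={full:.3f})")
```

Features whose removal leaves the RMSE essentially unchanged contribute little and can be dropped.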
Use scikit-learn in the project from last week
- Re-implement the train/validation/test split using scikit-learn in the project from last week
- Also, instead of our own linear regression, use `LinearRegression` (not regularized) and `Ridge` (regularized). Find the best regularization parameter for Ridge
- There are other ways to implement one-hot encoding, e.g. using the `OneHotEncoder` class. Check how to use it here
- Sometimes numerical features require scaling, especially for iterative solvers like "lbfgs". Check how to use `StandardScaler` for that here
Other projects
- Lead scoring - https://www.kaggle.com/ashydv/leads-dataset
- Default prediction - https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients