General Workshop Improvements #33

stemlock · 2022-03-03T03:00:03Z

Replace Iris dataset
There is no baseline model for classification (decision tree?). What about logistic regression?
Some sections lack explanations (e.g., feature importances, comparing different algorithms), and there are no ROC curves.
Cover other types of hyperparameter tuning (random search, Bayesian search)
XGBoost is generally considered the gold standard for shallow learning models. Replace AdaBoost?
Code could be cleaned up in general and given more comments
Regression section would be a great place to introduce general modeling pipelines: data cleaning, feature transformation, feature engineering (maybe not applicable here), model training, hyperparameter tuning/cross-validation, and model evaluation
No need for a separate dummyencoder class -> this can be handled using scikit-learn's OneHotEncoder or even pandas get_dummies
If we are going to use a transformer + pipelines, we should think about adding the model object to the pipeline as well. In general, this is a better practice as you can then save off entire model pipelines vs just feature transformation pipelines.
I typically see KNN used as a simple classifier rather than for regression. Not sure it is necessary to include here
We don't talk about Naive Bayes in classification. I feel this is a canonical algorithm that could be introduced
The lack of any dimensionality reduction/latent variable techniques for clustering seems like a gap
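
To make the pipeline points above concrete (one-hot encoding without a custom encoder class, the model object living inside the pipeline, and random search as a tuning option), here is a minimal sketch of what the regression section could look like. It assumes scikit-learn, pandas, and SciPy are available. The toy data, the column names (`size`, `rooms`, `region`), the choice of `RandomForestRegressor`, and the parameter ranges are all made up for illustration and are not taken from the workshop.

```python
import numpy as np
import pandas as pd
from scipy.stats import randint
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the workshop's real dataset and columns will differ.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "size": rng.normal(100, 20, n),
    "rooms": rng.integers(1, 6, n),
    "region": rng.choice(["north", "south", "east"], n),
})
y = 3.0 * df["size"] + 10.0 * df["rooms"] + rng.normal(0, 5, n)

# Feature transformation: scale numeric columns, one-hot encode the
# categorical column with the built-in OneHotEncoder.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["size", "rooms"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
])

# The model is the last pipeline step, so transformation and estimator
# are fit, tuned, and saved together as one object.
model_pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestRegressor(random_state=0)),
])

# Random search over the model's hyperparameters; the "model__" prefix
# routes each parameter to the pipeline step named "model".
search = RandomizedSearchCV(
    model_pipeline,
    param_distributions={
        "model__n_estimators": randint(20, 100),
        "model__max_depth": randint(2, 10),
    },
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(df, y)

best_pipeline = search.best_estimator_
predictions = best_pipeline.predict(df.head(3))
```

Because `best_pipeline` holds both the preprocessing and the fitted model, the whole thing can be persisted in one call (for example with `joblib.dump`) and later applied directly to raw data frames.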
