The summary and the relevant blog post for this project can be found here.
This notebook aims to classify seven different types of trees and to give some clues about where to find them. I built an extra random forest classifier to detect fantastic trees in the Roosevelt National Forest of northern Colorado. It classified the test set of 500,000 rows with 78% accuracy, placing this kernel in the top 28% of all competitors.
The notebook follows the workflow suggested by Will Koehrsen in this article:
1. Understand, Clean and Format Data
2. Exploratory Data Analysis
3. Feature Engineering & Selection
4. Compare Several Machine Learning Models
5. Perform Hyperparameter Tuning on the Best Model
6. Evaluate the Best Model with Test Data
7. Interpret Model Results
8. Summary & Conclusions
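The model-building steps of this workflow (comparing several models, tuning the best one, then evaluating it once on held-out test data) can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes scikit-learn, uses a synthetic 7-class dataset from `make_classification` as a hypothetical stand-in for the forest data, and the candidate models, parameter grid and split sizes are illustrative choices. `ExtraTreesClassifier` is scikit-learn's extremely randomized trees model, assumed here to correspond to the "extra random forest" mentioned above.

```python
# Minimal sketch of "compare models -> tune best model -> evaluate on test data".
# The data is SYNTHETIC (a hypothetical stand-in), not the Roosevelt National Forest data.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Stand-in dataset with 7 classes, mirroring the 7 tree types.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10,
    n_classes=7, random_state=0,
)
# Hold out a test set that is only touched in the final evaluation step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Compare several models with 5-fold cross-validated accuracy on the training set.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Tune the best model over a small, illustrative hyperparameter grid.
param_grid = {"n_estimators": [50, 100], "max_features": ["sqrt", None]}
search = GridSearchCV(candidates[best_name], param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the tuned model once on the held-out test data.
test_accuracy = accuracy_score(y_test, search.best_estimator_.predict(X_test))
print(best_name, search.best_params_, round(test_accuracy, 3))
```

Keeping the test split out of both cross-validation and the grid search means the final accuracy is an estimate on data the model selection never saw, which is the point of evaluating the best model with test data as a separate step.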
The original Kaggle kernel is here.
This is one long notebook. For the summary and results you can jump directly to section 8, Summary & Conclusions, but I cannot guarantee that you won't miss some beautiful visualizations and interesting insights about data science and machine learning along the way. Enjoy reading!