Our aim was to use machine learning to determine whether wildfire burn acreage in California can be accurately predicted from the amount of precipitation or the severity of drought. We collected data from CALFIRE and the National Oceanic and Atmospheric Administration (NOAA) that included historical data points for wildfires, drought conditions, and precipitation by county in California.
| Component | File(s) |
| --- | --- |
| Main application | app.py |
| Cleaning Notebooks | Three Notebooks |
| Machine Learning Notebook | ML Notebook |
Joseph Chancey: Git/GitHub management, Python 3 Streamlit back end and front end, Jupyter conversion, refactoring.
Breanna Sewell: Data collection, Data transformation, Data encoding, Data cleaning.
David Koski: Statistical analysis, Data visualization, Machine learning, Model fitting.
Streamlit, Matplotlib, scikit-learn, pandas, HTML, and NumPy.
Breanna kick-started this project by collecting and cleaning the data we had all agreed would best suit our desired implementation. From there, David fit machine learning models (linear regression, lasso, and random forest) to that cleaned data to see how it interacted with each model. After that, Joseph collected the work from these notebooks, converted it into the app.py file, and implemented Streamlit to display what was going on throughout the process. Once this was all done, the three of us assessed the results and constructed a presentation to reflect the efficacy of our models.
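As a rough illustration of that model-fitting step, here is a minimal scikit-learn sketch. The CSV name, column names, and split parameters are hypothetical placeholders rather than the project's actual code, and the project's random forest was used as a classifier; a regressor appears here only to keep the comparison uniform.

```python
# Minimal sketch of the model-fitting step, assuming a cleaned CSV with
# hypothetical column names; not the project's actual notebook code.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.ensemble import RandomForestRegressor

# Hypothetical cleaned dataset: one row per county-year with precipitation,
# a drought-severity index, and total acres burned.
df = pd.read_csv("fires_drought_precip.csv")
X = df[["precipitation", "drought_severity"]]
y = df["acres_burned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Lasso": Lasso(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # R^2 on held-out data gives a rough sense of predictive power.
    print(f"{name}: train R^2 = {model.score(X_train, y_train):.3f}, "
          f"test R^2 = {model.score(X_test, y_test):.3f}")
```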
If you would like to run this application locally, clone the repository and make sure every item in requirements.txt is installed on your machine. Double-check that Streamlit is installed (`pip install streamlit`), then navigate into the project folder and run `streamlit run app.py` to launch the application.
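For context, a Streamlit app of this kind is typically laid out roughly as follows. This is a generic sketch with hypothetical file and column names, not the contents of app.py.

```python
# Generic sketch of a Streamlit data-app layout; not the actual app.py.
import pandas as pd
import streamlit as st

st.title("California Wildfire Acreage vs. Precipitation and Drought")

# Hypothetical cleaned dataset path and columns, used only for illustration.
df = pd.read_csv("fires_drought_precip.csv")

county = st.selectbox("County", sorted(df["county"].unique()))
subset = df[df["county"] == county]

st.subheader(f"Historical data for {county}")
st.dataframe(subset)

# Streamlit renders simple charts directly from DataFrames.
st.line_chart(subset.set_index("year")[["precipitation", "acres_burned"]])
```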
Alternatively, you can visit the live build of this project here.
From all of this, we can conclude that the Linear Regression and Lasso models are susceptible to overfitting on our data. The Random Forest classification model was wildly inaccurate, but it did not suffer from overfitting.
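One way to surface the overfitting described above is to compare a model's training R^2 against its cross-validated R^2; a large gap suggests the model is fitting noise in the training data rather than a generalizable relationship. The sketch below reuses the hypothetical column names from the earlier example and is not taken from the project's notebooks.

```python
# Sketch of an overfitting check: a large gap between training R^2 and
# cross-validated R^2 indicates the model memorizes rather than generalizes.
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import cross_val_score

df = pd.read_csv("fires_drought_precip.csv")  # hypothetical cleaned dataset
X = df[["precipitation", "drought_severity"]]
y = df["acres_burned"]

for name, model in [("Linear Regression", LinearRegression()), ("Lasso", Lasso())]:
    train_r2 = model.fit(X, y).score(X, y)
    cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: train R^2 = {train_r2:.3f}, 5-fold CV R^2 = {cv_r2:.3f}")
```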