GitHub Classroom: Sync Assignment #1

Open
wants to merge 1 commit into
base: main
178 changes: 178 additions & 0 deletions notebooks/redlining-41-zonal-stats.ipynb
@@ -0,0 +1,178 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# STEP 6: Calculate zonal statistics\n",
"\n",
"In order to evaluate the connection between vegetation health and\n",
"redlining, we need to summarize NDVI across the same geographic areas as\n",
"we have redlining information.\n",
"\n",
"First, import variables from previous notebooks:"
],
"id": "c6d4416c-848e-45b9-b5f0-b22939521c17"
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%store -r denver_redlining_gdf denver_ndvi_da"
],
"id": "8f1f6fdf"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<link rel=\"stylesheet\" type=\"text/css\" href=\"./assets/styles.css\"><div class=\"callout callout-style-default callout-titled callout-task\"><div class=\"callout-header\"><div class=\"callout-icon-container\"><i class=\"callout-icon\"></i></div><div class=\"callout-title-container flex-fill\">Try It: Import packages</div></div><div class=\"callout-body-container callout-body\"><p>The cell below already imports packages that will help you calculate\n",
"statistics for areas. Add imports for packages that support:</p>\n",
"<ol type=\"1\">\n",
"<li>Interactive plotting of tabular and vector data</li>\n",
"<li>Working with categorical data in <code>DataFrame</code>s</li>\n",
"</ol></div></div>"
],
"id": "0892b3e0-9c80-4c96-bfd4-be5f96b47fb4"
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"highlight": true
},
"outputs": [],
"source": [
"# Interactive plots with pandas\n",
"# Ordered categorical data\n",
"import regionmask # Convert shapefile to mask\n",
"from xrspatial import zonal_stats # Calculate zonal statistics"
],
"id": "1962e9a6"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<link rel=\"stylesheet\" type=\"text/css\" href=\"./assets/styles.css\"><div class=\"callout callout-style-default callout-titled callout-task\"><div class=\"callout-header\"><div class=\"callout-icon-container\"><i class=\"callout-icon\"></i></div><div class=\"callout-title-container flex-fill\">Try It: Convert vector to raster</div></div><div class=\"callout-body-container callout-body\"><p>You can convert your vector data to a raster mask using the\n",
"<code>regionmask</code> package. You will need to give\n",
"<code>regionmask</code> the geographic coordinates of the grid you are\n",
"using for this to work:</p>\n",
"<ol type=\"1\">\n",
"<li>Replace <code>gdf</code> with your redlining\n",
"<code>GeoDataFrame</code>.</li>\n",
"<li>Add code to put your <code>GeoDataFrame</code> in the same CRS as\n",
"your raster data.</li>\n",
"<li>Replace <code>x_coord</code> and <code>y_coord</code> with the x and\n",
"y coordinates from your raster data.</li>\n",
"</ol></div></div>"
],
"id": "3a9e01b4-ccea-4b3f-9e7f-785db5177aed"
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"highlight": true
},
"outputs": [],
"source": [
"denver_redlining_mask = regionmask.mask_geopandas(\n",
" gdf,\n",
" x_coord, y_coord,\n",
" # The regions do not overlap\n",
" overlap=False,\n",
" # We're not using geographic coordinates\n",
" wrap_lon=False\n",
")"
],
"id": "d71e0470"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<link rel=\"stylesheet\" type=\"text/css\" href=\"./assets/styles.css\"><div class=\"callout callout-style-default callout-titled callout-task\"><div class=\"callout-header\"><div class=\"callout-icon-container\"><i class=\"callout-icon\"></i></div><div class=\"callout-title-container flex-fill\">Try It: Calculate zonal statistics</div></div><div class=\"callout-body-container callout-body\"><p>Calculate zonal statistics using the <code>zonal_stats()</code> function.\n",
"To figure out which arguments it needs, use either the\n",
"<code>help()</code> function in Python, or search the internet.</p></div></div>"
],
"id": "e6f5fd47-953e-4a0b-8a06-79b9bc7203b2"
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"highlight": true
},
"outputs": [],
"source": [
"# Calculate NDVI stats for each redlining zone"
],
"id": "e3b82bef"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<link rel=\"stylesheet\" type=\"text/css\" href=\"./assets/styles.css\"><div class=\"callout callout-style-default callout-titled callout-task\"><div class=\"callout-header\"><div class=\"callout-icon-container\"><i class=\"callout-icon\"></i></div><div class=\"callout-title-container flex-fill\">Try It: Plot regional statistics</div></div><div class=\"callout-body-container callout-body\"><p>Plot the regional statistics:</p>\n",
"<ol type=\"1\">\n",
"<li>Merge the NDVI values into the redlining\n",
"<code>GeoDataFrame</code>.</li>\n",
"<li>Use the code template below to convert the <code>grade</code> column\n",
"(<code>str</code> or <code>object</code> type) to an ordered\n",
"<code>pd.Categorical</code> type. This will let you use ordered color\n",
"maps with the grade data!</li>\n",
"<li>Drop all <code>NA</code> grade values.</li>\n",
"<li>Plot the NDVI and the redlining grade next to each other in linked\n",
"subplots.</li>\n",
"</ol></div></div>"
],
"id": "d14c6bd6-b3d4-4bd3-b880-190929833e3e"
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"highlight": true
},
"outputs": [],
"source": [
"# Merge the NDVI stats with redlining geometry into one `GeoDataFrame`\n",
"\n",
"# Change grade to ordered Categorical for plotting\n",
"gdf.grade = pd.Categorical(\n",
" gdf.grade,\n",
" ordered=True,\n",
" categories=['A', 'B', 'C', 'D']\n",
")\n",
"\n",
"# Drop rows with NA grades\n",
"denver_ndvi_gdf = denver_ndvi_gdf.dropna()\n",
"\n",
"# Plot NDVI and redlining grade in linked subplots"
],
"id": "7a0a1bc9"
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"%store denver_ndvi_gdf"
],
"id": "ed98342c"
}
],
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3 (ipykernel)",
"language": "python"
}
}
}
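Review note on `notebooks/redlining-41-zonal-stats.ipynb`: the notebook leaves the zonal statistics and merge steps for the student to fill in. As a rough illustration of what those steps compute, here is a minimal sketch that swaps in plain `numpy` and `pandas` for `regionmask` and `xrspatial.zonal_stats`. The NDVI grid, the zone mask, and the names `ndvi`, `zones`, `zones_df`, and `merged` are all made up for this example. It assumes the mask labels each grid cell with the row index of the polygon that covers it.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x4 NDVI grid (values invented for illustration)
ndvi = np.array([
    [0.1, 0.2, 0.6, 0.7],
    [0.1, 0.2, 0.6, 0.7],
    [0.3, 0.3, 0.8, 0.8],
    [0.3, 0.3, np.nan, 0.8],
])

# Hypothetical zone mask on the same grid: each cell holds the row index
# of the polygon covering it, or NaN where no polygon covers the cell
zones = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, np.nan],
])

# One row per grid cell; drop cells that fall outside every zone
cells = (
    pd.DataFrame({"zone": zones.ravel(), "ndvi": ndvi.ravel()})
    .dropna(subset=["zone"])
    .assign(zone=lambda df: df["zone"].astype(int))
)

# Zonal statistics: summarize NDVI within each zone
# ("count" ignores cells where NDVI itself is NaN)
zonal_ndvi = cells.groupby("zone")["ndvi"].agg(["mean", "count"])

# Stand-in for the redlining table: row index matches the mask values
zones_df = pd.DataFrame({"grade": ["A", "B", "C", "D"]})
merged = zones_df.join(zonal_ndvi)

# Ordered categorical so that A < B < C < D when sorting or coloring
merged["grade"] = pd.Categorical(
    merged["grade"], ordered=True, categories=["A", "B", "C", "D"]
)
print(merged)
```

In the notebook itself, the join would be against the redlining `GeoDataFrame` rather than a plain `DataFrame`, so that the geometry is kept for plotting.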
209 changes: 209 additions & 0 deletions notebooks/redlining-42-tree-model.ipynb
@@ -0,0 +1,209 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# STEP 7: Fit a model\n",
"\n",
"One way to determine if redlining is related to NDVI is to see if we can\n",
"correctly predict the redlining grade from the mean NDVI value. With 4\n",
"categories, we’d expect to be right only about 25% of the time if we\n",
"guessed the redlining grade at random. Accuracy consistently above 25%\n",
"suggests that there is a relationship between vegetation health and\n",
"redlining.\n",
"\n",
"To start out, we’ll fit a Decision Tree Classifier to the data. A\n",
"decision tree is good at splitting data up into rectangles by setting\n",
"thresholds. That makes sense for us here, because we’re looking for\n",
"thresholds in the mean NDVI that indicate a particular redlining grade.\n",
"\n",
"First, import variables from previous notebooks:"
],
"id": "0eb4dc23-022c-4b3c-97eb-8c13c90e196c"
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%store -r denver_ndvi_gdf"
],
"id": "bfd58b22"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<link rel=\"stylesheet\" type=\"text/css\" href=\"./assets/styles.css\"><div class=\"callout callout-style-default callout-titled callout-task\"><div class=\"callout-header\"><div class=\"callout-icon-container\"><i class=\"callout-icon\"></i></div><div class=\"callout-title-container flex-fill\">Try It: Import packages</div></div><div class=\"callout-body-container callout-body\"><p>The cell below imports some functions and classes from the\n",
"<code>scikit-learn</code> package to help you fit and evaluate a\n",
"decision tree model on your data. You may need some additional packages\n",
"later on. Make sure to import them here.</p></div></div>"
],
"id": "c5a6e603-b9b9-4bac-9626-9b402f240124"
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"highlight": true
},
"outputs": [],
"source": [
"from sklearn.tree import DecisionTreeClassifier, plot_tree\n",
"from sklearn.model_selection import train_test_split, cross_val_score"
],
"id": "8128d1af"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As with all models, it is possible to **overfit** our Decision Tree\n",
"Classifier by splitting the data into too many disconnected rectangles.\n",
"We could theoretically get 100% accuracy this way, by drawing a\n",
"rectangle for each individual data point. There are many ways to try to\n",
"avoid overfitting. In this case, we can limit the **depth** of the\n",
"decision tree to 2. This means we’ll be drawing at most 4 rectangles,\n",
"the same as the number of categories we have.\n",
"\n",
"Alternative methods of limiting overfitting include:\n",
"\n",
"- Splitting the data into test and train groups – the overfitted model\n",
" is unlikely to fit data it hasn’t seen. In this case, we have\n",
" relatively little data compared to the number of categories, and so\n",
" it is hard to evaluate a test/train split.\n",
"- Pruning the decision tree to maximize accuracy while minimizing\n",
" complexity. `scikit-learn` supports this through the `ccp_alpha`\n",
" parameter of `DecisionTreeClassifier`. You can also fit the model at\n",
" a variety of depths, and look for diminishing accuracy returns.\n",
"\n",
"<link rel=\"stylesheet\" type=\"text/css\" href=\"./assets/styles.css\"><div class=\"callout callout-style-default callout-titled callout-task\"><div class=\"callout-header\"><div class=\"callout-icon-container\"><i class=\"callout-icon\"></i></div><div class=\"callout-title-container flex-fill\">Try It: Fit a tree model</div></div><div class=\"callout-body-container callout-body\"><p>Replace <code>predictor_variables</code> and\n",
"<code>observed_values</code> with the values you want to use in your\n",
"model.</p></div></div>"
],
"id": "878dcf08-b82b-4ede-9403-f588d37cf047"
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"highlight": true
},
"outputs": [],
"source": [
"# Convert categories to numbers\n",
"denver_ndvi_gdf['grade_codes'] = denver_ndvi_gdf.grade.cat.codes\n",
"\n",
"# Fit model\n",
"tree_classifier = DecisionTreeClassifier(max_depth=2).fit(\n",
" predictor_variables,\n",
" observed_values,\n",
")\n",
"\n",
"# Visualize tree\n",
"plot_tree(tree_classifier)\n",
"plt.show()"
],
"id": "10c162b6"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<link rel=\"stylesheet\" type=\"text/css\" href=\"./assets/styles.css\"><div class=\"callout callout-style-default callout-titled callout-task\"><div class=\"callout-header\"><div class=\"callout-icon-container\"><i class=\"callout-icon\"></i></div><div class=\"callout-title-container flex-fill\">Try It: Plot model results</div></div><div class=\"callout-body-container callout-body\"><p>Create a plot of the results:</p>\n",
"<ol type=\"1\">\n",
"<li>Predict grades for each region using the <code>.predict()</code>\n",
"method of your <code>DecisionTreeClassifier</code>.</li>\n",
"<li>Subtract the actual grades from the predicted grades.</li>\n",
"<li>Plot the calculated prediction errors as a choropleth.</li>\n",
"</ol></div></div>\n",
"\n",
"One method of evaluating your model’s accuracy is by cross-validation.\n",
"This involves splitting your data into groups, fitting the model on all\n",
"but one group, and then testing the model on the group that was held\n",
"out. Repeating this for each group gives you a range of potential\n",
"accuracies. Cross-validation has a couple of advantages, including:\n",
"\n",
"1. It’s good at identifying overfitting, because it tests on a\n",
" different set of data than it trains on.\n",
"2. You can use cross-validation on any model, unlike statistics like\n",
" $p$-values and $R^2$ that you may have used in the past.\n",
"\n",
"A disadvantage of cross-validation is that with smaller datasets like\n",
"this one, it is easy to end up with splits that are too small to be\n",
"meaningful, or don’t have all the categories.\n",
"\n",
"Remember – anything above 25% is better than random!\n",
"\n",
"<link rel=\"stylesheet\" type=\"text/css\" href=\"./assets/styles.css\"><div class=\"callout callout-style-default callout-titled callout-task\"><div class=\"callout-header\"><div class=\"callout-icon-container\"><i class=\"callout-icon\"></i></div><div class=\"callout-title-container flex-fill\">Try It: Evaluate the model</div></div><div class=\"callout-body-container callout-body\"><p>Use cross-validation with the <code>cross_val_score</code> function to\n",
"evaluate your model. Start out with the <code>'balanced_accuracy'</code>\n",
"scoring method, and 4 cross-validation groups.</p></div></div>"
],
"id": "b29427a3-8a7f-4634-ba02-cf58fe3021ab"
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"highlight": true
},
"outputs": [],
"source": [
"# Evaluate the model with cross-validation"
],
"id": "03274512"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<link rel=\"stylesheet\" type=\"text/css\" href=\"./assets/styles.css\"><div class=\"callout callout-style-default callout-titled callout-extra\"><div class=\"callout-header\"><div class=\"callout-icon-container\"><i class=\"callout-icon\"></i></div><div class=\"callout-title-container flex-fill\">Looking for an Extra Challenge?: Fit and evaluate an alternative model</div></div><div class=\"callout-body-container callout-body\"><p>Try out some other models and/or hyperparameters (e.g. changing the\n",
"max_depth). What do you notice?</p></div></div>"
],
"id": "322e1b7b-7205-4776-aae3-dcc3b068977f"
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"highlight": true
},
"outputs": [],
"source": [
"# Try another model"
],
"id": "b2387dd1"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<link rel=\"stylesheet\" type=\"text/css\" href=\"./assets/styles.css\"><div class=\"callout callout-style-default callout-titled callout-respond\"><div class=\"callout-header\"><div class=\"callout-icon-container\"><i class=\"callout-icon\"></i></div><div class=\"callout-title-container flex-fill\">Reflect and Respond</div></div><div class=\"callout-body-container callout-body\"><p>Practice writing about your model. In a few sentences, explain your\n",
"methods, including some advantages and disadvantages of the choice.\n",
"Then, report back on your results. Does your model indicate that\n",
"vegetation health in a neighborhood is related to its redlining\n",
"grade?</p></div></div>"
],
"id": "89b80a8f-a63f-4ed8-9a20-51bb771a6bab"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# YOUR MODEL DESCRIPTION AND EVALUATION HERE"
],
"id": "5286e179-8b3f-488d-889a-70829ae9cbaf"
}
],
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3 (ipykernel)",
"language": "python"
}
}
}
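Review note on `notebooks/redlining-42-tree-model.ipynb`: the fit, prediction-error, and cross-validation steps described in the notebook could look roughly like the sketch below. It runs on synthetic stand-in data rather than `denver_ndvi_gdf`. The per-grade NDVI centers, the noise level, and the names `mean_ndvi`, `grade_codes`, and `scores_by_depth` are assumptions made for illustration, so the printed scores say nothing about the real Denver data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data: 4 grades (codes 0-3), 30 regions per grade,
# with mean NDVI drawn around a different (invented) center per grade
grade_codes = np.repeat(np.arange(4), 30)
centers = np.array([0.7, 0.6, 0.5, 0.4])
mean_ndvi = rng.normal(centers[grade_codes], 0.05)

# scikit-learn expects predictors shaped (n_samples, n_features)
X = mean_ndvi.reshape(-1, 1)
y = grade_codes

# Fit a shallow tree; depth 2 allows at most 4 leaves
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
n_leaves = tree.get_n_leaves()

# Prediction error per region: predicted grade code minus actual grade code
prediction_error = tree.predict(X) - y

# Cross-validate at several depths to look for diminishing returns
scores_by_depth = {}
for depth in (1, 2, 3, 4):
    scores_by_depth[depth] = cross_val_score(
        DecisionTreeClassifier(max_depth=depth, random_state=0),
        X,
        y,
        cv=4,
        scoring="balanced_accuracy",
    )
    print(depth, scores_by_depth[depth].round(2))

scores = scores_by_depth[2]
```

With the real data, `X` would be the mean NDVI column (kept two-dimensional, for example by selecting it with a list of column names) and `y` would be the `grade_codes` column created earlier in the notebook.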