-
Notifications
You must be signed in to change notification settings - Fork 0
/
population_density.Rmd
109 lines (78 loc) · 3.21 KB
/
population_density.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
---
title: "Does densily populated is New Zealand?"
author: "Adam J Campbell"
date: "12 May, 2020"
output: html_document
---
Load in USA data and libraries
```{r Initial_Loading, warning=FALSE, message=FALSE}
USA_2018_AREAS <- read.csv('data/tracts_2018.csv')
USA_2018_POP <- read.csv('data/nhgis0002_ds239_20185_2018_tract.csv')
library("tidyverse")
library("LaplacesDemon")
```
Join the population data to the area data. At this point, we'll filter out the records with zero area and only select a few important columns. We'll also change the area to km<sup>2</sup>.
```{r create_population_area}
population_area <-
right_join(USA_2018_POP, USA_2018_AREAS, by = "GISJOIN") %>%
filter(ALAND > 0) %>% # filter out areas that have zero land area
mutate(LAND_AREA_km2 = ALAND / 1000000) %>%
select(c("STATE", "TRACTA", "AJWME001", "LAND_AREA_km2"))
#population_area$LAND_AREA_km2 <- population_area$ALAND / 1000000
```
Now we need to load and process the New Zealand data into the format for `population_area` and append that data.
```{r}
NZ_data <-
read.csv("data/NZ_export.csv") %>%
filter(LAND_AREA_ > 0) %>% # filter out areas that have zero land area
mutate(STATE = "New Zealand") %>% # add a state
rename(
TRACTA = SA22018_V1,
LAND_AREA_km2 = LAND_AREA_,
AJWME001 = Pop_Tota_2
) %>%
select(c("STATE", "TRACTA", "AJWME001", "LAND_AREA_km2"))
population_area <-rbind(population_area, NZ_data)
```
Create `state_stats` which are broad summary statistics for each state (plus a few other places) including total population and total area.
```{r create_state_stats}
state_stats <-
population_area %>%
group_by(STATE) %>%
summarise(state_pop = sum(AJWME001), state_land_area = sum(LAND_AREA_km2)) %>%
ungroup()
state_stats$population_density_land <- state_stats$state_pop / state_stats$state_land_area
```
Join the `state_stats` data onto the `population_area` data and calculate the fraction of population and land area each census tract occupies relative to its state.
```{r join_population_area}
population_area <-
left_join(population_area, state_stats, by = "STATE")
population_area$land_area_frac <- population_area$LAND_AREA_km2 / population_area$state_land_area
population_area$pop_frac <- population_area$AJWME001 / population_area$state_pop
```
Now we will compute the KLD sum for each state.
```{r message=FALSE, warning=FALSE}
KLD_stats<-
population_area %>%
group_by(STATE) %>%
select(pop_frac, land_area_frac) %>%
summarise(KLD = KLD(pop_frac, land_area_frac)$sum.KLD.px.py)
```
Join `KLD_stats` to `state_stats` and compute a few fields.
```{r}
state_stats <- inner_join(state_stats, KLD_stats, by="STATE")
state_stats$log_density <- log(state_stats$population_density_land)
state_stats$density_KLD <- state_stats$log_density + state_stats$KLD
state_stats$lived_density <- exp(state_stats$density_KLD)
```
Make a new dataframe `state_density` that cleans up `state_stats` and keeps only items for display
```{r}
state_density <-
state_stats %>%
select(STATE, state_pop, state_land_area, population_density_land, lived_density) %>%
mutate(density_ratio = lived_density/population_density_land)
```
Save data frame
```{r}
saveRDS(state_density, file="state_density.Rda")
```