---
knitr: 
    opts_chunk: 
      code-fold: show
      results: hold
  
---
# Vector Geospatial Data {#sec-chap05}


```{r}
#| label: setup
#| results: hold
#| include: false

base::source(file = "R/helper.R")

## create data folder (only once, e.g., only in this chapter)
baseURL <- here::here()
pb_create_folder(base::paste0(baseURL, "/data"))

## create chapter folder (for each)
pb_create_folder(base::paste0(baseURL, "/data/Chapter5"))

## set theme for ggplot2 graphics
ggplot2::theme_set(ggplot2::theme_bw())
```


::::: {#obj-chap05}
:::: {.my-objectives}
::: {.my-objectives-header}
Chapter section list
:::

::: {.my-objectives-container}

1. Import geospatial data: @sec-05-import-geodata
2. Creating simple maps: @sec-05-create-maps
3. Overlaying vector datasets: @sec-05-overlaying-vector-datasets
4. Save spatial geodata files: @sec-05-save-geodata
5. Choropleth maps: @sec-05-choropleth-maps
6. Modifying map appearance: @sec-05-modify-map-appearance
7. Exporting graphics output: @sec-05-export-graphics
8. Resources: @sec-05-resources
9. Practice


:::
::::
:::::

## Import Geospatial Data {#sec-05-import-geodata}

### ESRI shapefile format

The data for import in chapter 5 are provided in `r glossary("ESRI")` shapefile format. This format was developed several decades ago but remains one of the most widely used file formats for vector geospatial data. It is a multi-file format: separate files contain the feature geometries, the attribute table, the spatial indices, and the coordinate reference system.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-import-geospatial-data}
: Import Geospatial Data
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: import-geospatial-data


glue::glue("############### import esri data #############")
okcounty <- sf::st_read("data/Chapter5/ok_counties.shp", quiet = TRUE)
tpoint <- sf::st_read("data/Chapter5/ok_tornado_point.shp", quiet = TRUE)
tpath <- sf::st_read("data/Chapter5/ok_tornado_path.shp", quiet = TRUE)

glue::glue("")
glue::glue("############### show data class #############")
class(okcounty)

glue::glue("")
glue::glue("############### show data with dplyr #############")
dplyr::glimpse(okcounty)
```


***

The {**sf**} objects contain a column called `geometry`. This is a special column that contains the geospatial information about the location of each feature. This column should not be modified directly. It is used by the functions in the {**sf**} package for geospatial data processing.


::::
:::::

::: {.callout-note #nte-05-skimr}
###### Using {skimr} with {sf}

Normally I use the `skimr::skim()` function to summarize data. But no skimmers are available for the {**sf**} data classes in the `geometry` column. (Possible data types are: sfc_POINT, sfc_LINESTRING, sfc_POLYGON, sfc_MULTIPOINT, sfc_MULTILINESTRING, sfc_MULTIPOLYGON, and sfc_GEOMETRY.) In the above case `class(okcounty$geometry)` = "*`r class(okcounty$geometry)`*", which is not a class that {**skimr**} knows about. The fallback to the "character" class is not useful. (`sfc` stands for "simple feature list column".)

It is possible to adapt {**skimr**} to work with user-defined data types using `skimr::skim_with()`. Resources that explain how to do this are:

- [Defining sfl’s for a package](https://docs.ropensci.org/skimr/articles/extending_skimr.html#defining-sfls-for-a-package): General article that explains how to define and use skimmers for user-defined data types. `sfl` stands for "skimr function list", a list-like data structure used to define custom summary statistics for specific data types.
- [skim of {**sf**} objects](https://github.com/ropensci/skimr/issues/88): Discussion specific to the {**sf**} package.

At the moment I do not understand enough about the {**sf**} package to go into more detail and write an appropriate function. I wonder whether a solution is already available, since spatial data processing with R and the {**sf**} package is not an extremely rare use case.
:::
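
Until a dedicated skimmer exists, one possible workaround is to summarize only the attribute table. The sketch below drops the geometry column with `sf::st_drop_geometry()` before calling `skimr::skim()`. It uses the `nc` shapefile that ships with {**sf**} as a stand-in for `okcounty`; this substitution and the object names are assumptions made only to keep the example self-contained.

```{r}
#| label: sketch-skim-without-geometry
#| eval: false

## stand-in data (assumption): North Carolina counties shipped with {sf}
nc <- sf::st_read(base::system.file("shape/nc.shp", package = "sf"),
                  quiet = TRUE)

## drop the geometry column, then summarize the remaining attributes
nc_attributes <- sf::st_drop_geometry(nc)
attr_summary <- skimr::skim(nc_attributes)
attr_summary
```

This says nothing about the geometries themselves, but it gives the usual {**skimr**} overview for all other columns.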


In the R package {**sf**} (Simple Features), many functions are prefixed with `st_`. The prefix follows [PostGIS](https://postgis.net/), where it is an abbreviation for “spatial type”. It is used consistently throughout {**sf**} to indicate that a function operates on spatial data. In effect, `st_` serves as a namespace for spatial functions, which makes them easy to identify and to discover, for instance through command-line completion.
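
To get a feeling for how consistently the prefix is used, the exported functions of the package can be listed. This is only a small sketch; the object name `st_functions` is my own choice and the number of functions found depends on the installed {**sf**} version.

```{r}
#| label: sketch-list-st-functions
#| eval: false

## all exported objects of {sf} whose name starts with "st_"
st_functions <- base::grep("^st_",
                           base::getNamespaceExports("sf"),
                           value = TRUE)

base::length(st_functions)
utils::head(base::sort(st_functions), 10)
```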

Looking at the file names I noticed that all files of a dataset share the same base name and differ only in their extension. There are always four files with the extensions `.dbf`, `.prj`, `.shp`, and `.shx`.

The shapefiles are imported to {**sf**} objects using the `sf::st_read()` function. The `quiet = TRUE` argument suppresses output to the console when importing spatial datasets.

An example of the output when using `quiet = FALSE` (the default option) is:

> Reading layer `ok_counties' from data source 
  `/Users/petzi/Documents/Meine-Repos/GDSWR/data/Chapter5/ok_counties.shp' using driver `ESRI Shapefile'  
Simple feature collection with 77 features and 7 fields  
Geometry type: POLYGON  
Dimension:     XY  
Bounding box:  xmin: -103.0025 ymin: 33.62184 xmax: -94.43151 ymax: 37.00163  
Geodetic CRS:  NAD83

To read in a shapefile, it is only necessary to specify the filename with a `.shp` extension. However, all the files, including the `.shp` file as well as the `.dbf`, `.shx`, and `.prj` files, need to be present in the directory from which the data are read.
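
Because a missing sidecar file is easy to overlook, it can be checked before the import. The following sketch uses a hypothetical helper `check_shapefile_parts()` (it is not part of {**sf**} or of my `helper.R`) that reports which of the four files exist next to a given `.shp` file.

```{r}
#| label: sketch-check-shapefile-parts
#| eval: false

## hypothetical helper: which parts of a shapefile are present?
check_shapefile_parts <- function(shp_path,
                                  exts = c("shp", "shx", "dbf", "prj")) {
  stem  <- tools::file_path_sans_ext(shp_path)   # path without ".shp"
  parts <- base::paste0(stem, ".", exts)         # expected sidecar files
  base::data.frame(
    file   = base::basename(parts),
    exists = base::file.exists(parts)
  )
}

check_shapefile_parts("data/Chapter5/ok_counties.shp")
```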

- The `ok_counties.shp` dataset contains county boundaries for the state of  Oklahoma. 
- The `ok_tornado_point.shp` dataset and the `ok_tornado_path.shp`  dataset contain historical information about tornadoes in Oklahoma. 
    - The points are the initial locations of tornado touchdown. 
    - The paths are lines that identify the path of each tornado after touchdown.
- These data were derived from larger, national-level datasets generated by the National Oceanic and Atmospheric Administration (NOAA) [National Weather Service Storm Prediction Center](https://www.spc.noaa.gov/gis/svrgis/).

### Conversion data sf <-> sp

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-data-conversion-sf-sp}
: {**sf**} data to {**sp**} data and reverse
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: data-conversion-sf-sp

glue::glue("############### convert from sf to sp data #############")
okcounty_sp <- sf::as_Spatial(okcounty) # as() is not exported by {sf}; the equivalent call is methods::as(okcounty, "Spatial")
class(okcounty_sp)

glue::glue("")
glue::glue("############### convert from sp to sf data #############")
okcounty_sf <- sf::st_as_sf(okcounty_sp)
class(okcounty_sf)
```


::::
:::::


## Creating simple maps {#sec-05-create-maps}

A good strategy for getting an overview of the data is to plot them as a map. There are two options: `ggplot2::geom_sf()` or `base::plot()`.

### Draw Oklahoma county boundaries

To generate a map of counties using `ggplot2::ggplot()` with a {**sf**} object, the `ggplot2::geom_sf()` function is used.

From the point of view of the {**ggplot2**} package, `ggplot2::geom_sf()` is an unusual geom because it draws different geometric objects depending on what simple features are present in the data: you can get points, lines, or polygons. For text and labels, you can use `ggplot2::geom_sf_text()` and `ggplot2::geom_sf_label()`.

::: {.my-code-collection}
:::: {.my-code-collection-header}
::::: {.my-code-collection-icon}
:::::
:::::: {#exm-05-ploting-oklahoma-county-boundaries}
: Plotting Oklahoma county boundaries
::::::

::::

::::{.my-code-collection-container}

::: {.panel-tabset}

###### `ggplot2`

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-oklahoma-county-boundaries-ggplot2}
: Oklahoma county boundaries with {**ggplot2**}
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-05-oklahoma-county-boundaries-ggplot2
#| fig-cap: "Oklahoma county boundaries plotted with {ggplot2}"
#| fig-height: 3

ggplot2::ggplot(data = okcounty) +
  ggplot2::geom_sf(fill = NA) +
  ggplot2::theme_void()
```

***

`fill = NA` makes the counties transparent.

(To get the same result as with the base R approach I used `ggplot2::theme_void()` to hide the coordinates, which are shown in the original book example.)

::::
:::::


###### `base::plot()`

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-oklahoma-county-boundaries-base-plot}
: Oklahoma county boundaries with `base::plot()`
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-05-oklahoma-county-boundaries-base-plot
#| fig-cap: "Oklahoma county boundaries plotted with base::plot()"

graphics::par(mar = c(0, 0, 0, 0))
base::plot(okcounty$geometry)
```

***

From the R Graph Gallery I learned that I could also use base R to plot spatial geodata. But the general view seems to be that {**ggplot2**} is the preferred approach.
::::
:::::

:::

::::
:::::


::: {.callout-note #nte-05-too-much-white-space}
###### Too much space around choropleth map

As you can see from both graphics, there is ample space around the map. I do not know how to remove it. Therefore I wrote a [question on StackOverflow](https://stackoverflow.com/questions/79295599/remove-white-space-around-sf-ggplot2-choropleth-map). I used a simple example provided by the {**sf**} package.
:::


### Inspect `tpoint` and `tpath`

Because {**sf**} objects are a type of data frame, they can be modified using the normal {**tidyverse**} functions. Let's look at the two other R objects we've generated in @cnj-05-import-geospatial-data.

::: {.my-code-collection}
:::: {.my-code-collection-header}
::::: {.my-code-collection-icon}
:::::
:::::: {#exm-05-show-tornado-file-structure}
: Display structure of the tornado files
::::::

::::

::::{.my-code-collection-container}

::: {.panel-tabset}

###### `tpoint`

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-glimpse-tpoint}
: Glimpse at `tpoint`
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: glimpse-tpoint

dplyr::glimpse(tpoint)
```


***

The points are the initial locations of tornado touchdowns.
::::
:::::


###### `tpath`

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-glimpse-tpath}
: Glimpse at `tpath`
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: glimpse-tpath

dplyr::glimpse(tpath)
```

***

The paths are lines that identify the path of each tornado after touchdown.

::::
:::::

:::

::::
:::::

From `dplyr::glimpse()` we get an idea of the data structure. But we do not know the range of values covered by each variable. This is especially important for our next task, which focuses on data from the most recent years (2016-2021). We know from @exm-05-show-tornado-file-structure that the dataset starts with the year 1950 but we have no clue about the middle or end of the dataset.

For this reason I have written a special function that returns the first and the last record together with several random records. By default eight records are shown, but this can be changed with a second parameter.
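
The idea behind such a function can be sketched as follows. Note that `glance_rows()` is a hypothetical stand-in written for this illustration; it is not the actual `pb_glance_data()` from my `helper.R`, and the `seed` argument is an assumption added to make the random selection reproducible.

```{r}
#| label: sketch-glance-rows
#| eval: false

## hypothetical stand-in for pb_glance_data():
## first row, last row, and n - 2 random rows in between
glance_rows <- function(df, n = 8, seed = 42) {
  base::stopifnot(n >= 2, base::nrow(df) >= n)
  base::set.seed(seed)                    # note: changes the global RNG state
  last <- base::nrow(df)
  candidates <- base::setdiff(base::seq_len(last), c(1L, last))
  middle <- candidates[base::sample.int(base::length(candidates),
                                        size = n - 2)]
  df[base::sort(c(1L, middle, last)), , drop = FALSE]
}

glance_rows(datasets::mtcars, n = 5)
```

Sorting the row indices keeps the records in their original order, so the first and last record of the dataset frame the random ones.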

::: {.my-code-collection}
:::: {.my-code-collection-header}
::::: {.my-code-collection-icon}
:::::
:::::: {#exm-05-show-random-tornado-data}
: Show some random tornado data, including first and last record
::::::

::::

::::{.my-code-collection-container}

::: {.panel-tabset}

###### `tpoint`

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-show-random-tpoint-data}
: Show random `tpoint` data
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: show-random-tpoint-data

pb_glance_data(tpoint)
```

::::
:::::


###### `tpath`

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-show-random-tpath-data}
: Show random `tpath` data
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: show-random-tpath-data

pb_glance_data(tpath)
```


::::
:::::

:::

::::
:::::

### Visualization of the Oklahoma tornado data (2016-2021)

Because {**sf**} objects are a type of data frame, they can be modified using the normal {**tidyverse**} functions.

- A reduced dataset for the years 2016-2021, containing only the columns ID (`om`), year (`yr`), and date (`date`), is prepared in the first tab `reduce data`.
- Initiation points of tornadoes in Oklahoma from 2016–2021 are shown in tab `initiation points`.
- Tab `tornado paths` shows the paths of tornadoes in Oklahoma from 2016–2021.
- Initiation points of tornadoes in Oklahoma from 2016–2021 with years represented by the color aesthetic are in tab `color aesthetic`.
- In the final tab `facets` you will see the initiation points of tornadoes in Oklahoma from 2016–2021 with years mapped as separate facets. 

::: {.my-code-collection}
:::: {.my-code-collection-header}
::::: {.my-code-collection-icon}
:::::
:::::: {#exm-05-different-tornado-visualizations}
: Show different visualizations of the Oklahoma tornado data (2016-2021)
::::::

::::

::::{.my-code-collection-container}

::: {.panel-tabset}

###### reduce data

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-reduce-tornado-data}
: Filter data from 2016 to 2021 and select only three columns (ID, year and date)
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: reduce-tornado-data
#| code-fold: show

tpoint_16_21 <- tpoint |> 
  dplyr::filter(yr >= 2016 & yr <= 2021) |> 
  dplyr::select(om, yr, date)

tpath_16_21 <- tpath |> 
  dplyr::filter(yr >= 2016 & yr <= 2021)  |> 
  dplyr::select(om, yr, date)
```

<center>(*This R code chunk produces no output*)</center>

::::
:::::


###### initiation points

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-show-tornado-initation-points}
: Show initiation points of tornadoes in Oklahoma from 2016–2021
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-show-tornado-initation-points
#| fig-cap: "Initiation points of tornadoes in Oklahoma from 2016–2021."

ggplot2::ggplot() +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::geom_sf(data = tpoint_16_21)
```

***

- Because each function maps a different dataset, the data argument must be provided in each `ggplot2::geom_sf()` function instead of in the `ggplot2::ggplot()` function. 
- I am using as default theme the `ggplot2::theme_bw()` function (see setup chunk of this chapter) to display the map over a white background while retaining the graticules.


::::
:::::

###### tornado paths

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-show-tornado-paths}
: Show tornado paths in Oklahoma from 2016–2021
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-show-tornado-paths
#| fig-cap: "Paths of tornadoes in Oklahoma from 2016-2021."

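ggplot2::ggplot() +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::geom_sf(data = tpath_16_21,
          color = "red",
          linewidth = 1)   # `linewidth` sets the line width in ggplot2 >= 3.4.0
```

***

To make the tornado path lines easier to see in relation to the county boundaries, they are displayed in red and their width is increased (`linewidth = 1`) beyond the default line width of 0.5. (The original code uses `size = 1`. Since {**ggplot2**} 3.4.0 the width of lines is controlled by `linewidth`, whereas `size` applies to points.)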


::::
:::::

###### color aesthetic

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-tornado-with-color-aesthetic}
: Initiation points of tornadoes in Oklahoma from 2016-2021 with years represented by the color aesthetic
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-tornado-with-color-aesthetic
#| fig-cap: "Initiation points of tornadoes in Oklahoma from 2016-2021 with years represented by the color aesthetic."

ggplot2::ggplot() +
  ggplot2::geom_sf(data = tpoint_16_21, 
          ggplot2::aes(color = forcats::as_factor(yr))) + # (1)
  ggplot2::geom_sf(data = okcounty, fill = NA) +
# ggplot2::scale_color_discrete(name = "Year") +          # (2)
  ggokabeito::scale_color_okabe_ito(name = "Year") +      # (2)
  ggplot2::coord_sf(datum = NA) +                         # (3)
  ggplot2::theme_void()                                   # (3)
```
***

To view the years of the tornadoes on the map, an aesthetic can be specified. 

**Line Comment 1**: In the book the color argument is specified as `base::as.factor(yr)` so that the year is displayed as a discrete variable instead of a continuous variable. Instead of the base function I have used `forcats::as_factor(yr)`.

> Compared to base R, when x is a character, this function creates levels in the order in which they appear, which will be the same on every platform. (Base R sorts in the current locale which can vary from place to place.) (from the {**forcats**} help file).
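
The difference described in the quote concerns character vectors and can be seen in a tiny example. This is only an illustrative sketch with made-up values; the object names are my own.

```{r}
#| label: sketch-as-factor-level-order
#| eval: false

x <- c("b", "a", "c")

## base R: levels are sorted (the sort order depends on the locale)
lv_base <- base::levels(base::as.factor(x))

## forcats: levels are kept in the order of first appearance
lv_forcats <- base::levels(forcats::as_factor(x))

lv_base
lv_forcats
```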


**Line Comment 2**: The `ggplot2::scale_color_discrete()` function is used to set the legend name. But the default color scale is not appropriate for people with color-vision deficiency (`r glossary("CVD")`). I have therefore used `ggokabeito::scale_color_okabe_ito()`.


**Line Comment 3**: The book says that the `ggplot2::theme_void()` function removes the plot axes and labels and shows only the map. I suppose that this is not correct. `ggplot2::coord_sf(datum = NA)` removes the plot axes and labels; `ggplot2::theme_void()` removes the border frame around the graphic.


::::
:::::

###### facets

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-tornado-initiation-points-facets}
: Initiation points of tornadoes in Oklahoma from 2016-2021 as facet visualization
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-tornado-initiation-points-facets
#| fig-cap: "Initiation points of tornadoes in Oklahoma from 2016-2021 with years mapped as separate facets."

ggplot2::ggplot() +
  ggplot2::geom_sf(data = okcounty, 
          fill = NA, 
          color = "gray") +
  ggplot2::geom_sf(data = tpoint_16_21, size = 0.75) +
  ggplot2::facet_wrap(ggplot2::vars(yr), ncol = 2) +
  ggplot2::coord_sf(datum = NA) +
  ggplot2::theme_void()
```

***

Alternatively, `ggplot2::facet_wrap()` can be used to display the tornadoes for each year on a separate map. Compared with the previous tab, the point size is reduced from the {**ggplot2**} default of `size = 1.5` to `size = 0.75`, so that the points are better suited to the smaller maps.

::::
:::::


:::

::::
:::::


::: {.callout-note #nte-too-much-horizontal-space}
With the exception of the facet graphics there is too much empty space above and below the {**sf**} graphic. Is this a known problem? How can this vertical space be reduced for {**sf**} graphics plotted with {**ggplot2**}?
:::

:::::{.my-solution}
:::{.my-solution-header}
Solution: Remove empty space in maps
:::
::::{.my-solution-container}
I found a solution after [posting the question in StackOverflow](https://stackoverflow.com/questions/79295599/remove-white-space-around-sf-ggplot2-choropleth-map): I need to set the figure size in the Quarto chunk options so that the figure has the right aspect ratio in the document. As far as I can see there are two options:

- Reducing the height of the figure from its standard height of 5 inches, for instance to three inches with `fig-height: 3` in the Quarto chunk options. See an example in @cnj-05-choropleth-filled-colors or @cnj-05-choropleth-mappying-symbols. (To see the chunk options together with the code I have used `echo: fenced` for these two chunks.)
- Changing the aspect ratio from 1 to a smaller value, for instance to 3/4 with `ggplot2::theme(aspect.ratio = 3/4)`. See an example in @cnj-annex-b-zoom-europe-map.  


::::
:::::


## Overlaying Vector Datasets {#sec-05-overlaying-vector-datasets}

### A first spatial join

The number of tornado points in each county can be calculated using the `sf::st_join()` function to carry out a spatial join. A spatial join with {**sf**} is different from a join with {**dplyr**}: `sf::st_join()` links rows from the two tables based on their spatial locations instead of their attributes.

In this case the function compares the point coordinates in the `geometry` column of the `tpoint_16_21` dataset with the polygon coordinates in the `geometry` column of the `okcounty` dataset. Each row of `tpoint_16_21` is joined with the attributes of the `okcounty` row whose polygon contains the point.


:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-spatial-join}
: Overlaying vector datasets with a spatial join
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: spatial-join

countypnt <- sf::st_join(tpoint_16_21, okcounty)  

dplyr::glimpse(countypnt)
```


::::
:::::
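
The mechanics of the spatial join are easier to follow on toy data. The sketch below builds two adjacent unit squares and three points by hand (all names and coordinates are made up for illustration) and joins the polygon attributes to the points. A point outside of every polygon gets `NA`, because `sf::st_join()` performs a left join by default.

```{r}
#| label: sketch-spatial-join-toy-data
#| eval: false

## helper for a unit square whose lower left corner is at (x0, 0)
sq <- function(x0) {
  sf::st_polygon(list(base::rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

polys <- sf::st_sf(
  name     = c("west", "east"),
  geometry = sf::st_sfc(sq(0), sq(1))
)

pts <- sf::st_sf(
  id       = 1:3,
  geometry = sf::st_sfc(
    sf::st_point(c(0.5, 0.5)),   # inside "west"
    sf::st_point(c(1.5, 0.5)),   # inside "east"
    sf::st_point(c(5, 5))        # outside of both squares
  )
)

joined <- sf::st_join(pts, polys)
joined$name
```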

### Count tornados per county

Afterward, each row in `countypnt` contains additional columns from the `okcounty` dataset that describe the county within which the tornado's point coordinates lie. The dataset contains one record for each tornado.

To compute the total number of tornadoes per county, `countypnt` must be grouped by the `GEOID` county code or by the county `NAME` (here by `GEOID` county code). 

But before grouping and summarizing, `countypnt` must be converted from an {**sf**} object to a normal data frame using `sf::st_drop_geometry()`.


:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-count-tornados-per-county}
: Count tornados per county
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: count-tornados
#| results: hold

glue::glue("#### show class before `sf::st_drop_geometry()` #####")
base::class(countypnt)
countypnt <- sf::st_drop_geometry(countypnt)
glue::glue("")
glue::glue("##### show class after `sf::st_drop_geometry()` ######")
base::class(countypnt)


countysum <- countypnt |> 
  dplyr::group_by(GEOID) |> 
  dplyr::summarize(tcnt = dplyr::n())  

glue::glue("")
glue::glue("##### glimpse at the new summarized data frame ######")
dplyr::glimpse(countysum)
```


::::
:::::

### Associate polygons with tornado counts

In the next step we join `okcounty` to `countysum` so that each polygon is associated with the appropriate tornado summary. 

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-join-polygons-with-tornado-counts}
: Associate each polygon with the appropriate tornado summary
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: join-polygons-with-tornado-counts

countymap <- okcounty   |>
  dplyr::left_join(countysum, by = "GEOID")  |>        # (1)
  dplyr::mutate(tcnt = 
        base::ifelse(base::is.na(tcnt), 0, tcnt)) |>   # (2)
  dplyr::mutate(area = sf::st_area(okcounty),
         tdens = 10^3 * 10^3 * tcnt / area)   |>       # (3)
  units::drop_units()                                  # (4)


dplyr::glimpse(countymap)
```

***

**Line comment 1**: Using `dplyr::left_join()` instead of `dplyr::inner_join()` ensures that all of the county polygons are retained in the output of the join. (`dplyr::inner_join()` only keeps observations from x that have a matching key in y, whereas `dplyr::left_join()` keeps all observations in x.)

**Line comment 2**: After summarizing, `countysum` contains one row for each county with at least one tornado in 2016-2021. A county without any tornado in that period has no row in `countysum` and therefore gets `NA` as its number of tornadoes after the join.

As a matter of fact, a few counties had no tornadoes during 2016–2021 and are therefore missing from `countysum`, resulting in `NA` values in the joined table. In this case we know that `NA` means zero tornadoes, so we must replace the `NA` values with zeroes. I did this with the `dplyr::mutate()` function instead of `base::replace()`. A further advantage is that this approach does not need the `.` placeholder of the {**magrittr**} pipe (re-exported by {**dplyr**}), or its equivalent `_` for the native R pipe, to refer to the data frame. For details see @nte-chap03.

**Line comment 3**: The second `dplyr::mutate()` function computes the area of each county using `sf::st_area()` and then calculates the density of tornadoes per county. Density is initially in tornadoes per square meter; the factor 10^3 * 10^3 = 10^6 converts it to tornadoes per km^2. (Expressing it per 1000 km^2 requires a further factor of 10^3.)

**Line comment 4**: The `sf::st_area()` function returns a column with explicit measurement units, but these are removed using the `units::drop_units()` function for simplicity. For more information see the vignettes and help pages of the {**units**} package. 

::::
:::::
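
The steps of the four line comments can be retraced with a few invented numbers. The sketch below is not the code of the book: it uses `dplyr::coalesce()` as one possible way to replace `NA` with zero and lets the {**units**} package do the conversion from m^2 to km^2 explicitly. All values and object names are assumptions for illustration.

```{r}
#| label: sketch-join-coalesce-density
#| eval: false

## toy data: three counties, only two of them with tornadoes
counties <- base::data.frame(GEOID = c("001", "003", "005"))
counts   <- base::data.frame(GEOID = c("001", "005"), tcnt = c(4L, 2L))

## areas as {units} objects in m^2, similar to the result of sf::st_area()
area     <- units::set_units(c(2.0e9, 1.5e9, 3.0e9), m^2)
area_km2 <- base::as.numeric(units::set_units(area, km^2))

toy <- counties |>
  dplyr::left_join(counts, by = "GEOID") |>      # keeps all three counties
  dplyr::mutate(
    tcnt  = dplyr::coalesce(tcnt, 0L),           # NA means zero tornadoes
    area_km2 = area_km2,
    tdens = 1000 * tcnt / area_km2               # tornadoes per 1000 km^2
  )

toy
```

Converting the units explicitly avoids having to remember which power of ten belongs to which target unit.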

## Save spatial geodata files {#sec-05-save-geodata}

### ESRI format

The `sf::st_write()` function can be used to save sf objects to a variety of file formats. In many cases, the function can determine the output format from the output filename extension. The following code saves the county-level tornado summaries in ESRI shapefile format. The `append = FALSE` option overwrites the shapefile if it already exists.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-save-spatial-ESRI-format}
: Save spatial data files into ESRI format
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: save-spatial-ESRI-format
#| eval: false

sf::st_write(countymap, 
         dsn = "data/Chapter5/oktornadosum.shp", 
         append = FALSE)
```

::::
:::::

After a message describing what the function does

> Writing layer `oktornadosum' to data source 
  `data/Chapter5/oktornadosum.shp' using driver `ESRI Shapefile'
  Writing 77 features with 10 fields and geometry type Polygon.

I got for every feature (= 77 rows) a warning message emitted by the GDAL library:

> Warning: GDAL Message 1: Value 1890663260.74707699 of field area of feature 0 not successfully written. Possibly due to too larger number with respect to field width

It turned out that this is a [misleading warning](https://github.com/r-spatial/sf/issues/306) and that one should not use the old ESRI format but the newer and better Open Geospatial Consortium (OGC) GeoPackage format. See also [StackOverflow](https://stackoverflow.com/a/73242539/7322615) and the [answer from the {**sf**} developer](https://github.com/r-spatial/sf/issues/2368):

> The general recommendation is to not use shapefiles: the format is not an open standard, it has many limitations and modern formats are available. A good alternative is GeoPackage.

### GeoPackage format

GeoPackage is also mentioned as an alternative in the book. The data are stored in an SQLite database that may contain one or more layers. In this example, the `delete_dsn = TRUE` argument overwrites the entire GeoPackage. Leaving this argument at its default value of `FALSE` would add a new layer to an existing database.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-save-spatial-geodata-in-GeoPackage-format}
: Save spatial geodata in GeoPackage format
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: save-spatial-GeoPackage-format

sf::st_write(countymap, 
         dsn = "data/Chapter5/oktornado.gpkg", 
         layer = "countysum",
         delete_dsn = TRUE)
```


::::
:::::
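
To illustrate how several layers live in one GeoPackage, the following sketch writes two layers into a temporary file, lists them with `sf::st_layers()`, and reads one of them back. It uses the `nc` dataset shipped with {**sf**} instead of the tornado data; the file location and layer names are made up.

```{r}
#| label: sketch-geopackage-layers
#| eval: false

## stand-in data (assumption): North Carolina counties shipped with {sf}
nc <- sf::st_read(base::system.file("shape/nc.shp", package = "sf"),
                  quiet = TRUE)

gpkg <- base::tempfile(fileext = ".gpkg")

## first call creates the file, second call adds another layer to it
sf::st_write(nc, dsn = gpkg, layer = "counties", quiet = TRUE)
sf::st_write(nc[1:10, ], dsn = gpkg, layer = "first_ten", quiet = TRUE)

## list the layers and read one of them back
layer_names <- sf::st_layers(gpkg)$name
layer_names

back <- sf::st_read(gpkg, layer = "first_ten", quiet = TRUE)
base::nrow(back)
```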

### GeoJSON format


Another commonly-used open geospatial data format is GeoJSON. It is based on JavaScript Object Notation (`r glossary("JSON")`), a human-readable text format that stores data in ASCII files. The `layer_options` argument must be set to `"RFC7946 = YES"` to save the data in the newest GeoJSON specification.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-save-spatial-geodata-in-GeoJSON-format}
: Save spatial geodata in GeoJSON format
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: save-spatial-GeoJSON-format

sf::st_write(obj = countymap, 
             dsn = "data/Chapter5/oktornado.geojson", 
             layer_options = "RFC7946 = YES",
             delete_dsn = TRUE)
```


::::
:::::

Here again I had to add `delete_dsn = TRUE` (`append = FALSE` did not work for this format!). Otherwise I would get an error message that the dataset already exists.

## Choropleth Maps {#sec-05-choropleth-maps}

### Filling with colors (standard)

Another way to display the tornadoes is as a choropleth map, where summary statistics for each county are displayed as different colors. The county-level tornado density can be mapped as a choropleth using the `fill` aesthetic with `ggplot2::geom_sf()`. By default, the fill colors are based on a dark-to-light blue color ramp. The `ggplot2::theme_void()` function eliminates the axes and graticules and displays only the map on a white background.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-choropleth-filled-colors}
: Densities of tornadoes mapped as a choropleth.
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-choropleth-filled-colors
#| fig-cap: "Densities of tornadoes in Oklahoma counties from 2016-2021 mapped as a choropleth."
#| fig-height: 3
#| echo: fenced


ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens)) +
  ggplot2::theme_void() +
  ggplot2::coord_sf()
```


::::
:::::

### Mapping symbols

To map symbols, the county polygons must first be converted to points. The `sf::st_centroid()` function generates a point feature located at the centroid of each county. The `sf::st_geometry_type()` function returns the feature geometry type. Setting `by_geometry = FALSE` returns one geometry type for the entire dataset instead of for every feature.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-convert-county-polygons-to-points}
: Convert county polygons to points
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: convert-county-polygons-to-points
#| results: hold

glue::glue("##### Return geometry type before converted to points #####")
sf::st_geometry_type(okcounty, by_geometry = FALSE)


## Return the centroid of each county polygon
okcntrd <- sf::st_centroid(countymap)

glue::glue("")
glue::glue("##### Return geometry type after converted to points #####")
sf::st_geometry_type(okcntrd, by_geometry = FALSE)

```


::::
:::::

::: {.callout-note style="color: blue;" #nte-05-attributes-constant-warning}
###### How to get rid of the warning?

At the moment I do not know how to suppress the warning. Possible pointers to solve this problem are:

- **sf GitHub**: [suppress specific warning message](https://github.com/r-spatial/sf/issues/406)
- **Spatial Data Science**:  [Chapter 5](https://r-spatial.org/book/05-Attributes.html)

> When, while manipulating geometries, attribute values are retained unmodified, support problems may arise. If we look into a simple case of replacing a county polygon with the centroid of that polygon on a dataset that has attributes, we see that R package sf issues a warning:
>
> *Warning: st_centroid assumes attributes are constant over geometries*
:::
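
One approach that seems to follow from the second pointer is to declare the attribute-geometry relationship explicitly before computing the centroids. The sketch below marks all attributes as constant with `sf::st_set_agr()`; whether this assumption is justified depends on the data (a county name is constant over the polygon, a tornado count strictly speaking is not). The toy polygons and object names are made up for illustration.

```{r}
#| label: sketch-centroid-without-warning
#| eval: false

## helper for a unit square whose lower left corner is at (x0, 0)
sq <- function(x0) {
  sf::st_polygon(list(base::rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

polys <- sf::st_sf(
  name     = c("west", "east"),
  geometry = sf::st_sfc(sq(0), sq(1))
)

## declare all attributes as constant over their geometry ...
polys_const <- sf::st_set_agr(polys, "constant")

## ... so that st_centroid() has no reason to warn
cntrd <- sf::st_centroid(polys_const)
sf::st_geometry_type(cntrd, by_geometry = FALSE)
```

An alternative, if the attributes are not needed, is to compute the centroids of the geometry column only with `sf::st_centroid(sf::st_geometry(polys))`.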

The tornado counts can be mapped using the `okcntrd` dataset with the `size` aesthetic. One point is plotted for each county centroid, and the size of the point is proportional to the number of tornadoes in the county.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-choropleth-mappying-symbols}
: Choropleth map using graduated symbols
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-choropleth-mappying-symbols
#| fig-cap: "Numbers of tornadoes in Oklahoma counties from 2016-2021 mapped as graduated symbols."
#| fig-height: 3
#| echo: fenced

ggplot2::ggplot() +
  ggplot2::geom_sf(data = okcntrd, ggplot2::aes(size = tcnt)) +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void()
```


::::
:::::

## Modifying Map Appearance {#sec-05-modify-map-appearance}

### {**RColorBrewer**}: Color palettes for choropleth mapping

The {**RColorBrewer**} package provides a selection of palettes designed for choropleth mapping (Harrower and Brewer 2003). The `RColorBrewer::display.brewer.all()` function generates a chart with examples of all the available palettes (@fig-rcolorbrewer-palettes).

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-rcolorbrewer-palettes}
: Show color palettes of the {**RColorBrewer**} package
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-rcolorbrewer-palettes
#| fig-cap: "Color palettes available in the RColorBrewer package."
#| fig-height: 9

RColorBrewer::display.brewer.all()
```


::::
:::::

There are three types of ColorBrewer palettes. 

1. The top group in @fig-rcolorbrewer-palettes contains sequential palettes that are suitable for mapping ordered data along numerical scales (e.g., temperatures ranging from 0 to 30 degrees C) or ordinal categories (e.g., temperatures classified as cold, warm, and hot). These sequential palettes may include a single color or multiple colors, but have no clear breaks in the scale. 
2. The middle group in @fig-rcolorbrewer-palettes contains qualitative palettes, which use color to distinguish between different categories without implying order. 
3. The lower group in @fig-rcolorbrewer-palettes contains divergent palettes that emphasize a specific breakpoint in the data. Divergent palettes are often used to indicate values that are above or below the mean or to highlight values that are higher or lower than zero. 

More details about these palettes, including recommendations for color schemes that are most effective for different types of computer displays and for accommodating color-blind viewers, are available at [http://colorbrewer2.org](http://colorbrewer2.org).

### Specifying a color palette for continuous data

Additional {**ggplot2**} functions can be added to improve the appearance of the map. 

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-specify-color-palette}
: Specifying a color palette for continuous data with {**RColorBrewer**}
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-specify-continuous-color-palette
#| fig-cap: "Densities of tornadoes in Oklahoma counties from 2016-2021 mapped as a choropleth with a custom palette."


ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens * 10^3)) + # (1)
  ggplot2::scale_fill_distiller(                        # (2)
    name = base::expression("Tornadoes/1000 km"^2),     # (3) 
    palette = "YlOrRd",                                 # (4)
    breaks = scales::extended_breaks(n = 6),            # (5)
    direction = 1) +                                    # (6)
  ggplot2::theme_void() +                               # (7)
  ggplot2::theme(legend.position = "bottom")            # (8)
```

***

- **Line comment 1**: In contrast to the code in the book, I had to multiply the column for the tornado densities (`tdens`) by 10^3. Otherwise the legend would show decimal numbers overlapping each other after moving the legend to the bottom.
- **Line comment 2**: The `ggplot2::scale_fill_distiller()` function allows the specification of a different color ramp.
- **Line comment 3**: The `base::expression()` function is used for specifying the `name` argument of `ggplot2::scale_fill_distiller()` and to add text with a superscript.
- **Line comment 4**: In this example we have used the “YlOrRd” palette from the {**RColorBrewer**} package. As the name says, it runs from yellow over orange to red.
- **Line comment 5**: The book uses the superseded `scales::pretty_breaks()` function instead of the newer `scales::breaks_pretty()` function. This standard R break algorithm is primarily useful for date/times; for numeric scales the `scales::extended_breaks()` function does a slightly better job. `n = 6` is the number of desired breaks. You may get slightly more or fewer breaks than requested. (After trying it out, I learned that in this case the `n` parameter is not necessary to get the same result.)
- **Line comment 6**: The default value is `direction = -1`, which produces scales from dark to light colors. We want the reverse: lighter colors for few and darker colors for many tornadoes.
- **Line comment 7**: Note that “complete” themes like `ggplot2::theme_void()` will remove any settings made by a previous `ggplot2::theme()` function. Therefore, it is necessary to call `ggplot2::theme_void()` before `ggplot2::theme()` to apply specific theme settings.
- **Line comment 8**: We moved the legend to the bottom of the map to better accommodate the longer legend title.
::::
:::::


### Specifying a color palette for discrete data

The {**RColorBrewer**} palettes each contain a finite number of colors that are intended to be associated with categories in a choropleth map. Note that the `ggplot2::scale_fill_distiller()` function used to generate the color scale for the map in @fig-specify-continuous-color-palette operates a bit differently. This function takes a ColorBrewer palette and converts it to a continuous color ramp. 
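
The idea of stretching a handful of palette colors into a continuous ramp can be sketched in base R with `grDevices::colorRampPalette()`. This is only an analogy and not necessarily how {**ggplot2**} implements `ggplot2::scale_fill_distiller()` internally. The four hex codes are, as far as I know, the 4-class "YlOrRd" colors; they are typed in by hand as an assumption so that the sketch runs without extra packages.

```r
# Four anchor colors (assumed to be the 4-class "YlOrRd" palette)
ylorrd4 <- c("#FFFFB2", "#FECC5C", "#FD8D3C", "#E31A1C")

# colorRampPalette() returns a function that interpolates between the anchors
ramp <- grDevices::colorRampPalette(ylorrd4)

# Ask for nine colors: the first and last are the original end points,
# the colors in between are interpolated
ramp9 <- ramp(9)
ramp9
```

A discrete scale, in contrast, would use only the four anchor colors themselves.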

The next map example will show how to define categories and map each one as a distinctive color. To view the colors for a given number of categories and a specific palette, the `RColorBrewer::display.brewer.pal()` function is used with the number of categories as the first argument and the palette name as the second argument (@fig-colorbrewer-discrete-palette).

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-annex-b-colorbrewer-discrete-palette}
: ColorBrewer discrete color palette
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-colorbrewer-discrete-palette
#| fig-cap: "The ColorBrewer 'YlOrRd' (yellow to red) color palette with four categories."

RColorBrewer::display.brewer.pal(4, "YlOrRd")
```


::::
:::::


Rather than using continuous scales for color and size, it is often recommended to aggregate the data into a small number of classes (typically 3-6). Using discrete classes makes it easier to associate each color or symbol in the map with a specific range of values. 

To accomplish this step, we need to add new classified variables using `dplyr::mutate()`. The `base::cut()` function is used to split the continuous variables based on user-specified breaks. The tornado density variable (`tdens`) is split based on quantiles (i.e., percentiles) defined in the `qbrks` object. The breaks for the tornado counts are specified manually in the graduated symbol example further below.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-generate-discrete-classes}
: Generating discrete classes
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: generate-discrete-classes
#| results: hold

numclas <- 4                                       # (1)
qbrks <- base::seq(0, 1, length.out = numclas + 1) # (2)
qbrks

countymap <- countymap |>                          # (3)
  dplyr::mutate(tdens_c1 = base::cut(tdens * 10^3,
                        # quantile() takes its probabilities via `probs`
                        breaks = stats::quantile(tdens * 10^3, probs = qbrks),
                        include.lowest = TRUE))

base::class(countymap$tdens_c1)
base::levels(countymap$tdens_c1)
```
***

- **Line comment 1**: We want to create four different colors for the tornado density.
- **Line comment 2**: Whatever the number of classes, we need one more break point than classes because both end points are included. Therefore we add 1 to this number.
- **Line comment 3**: We create a new column `tdens_c1` for the break values of the tornado densities. The `tdens_c1` column is a discrete factor with 4 levels instead of a continuous numerical variable as in @fig-specify-continuous-color-palette.

::::
:::::
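
How quantile breaks turn into classes can be checked on a small made-up vector (the values below are invented for illustration and are not tornado data). Note that `stats::quantile()` expects the probabilities in its `probs` argument. An argument with another name, such as `breaks`, is silently ignored and the default quartiles are returned.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)         # toy values

numclas <- 3
qbrks <- base::seq(0, 1, length.out = numclas + 1)   # 3 classes need 4 break points

brks <- stats::quantile(x, probs = qbrks)            # quantiles at 0, 1/3, 2/3, 1
x_c <- base::cut(x, breaks = brks, include.lowest = TRUE)
base::table(x_c)                                     # each class holds 4 values

# A misnamed argument is swallowed by `...`: the default quartiles
# (5 values) come back instead of the 4 requested break points
wrong <- stats::quantile(x, breaks = qbrks)
base::length(wrong)
```

With four classes the default quartiles happen to coincide with the requested breaks, so a misnamed argument only shows up once the number of classes is changed.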

Because the new `tdens_c1` column is a discrete factor instead of a continuous numerical variable as in @fig-specify-continuous-color-palette, the `ggplot2::scale_fill_brewer()` function is now used instead of `ggplot2::scale_fill_distiller()`. The comma-separated numbers in the legend specify the range of tornado densities for each of the categories (@fig-mapping-colors-categories).

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-mapping-colors-categories}
: Mapping densities of tornados with color categories
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-mapping-colors-categories
#| fig-cap: "Densities of tornadoes in Oklahoma counties from 2016-2021 mapped as a choropleth with four color categories."

ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens_c1)) +
  ggplot2::scale_fill_brewer(name = base::expression("Tornadoes/1000 km"^2),   
                    palette = "YlOrRd") +
  ggplot2::theme_void() +
  ggplot2::theme(legend.position = "bottom") 
```


::::
:::::

### Specifying graduated symbols with categories

Similar to choropleth maps, graduated symbol maps are often easier to interpret if they include a limited number of symbol sizes. To generate a classified map of tornado counts, they are converted to discrete categories using the `base::cut()` function. Instead of using quantiles, the breakpoints for the classification are selected manually and stored in the `brkpts` vector.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-generate-symbols-categories}
: Generating discrete classes for symbol mapping
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: mapping-symbols-categories

maxcnt <- max(okcntrd$tcnt)
brkpts <- c(0, 2, 5, 10, maxcnt)
okcntrd <- okcntrd |> 
  dplyr::mutate(tcnt_c1 = base::cut(tcnt,
                        breaks = brkpts,
                        include.lowest = TRUE)
                )

base::levels(okcntrd$tcnt_c1)

```

::::
:::::
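
The role of `include.lowest = TRUE` is easy to miss. By default `base::cut()` builds intervals that are open on the left, so a value equal to the smallest break (a county with zero tornadoes) would fall outside every class and become `NA`. A small sketch with made-up counts:

```r
cnt <- c(0, 1, 2, 3, 5, 6, 10, 16)            # invented counts
brkpts <- c(0, 2, 5, 10, base::max(cnt))

# Default: intervals are (0,2], (2,5], ... so the value 0 is not covered
base::cut(cnt, breaks = brkpts)

# include.lowest = TRUE closes the first interval: [0,2]
cnt_c <- base::cut(cnt, breaks = brkpts, include.lowest = TRUE)
base::levels(cnt_c)
base::table(cnt_c)
```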

The resulting map has four symbol sizes, each associated with a specific range of tornado counts.


:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-mapping-symbols-categories}
: Mapping numbers of tornadoes with symbol categories
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-mapping-symbols-categories
#| fig-cap: "Numbers of tornadoes in Oklahoma counties from 2016-2021 mapped as graduated symbols with four categories."

ggplot2::ggplot(data = okcntrd) +
  ggplot2::geom_sf(ggplot2::aes(size = tcnt_c1)) +
  ggplot2::scale_size_discrete(name = "Tornados") +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void() +
  ggplot2::theme(legend.position = "bottom")
```


::::
:::::


::: {.callout-warning #wrn-size-not-for-discrete-vars}
###### Should I get rid of the warning?

In @fig-mapping-symbols-categories we got the warning "Using size for a discrete variable is not advised." There are [two different opinions](https://stackoverflow.com/questions/50211624/why-and-when-using-size-for-a-discrete-variable-is-not-advised) about this warning:

1. It is just "a design tip to be broken when necessary". "… this warning is paternalistic and preachey. It has no place in a code warning, it is a style recommendation."
2. "Using size to map a non-ordinal categorical variable may suggest an ordinal relationship to the viewer, when none was intended. I find that's quite serious as it could result in miscommunication".

One recommendation is to "convert the categories to numbers and then do the mapping. That avoids the warning and ensures that the mapping happens exactly the way you want it to."

We are in fact mapping an ordinal categorical variable. Therefore the warning could be ignored. But for the purpose of learning I will change the code so that no warning message is generated.

:::

In the next two code chunks I am going to convert the categories to numbers (@cnj-05-generate-numbers-for-symbols-categories) and then I will do the mapping (@cnj-05-mapping-numbers-with-symbols-categories).

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-generate-numbers-for-symbols-categories}
: Generating numbers for discrete classes
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: generate-numbers-for-symbols-categories

okcntrd <- okcntrd |> 
  dplyr::mutate(tcnt_c2 = dplyr::case_when(
    tcnt_c1 == "[0,2]"   ~ 1,
    tcnt_c1 == "(2,5]"   ~ 2,
    tcnt_c1 == "(5,10]"  ~ 3,
    tcnt_c1 == "(10,16]" ~ 4,
    .default = NA
    )
  )


base::unique((ordered(okcntrd$tcnt_c2)))
base::class(okcntrd$tcnt_c2)

```


::::
:::::
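
As an alternative sketch (not the approach used above), the class numbers can also be taken directly from the factor: `base::as.integer()` returns the internal level codes, which for a factor created by `base::cut()` follow the order of the intervals. This avoids typing the interval labels by hand, which would silently produce `NA` if the maximum count (and with it the last label) changed. The counts below are invented.

```r
cnt <- c(0, 3, 7, 16, 2, 10)                       # invented counts
cnt_c1 <- base::cut(cnt, breaks = c(0, 2, 5, 10, 16), include.lowest = TRUE)

# Level codes 1..4 in interval order, independent of the label text
cnt_c2 <- base::as.integer(cnt_c1)
cnt_c2

# The labels can always be recovered from the codes
base::levels(cnt_c1)[cnt_c2]
```

Note that this yields an integer instead of a double vector, which should make no difference for a continuous size scale.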

Because `tcnt_c2` is numeric, the map now needs the continuous `ggplot2::scale_size()` scale instead of `ggplot2::scale_size_discrete()`. I also need to adapt the legend labels to recover the binned value ranges.


:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-mapping-numbers-with-symbols-categories}
: Mapping numbers with symbol categories
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-mapping-symbols-categories2
#| fig-cap: "Numbers of tornadoes in Oklahoma counties from 2016-2021 mapped as graduated symbols with four categories (without warning)."

ggplot2::ggplot(data = okcntrd) +
  ggplot2::geom_sf(ggplot2::aes(size = tcnt_c2)) +
  ggplot2::scale_size(
    name = "Tornados",
    labels = base::levels(okcntrd$tcnt_c1)
    ) +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void() +
  ggplot2::theme(legend.position = "bottom")
```

::::
:::::

## Exporting Graphics Output {#sec-05-export-graphics}

Until now, the maps and charts generated with {**ggplot2**} have been output as HTML and [stored in the file for chapter 5 of my book notes](https://bookdown.org/pbaumgartner/gdswr-notes/05-vector-geospatial-data.html). 

> However, it is often necessary to export maps and other graphics to files and explicitly specify their dimensions and resolution. This is usually the case when generating graphics for publications that must meet specific size and formatting criteria.

The simplest way to do this is with the `ggplot2::ggsave()` function.

::: {.my-code-collection}
:::: {.my-code-collection-header}
::::: {.my-code-collection-icon}
:::::
:::::: {#exm-05-export-graphics}
: Exporting graphics with `ggplot2::ggsave()`
::::::

::::

::::{.my-code-collection-container}

::: {.panel-tabset}

###### PNG

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-png-export}
: Output to a portable network graphics (PNG)
::::::
:::
::::{.my-r-code-container}

<center>**Run this code chunk manually to save the PNG graphics only once.**</center>

```{r}
#| label: png-export
#| eval: false

ggplot2::ggsave("data/Chapter5/tornado.png", 
       width = 6, 
       height = 4, 
       dpi = 300)
ggplot2::ggsave("data/Chapter5/tornado2.png", 
       width = 15, 
       height = 10, 
       units = "cm", 
       dpi = 100)
```

***

<center>(*No visible output is available for this R code chunk. However, a file was saved if you have run the code chunk manually.*)</center>

The first call exports the most recently displayed plot to a portable network graphics (PNG) file called `tornado.png` with dimensions of 6 x 4 inches and a resolution of 300 pixels per inch.

Other units besides inches can be used by specifying the `units` argument, as in the second call, which writes `tornado2.png` at 15 x 10 cm and 100 pixels per inch.

::::
:::::
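
The pixel size of a raster export follows from the physical size and the resolution: pixels = inches × dpi, with centimeters first converted to inches (1 in = 2.54 cm). The helper `px()` below is a hypothetical name used only for this illustration; it is not part of {**ggplot2**}.

```r
# Convert a physical length and a resolution into (unrounded) pixels
px <- function(size, dpi, units = c("in", "cm")) {
  units <- base::match.arg(units)
  inches <- if (units == "cm") size / 2.54 else size
  inches * dpi
}

px(c(6, 4), dpi = 300)                  # 6 x 4 in at 300 dpi
px(c(15, 10), dpi = 100, units = "cm")  # 15 x 10 cm at 100 dpi
```

The first export is therefore 1800 x 1200 pixels, the second roughly 591 x 394 pixels.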


###### TIFF

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-TIFF-export}
: Output to a tagged image file format (TIFF)
::::::
:::
::::{.my-r-code-container}

<center>**Run this code chunk manually to save the TIFF graphics only once.**</center>

```{r}
#| label: TIFF-export
#| eval: false

ggplot2::ggsave("data/Chapter5/tornado.tiff", 
       width = 6, 
       height = 4, 
       dpi = 300, 
       compression = "lzw")
```

***

<center>(*No visible output is available for this R code chunk. However, a file was saved if you have run the code chunk manually.*)</center>

When saving a TIFF, a compression type can be specified.

::::
:::::

###### JPEG

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-JPEG-export}
: Output to a Joint Photographic Experts Group file format (JPEG)
::::::
:::
::::{.my-r-code-container}

<center>**Run this code chunk manually to save the JPEG graphics only once.**</center>

```{r}
#| label: JPEG-export
#| eval: false

ggplot2::ggsave("data/Chapter5/tornado.jpeg", 
       width = 6, 
       height = 4, 
       dpi = 300, 
       quality = 90)
```

***

<center>(*No visible output is available for this R code chunk. However, a file was saved if you have run the code chunk manually.*)</center>

When saving a JPEG, the `quality` argument can be specified. Lower values give smaller files but more visible compression artifacts.

::::
:::::

###### PDF

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-PDF-export}
: Output to a Portable Document Format file format (PDF)
::::::
:::
::::{.my-r-code-container}

<center>**Run this code chunk manually to save the PDF graphics only once.**</center>

```{r}
#| label: PDF-export
#| eval: false

ggplot2::ggsave(
  "data/Chapter5/tornado.pdf", 
   width = 6, 
   height = 4
  )
```

***

<center>(*No visible output is available for this R code chunk. However, a file was saved if you have run the code chunk manually.*)</center>

When saving a PDF, the `dpi` argument is ignored.

::::
:::::

###### R Object

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-R-object-export}
: Output to an R object
::::::
:::
::::{.my-r-code-container}

<center>**Run this code chunk manually to save the R objects only once.**</center>

```{r}
#| label: R-object-export
#| eval: false

choropleth <- ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens_c1)) +
  ggplot2::scale_fill_brewer(name = "Density",   
                    palette = "YlOrRd") +
  ggplot2::theme_void()

gradsymbol <- ggplot2::ggplot(data = okcntrd) +
  ggplot2::geom_sf(ggplot2::aes(size = tcnt_c2)) +
  ggplot2::scale_size(
    name = "Count",
    labels = base::levels(okcntrd$tcnt_c1)
    ) +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void()

ggplot2::ggsave("data/Chapter5/choropleth.tiff", 
       plot = choropleth,
       width = 6, 
       height = 4, 
       dpi = 300, 
       compression = "lzw")

ggplot2::ggsave("data/Chapter5/gradsymbol.tiff",
       plot = gradsymbol,
       width = 6, 
       height = 4, 
       dpi = 300, 
       compression = "lzw")
```

***

<center>(*No visible output is available for this R code chunk. However, a file was saved if you have run the code chunk manually.*)</center>

The output of `ggplot2::ggplot()` can also be saved as an R object that can be output to graphics files using `ggplot2::ggsave()`. The `plot` argument is used to specify the ggplot object to be saved.

::::
:::::

:::

::::
:::::

There are several options to combine saved graphs and maps into a composite figure:

- **Base R** with the `graphics::par()` function using `mfrow` or `mfcol` as parameters (this works for base graphics only, not for {**ggplot2**} objects).
- {**ggpubr**} package using the `ggpubr::ggarrange()` function [@ggpubr].
- {**patchwork**} package using very flexible arrangements with plot arithmetic (`+`, `-`, `|`, `/`, `*`, `&`) [@patchwork].
- {**cowplot**} package with the `cowplot::plot_grid()` function [@cowplot].

I am going to demonstrate here just the last possibility, as this is the example used in the book.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-05-save-combined-graphs}
: Save combined graphics with the {**cowplot**} package
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: base-r-saved-combined-graphs

choropleth <- ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens_c1)) +
  ggplot2::scale_fill_brewer(name = "Density",   
                    palette = "YlOrRd") +
  ggplot2::theme_void()

gradsymbol <- ggplot2::ggplot(data = okcntrd) +
  ggplot2::geom_sf(ggplot2::aes(size = tcnt_c2)) +
  ggplot2::scale_size(
    name = "Count",
    labels = base::levels(okcntrd$tcnt_c1)
    ) +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void()

cowplot::plot_grid(choropleth, gradsymbol, 
          labels = c("\n\nA) Choropleth Map", 
                     "\n\nB) Graduated Symbols"),
          label_size = 12,   # argument of plot_grid(), not an element of `labels`
          ncol = 1, 
          hjust = 0, 
          label_x = 0, 
          align = "hv")
```


::::
:::::


The `cowplot::plot_grid()` function provides a variety of options for arranging figures in a regular grid. This basic example provides a label for each subplot and specifies additional arguments to plot the maps in a single column, left-justify the labels, place the labels at the left edge of each map, and align the maps horizontally and vertically, so they are exactly the same size.
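
The combined figure returned by `cowplot::plot_grid()` can itself be stored in an object and passed to `ggplot2::ggsave()` through the `plot` argument. A minimal sketch with two toy plots follows; the data frame, the object names, and the temporary output path are placeholders and not part of the tornado example.

```r
df <- base::data.frame(x = 1:10, y = (1:10)^2)   # toy data

p1 <- ggplot2::ggplot(df, ggplot2::aes(x, y)) + ggplot2::geom_point()
p2 <- ggplot2::ggplot(df, ggplot2::aes(x, y)) + ggplot2::geom_line()

# Stack the two plots in one column and label them
combined <- cowplot::plot_grid(p1, p2,
                               labels = c("A", "B"),
                               label_size = 12,
                               ncol = 1)

# Write the composite figure to a temporary PNG file
out <- base::file.path(base::tempdir(), "combined.png")
ggplot2::ggsave(out, plot = combined, width = 6, height = 8, dpi = 300)
base::file.exists(out)
```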

## Resources {#sec-05-resources}

:::::{.my-resource}
:::{.my-resource-header}
:::::: {#lem-05-resources}
: Additional resources for working with the {**sf**} package
::::::
:::
::::{.my-resource-container}

- A variety of additional resources for working with the sf package can be found at https://r-spatial.github.io/sf/index.html. These include a link to the sf “cheatsheet” as well as a variety of articles, vignettes, and blog posts that provide additional examples of how to work with vector geospatial data using this package. 
  - A rather comprehensive overview of loading and saving datasets in different formats is given in the blog article [How to load and save vector data in R](https://r-spatial.org/r/2024/06/26/sf-load-save.html). I will follow the tutorial in this article in @sec-annex-a.
- The book Geocomputation with R by Robin Lovelace and others [@lovelace-2020] is also an excellent reference that encompasses {**sf**} as well as many other R packages for working with geospatial data. Look for their forthcoming second edition [@lovelace-2025], which is already [online](https://r.geocompx.org/) for the most up-to-date information on geospatial data processing with R.
- A more pressing need for me is to learn how to get vector map data for creating maps. In @sec-annex-b I follow the tutorial by [@felixanalytix-2023].

::::
:::::