# Packages used {#sec-annex-a}

```{r}
#| label: setup
#| results: hold
#| include: false

# here::here() builds the path from the project root in a platform-independent way
base::source(file = here::here("R", "helper.R"))
```

## autoplotly {#sec-autoplotly}

::::: my-package
::: my-package-header
Package Profile: autoplotly
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Automatic Generation of Interactive Visualizations for Statistical Results](https://github.com/terrytangyuan/autoplotly) [@autoplotly; @ggfortify]</center>

------------------------------------------------------------------------

Functionalities to automatically generate interactive visualizations for statistical results supported by {**ggfortify**}, such as time series, PCA, clustering and survival analysis, with [plotly.js](https://plotly.com/) and {**ggplot2**} style. The generated visualizations can also be easily extended using {**ggplot2**} and {**plotly**} syntax while staying interactive.
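
As a minimal sketch of this workflow (not evaluated when the book is rendered), the following chunk turns a PCA of the built-in `iris` data into an interactive plot. The choice of dataset and of the `colour` mapping are my own assumptions for illustration.

```{r}
#| label: annex-autoplotly-sketch
#| eval: false

# PCA on the four numeric columns of the built-in iris data
pca <- stats::prcomp(iris[, 1:4], scale. = TRUE)

# autoplotly() picks a suitable plot for the object and returns a plotly widget;
# `data` and `colour` are passed on to the underlying ggfortify autoplot method
p_interactive <- autoplotly::autoplotly(pca, data = iris, colour = "Species")

# Print `p_interactive` in an interactive session or an HTML document to view it
```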
:::
:::::

## BSDA {#sec-BSDA}

::::: my-package
::: my-package-header
Package Profile: BSDA
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Basic Statistics and Data Analysis](https://alanarnholt.github.io/BSDA) [@BSDA]</center>

------------------------------------------------------------------------

Data sets for the book "Basic Statistics and Data Analysis" (BSDA) by Larry J. Kitchens [@kitchens2002].
:::
:::::

## broom {#sec-broom}

:::::: my-package
::: my-package-header
Package Profile: broom
:::

:::: my-package-container
<center>[Convert Statistical Objects into Tidy Tibbles](https://broom.tidymodels.org/) [@broom]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap09/logoi/logo-broom-min.png){width="176"}

Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once.
:::

------------------------------------------------------------------------

{**broom**} provides three verbs to make it convenient to interact with model objects:

-   `tidy()` summarizes information about model components
-   `glance()` reports information about the entire model
-   `augment()` adds information about observations to a dataset

For a detailed introduction, please see [Introduction to broom](https://broom.tidymodels.org/articles/broom.html).

{**broom**} tidies 100+ models from popular modelling packages and almost all of the model objects in the stats package that comes with base R.

The vignette [Available methods](https://broom.tidymodels.org/articles/available-methods.html) lists the available methods.
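
The three verbs can be sketched with a simple linear model on the built-in `mtcars` data. The model formula is only an assumption for illustration, and the chunk is not evaluated when rendering.

```{r}
#| label: annex-broom-sketch
#| eval: false

# A small linear model as an example of a statistical object
fit <- stats::lm(mpg ~ wt, data = mtcars)

# tidy(): one row per model component (here: intercept and slope)
coefs <- broom::tidy(fit)

# glance(): a single row describing the entire model (R squared, AIC, ...)
model_stats <- broom::glance(fit)

# augment(): the data used for fitting plus per-observation columns
# such as .fitted and .resid
obs <- broom::augment(fit)
```

All three results are tibbles, so they can be passed directly to {**dplyr**} or {**ggplot2**}.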
::::
::::::

## car {#sec-car}

:::::: my-package
::: my-package-header
Package Profile: car
:::

:::: my-package-container
<center>[Companion to Applied Regression](https://www.john-fox.ca/Companion/index.html) [@car]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap06/logoi/logo-car-min.png){width="176"}

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019. [@fox2018]
:::

------------------------------------------------------------------------

An R Companion to Applied Regression is a broad introduction to the R statistical computing environment in the context of applied regression analysis. The book provides a step-by-step guide to using the free statistical software R, and emphasizes integrating statistical computing in R with the practice of data analysis. The R packages car and effects, written to facilitate the application and interpretation of regression analysis, are extensively covered in the book.
::::
::::::

## colorblindcheck {#sec-colorblindcheck}

::::: my-package
::: my-package-header
Package Profile: colorblindcheck
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Check Color Palettes for Problems with Color Vision Deficiency (CVD)](https://jakubnowosad.com/colorblindcheck/) [@colorblindcheck]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**colorblindcheck**}.*)

Compare color palettes with simulations of color vision deficiencies - deuteranopia, protanopia, and tritanopia. It includes calculation of distances between colors, and creating summaries of differences between a color palette and simulations of color vision deficiencies.

Deciding if a color palette is colorblind friendly is a hard task. This cannot be done in an entirely automatic fashion, as the decision needs to be confirmed by visual judgments. The goal of {**colorblindcheck**} is to provide tools to decide if the selected color palette is colorblind friendly, including:

-   `palette_dist()` - Calculation of the distances between the colors in the input palette and between the colors in simulations of the color vision deficiencies: deuteranopia, protanopia, and tritanopia.
-   `palette_plot()` - Plotting of the original input palette and simulations of color vision deficiencies: deuteranopia, protanopia, and tritanopia.
-   `palette_check()` - Creating summary statistics comparing the original input palette and simulations of color vision deficiencies: deuteranopia, protanopia, and tritanopia.
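
The three functions listed above can be tried on any vector of hex colors. The sketch below is not evaluated when rendering; the five-color Viridis palette from base R is an assumption chosen only for illustration.

```{r}
#| label: annex-colorblindcheck-sketch
#| eval: false

# Example palette: five colors from the base R "Viridis" HCL palette
pal <- grDevices::hcl.colors(5, palette = "Viridis")

# Pairwise distances between the colors of the original palette
dists <- colorblindcheck::palette_dist(pal)

# Plot the original palette together with the three simulated deficiencies
colorblindcheck::palette_plot(pal)

# Summary statistics: one row for normal vision and one per deficiency
summary_tbl <- colorblindcheck::palette_check(pal, plot = FALSE)
summary_tbl
```

The summary still needs to be read together with the plot, because the final decision rests on visual judgment.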
:::
:::::

## colorblindr {#sec-colorblindr}

::::: my-package
::: my-package-header
Package Profile: colorblindr
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Simulate colorblindness in R figures](https://github.com/clauswilke/colorblindr) [@colorblindr]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**colorblindr**}.*)

Provides a variety of functions that are helpful to simulate the effects of colorblindness in R figures. Complete figures can be modified to simulate the effects of various types of colorblindness. The resulting figures are standard grid objects and can be further manipulated or outputted as usual.
:::
:::::

## colorspace {#sec-colorspace}

:::::::: my-package
::: my-package-header
Package Profile: colorspace
:::

:::::: my-package-container

------------------------------------------------------------------------

<center>[A Toolbox for Manipulating and Assessing Colors and Palettes](https://colorspace.R-Forge.R-project.org/) [colorspace](#sec-colorspace)</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**colorspace**}.*)

The colorspace package provides a broad toolbox for selecting individual colors or color palettes, manipulating these colors, and employing them in various kinds of visualizations.

At the core of the package there are various utilities for computing with color spaces (as the name of the package conveys). Thus, the package helps to map various three-dimensional representations of color to each other. A particularly important mapping is the one from the perceptually-based and device-independent color model HCL (Hue-Chroma-Luminance) to standard Red-Green-Blue (sRGB), which is the basis for color specifications in many systems based on the corresponding hex codes (e.g., in HTML but also in R). For completeness, further standard color models are included in the package as well: `polarLUV()` (= HCL), `LUV()`, `polarLAB()`, `LAB()`, `XYZ()`, `RGB()`, `sRGB()`, `HLS()`, `HSV()`.

The HCL space (= polar coordinates in CIELUV) is particularly useful for specifying individual colors and color palettes as its three axes match those of the human visual system very well: Hue (= type of color, dominant wavelength), chroma (= colorfulness), luminance (= brightness).
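
The mapping from HCL coordinates to hex codes can be sketched as follows (not evaluated when rendering). The concrete hue, chroma and luminance values and the palette name "Blues 3" are assumptions chosen for illustration.

```{r}
#| label: annex-colorspace-sketch
#| eval: false

# Three colors with identical luminance and chroma that differ only in hue
col_hcl <- colorspace::polarLUV(L = 70, C = 50, H = c(0, 120, 240))

# Convert to sRGB hex codes; fixup = TRUE corrects colors outside the sRGB gamut
col_hex <- colorspace::hex(col_hcl, fixup = TRUE)

# Convert the same colors to sRGB coordinates (values between 0 and 1)
col_rgb <- methods::as(col_hcl, "sRGB")
colorspace::coords(col_rgb)

# A ready-made sequential palette that is constructed in HCL space
pal <- colorspace::sequential_hcl(5, palette = "Blues 3")
```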

There is extensive documentation available. See also the website on [HCL Color Space](https://hclwizard.org/):

> The hclwizard provides tools for manipulating and assessing colors and palettes based on the underlying colorspace software (available in R and Python). It leverages the HCL color space: a color model that is based on human color perception and thus makes it easy to choose good color palettes by varying three color properties: Hue (= type of color, dominant wavelength) - Chroma (= colorfulness) - Luminance (= brightness). As shown in the color swatches below each property can be varied while keeping the other two properties fixed.

::::: my-remark
::: my-remark-header
{colorspace}: My personal evaluation
:::

::: my-remark-container
This toolbox package is very important: All of the other color palette related packages use {**colorspace**} as a basis for their functionality.
:::
:::::
::::::
::::::::

## cowplot {#sec-cowplot}

:::::: my-package
::: my-package-header
Package Profile: cowplot
:::

:::: my-package-container
<center>[Streamlined Plot Theme and Plot Annotations for {**ggplot2**}](https://wilkelab.org/cowplot/) [@cowplot]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-cowplot-min.png){width="176"}

The {**cowplot**} package provides various features that help with creating publication-quality figures, such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and/or mix plots with images. The package was originally written for internal use in the Wilke lab, hence the name (Claus O. Wilke’s plot package). It has also been used extensively in the book [Fundamentals of Data Visualization](https://www.amazon.com/gp/product/1492031089).
:::

------------------------------------------------------------------------

There are several packages that can be used to align plots. The most widely used ones besides {**cowplot**} are {**egg**} and {**patchwork**} (see @sec-patchwork). All these packages use slightly different approaches to plot alignment, and the respective approaches have different strengths and weaknesses. If you cannot achieve your desired result with one of these packages, try another one.

Most importantly, while {**egg**} and {**patchwork**} align and arrange plots at the same time, {**cowplot**} aligns plots independently of how they are arranged. This makes it possible to align plots and then reproduce them separately, or even overlay them on top of each other.
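
A minimal sketch of this separation between aligning and arranging (not evaluated when rendering). The two example plots are my own assumptions; the second one gets a long axis title so that the left axes would not line up without alignment.

```{r}
#| label: annex-cowplot-sketch
#| eval: false

# Two plots whose left axes take up a different amount of space
p1 <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) +
  ggplot2::geom_point()
p2 <- ggplot2::ggplot(mtcars, ggplot2::aes(factor(cyl))) +
  ggplot2::geom_bar() +
  ggplot2::ylab("a much longer axis title")

# Step 1: align the plots vertically along their left axis.
# The result is a list of aligned plot objects; nothing is arranged yet.
aligned <- cowplot::align_plots(p1, p2, align = "v", axis = "l")

# Step 2a: arrange the aligned plots in one column ...
stacked <- cowplot::plot_grid(aligned[[1]], aligned[[2]], ncol = 1)

# Step 2b: ... or draw one of them on its own
single <- cowplot::ggdraw(aligned[[1]])
```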

The {**cowplot**} package now provides a set of complementary themes with different features. I now believe that there isn’t one single theme that works for all figures, and therefore I recommend that you always explicitly set a theme for every plot you make.
::::
::::::

## cranlogs {#sec-cranlogs}

:::::: my-package
::: my-package-header
Package Profile: cranlogs
:::

:::: my-package-container
<center>[Download Logs from RStudio CRAN Mirror](https://r-hub.github.io/cranlogs) [@cranlogs]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap05/logoi/logo-cranlogs-min.png){width="176"}

`r glossary("APIx", "API")` to the database of `r glossary("CRAN")` package downloads from the RStudio CRAN mirror. The database itself is at <http://cranlogs.r-pkg.org>, see <https://github.com/r-hub/cranlogs.app> for the raw API.
:::

------------------------------------------------------------------------

RStudio publishes the download logs from their CRAN package mirror daily at <http://cran-logs.rstudio.com>.

This R package queries a web API maintained by R-hub that contains the daily download numbers for each package.

The RStudio CRAN mirror is not the only CRAN mirror, but it’s a popular one: it’s the default choice for RStudio users. The actual number of downloads over all CRAN mirrors is unknown.
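
A query against this API could look like the sketch below (not evaluated when rendering). It needs an internet connection, so the call is wrapped in `tryCatch()`; the two package names are assumptions for illustration.

```{r}
#| label: annex-cranlogs-sketch
#| eval: false

# Daily downloads of two packages during the last week;
# returns NULL instead of failing when the API cannot be reached
downloads <- tryCatch(
  cranlogs::cran_downloads(
    packages = c("dplyr", "data.table"),
    when = "last-week"
  ),
  error = function(e) NULL
)

# The result has one row per package and day (columns: date, count, package)
if (!is.null(downloads)) {
  stats::aggregate(count ~ package, data = downloads, FUN = sum)
}
```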
::::
::::::

## crosstable {#sec-crosstable}

::::::::: my-package
::: my-package-header
Package Profile: crosstable
:::

::::::: my-package-container
<center>[Crosstables for Descriptive Analysis](https://danchaltiel.github.io/crosstable) [@crosstable]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap05/logoi/logo-crosstable-min.png){width="176"}

{**crosstable**} is a package centered on a single function, `crosstable()`, which easily computes descriptive statistics on datasets. It can use the {**tidyverse**} syntax and is interfaced with the package {**officer**} to create automated reports.
:::

------------------------------------------------------------------------

Create descriptive tables for continuous and categorical variables. Apply summary statistics and counting function, with or without a grouping variable, and create beautiful reports using {**rmarkdown**} or {**officer**}. You can also compute effect sizes and statistical tests if needed.

::::: my-remark
::: my-remark-header
{crosstable}: Personal evaluation
:::

::: my-remark-container
I believe that the main use of this package is to prepare ready-to-print tables. Similar to {**gtsummary**} (see @sec-gtsummary), it provides some descriptive statistics with many display options. But I got the impression that data analysis is not the main purpose of these packages.

For instance, `crosstable::display_test(chisq.test(x, y))` returns a string such as "p value: \<0.0001 \n(Pearson's Chi-squared test)". This is nice to include in a table, but for the analysis one would also need the values of the individual cells.
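
The difference can be sketched with the built-in `mtcars` data (not evaluated when rendering). The choice of the variables `cyl` and `am` is only an assumption for illustration.

```{r}
#| label: annex-crosstable-sketch
#| eval: false

# Contingency table of cylinders by transmission type
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

# Base R test object; the warning about small expected counts is suppressed
test <- suppressWarnings(stats::chisq.test(tab))

# What the analysis needs: the observed and expected cell values
test$observed
test$expected

# What {crosstable} returns for display: a single formatted string
label <- crosstable::display_test(test)
label
```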
:::
:::::
:::::::
:::::::::

## curl {#sec-curl}

::::: my-package
::: my-package-header
Package Profile: curl
:::

::: my-package-container

------------------------------------------------------------------------

<center>[A Modern and Flexible Web Client for R](https://cran.r-project.org/web/packages/curl/vignettes/intro.html) [@curl]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**curl**}.*)

The `curl()` and `curl_download()` functions provide highly configurable drop-in replacements for base `url()` and `download.file()` with better performance, support for encryption (https, ftps), gzip compression, authentication, and other 'libcurl' goodies.

The core of the package implements a framework for performing fully customized requests where data can be processed either in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of 'libcurl' is recommended; for a more user-friendly web client see the {**httr**} package, which builds on this package with http specific tools and logic.
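
A short sketch of an in-memory request and a download to disk (not evaluated when rendering). The URL and the timeout value are assumptions for illustration, and the requests only run when an internet connection is available.

```{r}
#| label: annex-curl-sketch
#| eval: false

# A handle collects the libcurl options for one or more requests
h <- curl::new_handle()
curl::handle_setopt(h, timeout = 10, followlocation = TRUE)

if (curl::has_internet()) {
  # Process the response in memory: status code, headers and raw content
  res <- curl::curl_fetch_memory("https://cran.r-project.org/", handle = h)
  res$status_code

  # Or write the response to disk, as a replacement for download.file()
  dest <- tempfile(fileext = ".html")
  curl::curl_download("https://cran.r-project.org/", destfile = dest, handle = h)
  file.exists(dest)
}
```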
:::
:::::

## data.table {#sec-data-table}

::::::::: my-package
::: my-package-header
Package Profile: data.table
:::

::::::: my-package-container
<center>[Extension of `data.frame`](https://rdatatable.gitlab.io/data.table/) [@datatable]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-data.table-min.png){width="176"}

{**data.table**} provides a high-performance version of base R’s data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.\
\
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
:::

------------------------------------------------------------------------

**Features**

-   fast and friendly delimited file reader: `data.table::fread()`, see also convenience features for small data
-   fast and feature rich delimited file writer: `data.table::fwrite()`
-   low-level parallelism: many common operations are internally parallelized to use multiple CPU threads
-   fast and scalable aggregations; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
-   fast and feature rich joins: ordered joins (e.g. rolling forwards, backwards, nearest and limited staleness), overlapping range joins (similar to `IRanges::findOverlaps()`), non-equi joins (i.e. joins using operators \>, \>=, \<, \<=), aggregate on join (`by = .EACHI`), update on join
-   fast add/update/delete columns by reference by group using no copies at all
-   fast and feature rich reshaping data: `data.table::dcast()` (pivot/wider/spread) and `data.table::melt()` (unpivot/longer/gather)
-   any R function from any R package can be used in queries not just the subset of functions made available by a database backend, also columns of type list are supported
-   has no dependencies at all other than base R itself, for simpler production/maintenance
-   the R dependency is as old as possible for as long as possible, dated April 2014, and we continuously test against that version; e.g. v1.11.0 released on 5 May 2018 bumped the dependency up from 5 year old R 3.0.0 to 4 year old R 3.1.0
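
A few of the features listed above can be sketched with the small built-in `mtcars` data (not evaluated when rendering). The column names `mean_mpg` and `mpg_centered` are my own and only serve as illustration. Here the package is attached with `library()`, because the special symbols `.()`, `.N` and `:=` are easiest to use that way.

```{r}
#| label: annex-data-table-sketch
#| eval: false

library(data.table)

# Convert a data.frame, keeping the row names as a regular column
dt <- as.data.table(mtcars, keep.rownames = "model")

# Aggregation by group: mean and number of rows per cylinder class
agg <- dt[, .(mean_mpg = mean(mpg), n = .N), by = cyl]

# Add a column by reference and by group, without copying the table
dt[, mpg_centered := mpg - mean(mpg), by = cyl]

# Fast delimited file writer and reader
tmp <- tempfile(fileext = ".csv")
fwrite(dt, tmp)
dt_back <- fread(tmp)

# Reshaping: wide to long with melt(), long to wide with dcast()
long <- melt(dt, id.vars = "model", measure.vars = c("mpg", "wt"))
wide <- dcast(long, model ~ variable, value.var = "value")
```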

::::: my-remark
::: my-remark-header
{**data.table**}: Personal evaluation
:::

::: my-remark-container
I believe the most important application of {**data.table**} is working with huge amounts of data (several GB). In the book SwR it is used in the first chapter with the `data.table::fread()` function. There I used `readr::read_csv()` from the {**tidyverse**} collection instead, because the dataset is very small (29 kB).

With {**DT**} there is a package with a similar name that also seems important. It is a wrapper of the JavaScript library 'DataTables' (see @sec-DT). I was already using {**DT**} to display interactive tables on websites, but at that time I didn't completely understand the difference between {**data.table**} and {**DT**}. As far as I understand it now, the differences are:

-   {**data.table**}: A package for efficient data manipulation and analysis, focusing on speed, memory efficiency, and flexibility. It provides a powerful data structure for handling large datasets.
-   {**DT**} (datatable): A package for rendering R data frames as interactive HTML tables, focusing on visualization and user interaction. It provides a simple way to create web-based tables with filtering, sorting, and editing capabilities.
:::
:::::
:::::::
:::::::::

## datawizard {#sec-datawizard}

:::::: my-package
::: my-package-header
Package Profile: datawizard
:::

:::: my-package-container
<center>[Easy Data Wrangling and Statistical Transformations](https://easystats.github.io/datawizard/) [@datawizard]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap06/logoi/logo-datawizard-min.png){width="176"}

{**datawizard**} is a lightweight package to easily manipulate, clean, transform, and prepare your data for analysis. It is part of the {**easystats**} ecosystem, a suite of R packages to deal with your entire statistical analysis, from cleaning the data to reporting the results.
:::

------------------------------------------------------------------------

{**datawizard**} covers two aspects of data preparation:

-   **Data manipulation**: {**datawizard**} offers a very similar set of functions to that of the tidyverse packages, such as {**dplyr**} and {**tidyr**}, to select, filter and reshape data, with a few key differences.
    1)  All data manipulation functions start with the prefix `data_*` (which makes them easy to identify).
    2)  Although most functions can be used exactly as their tidyverse equivalents, they are also string-friendly (which makes them easy to program with and use inside functions).
    3)  Finally, datawizard is super lightweight (no dependencies, similar to {**poorman**}), which makes it awesome for developers to use in their packages.
-   **Statistical transformations**: {**datawizard**} also has powerful functions to easily apply common data transformations, including standardization, normalization, rescaling, rank-transformation, scale reversing, recoding, binning, etc.
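
Both aspects can be sketched with the built-in `mtcars` data (not evaluated when rendering). The selected columns and the filter condition are assumptions for illustration.

```{r}
#| label: annex-datawizard-sketch
#| eval: false

# Data manipulation: column names can be passed as plain strings
selected <- datawizard::data_select(mtcars, select = c("mpg", "cyl"))

# Filtering works with an ordinary logical expression
four_cyl <- datawizard::data_filter(mtcars, cyl == 4)

# Statistical transformations: z-standardization ...
mpg_z <- datawizard::standardize(mtcars$mpg)

# ... and rescaling to a new range, here 0 to 1
mpg_01 <- datawizard::rescale(mtcars$mpg, to = c(0, 1))
```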
::::
::::::

## descr {#sec-descr}

::::: my-package
::: my-package-header
Package Profile: descr
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Descriptive Statistics](https://github.com/jalvesaq/descr) [@descr]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**descr**}.*)

Weighted frequency and contingency tables of categorical variables and of the comparison of the mean value of a numerical variable by the levels of a factor, and methods to produce xtable objects of the tables and to plot them. There are also functions to facilitate the character encoding conversion of objects, to quickly convert fixed width files into csv ones, and to export a data.frame to a text file with the necessary R and SPSS codes to reread the data. [@descr]
:::
:::::

## DescTools {#sec-DescTools}

::::: my-package
::: my-package-header
Package Profile: DescTools
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Tools for Descriptive Statistics](https://andrisignorell.github.io/DescTools/) [@DescTools]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**DescTools**}.*)

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results.

Furthermore, the package contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.
:::
:::::

## dfidx {#sec-dfidx}

::::: my-package
::: my-package-header
Package Profile: dfidx
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Indexed Data Frames](https://cran.r-project.org/package=dfidx) [@dfidx]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**dfidx**}.*)

Provides extended data frames, with a special data frame column which contains two indexes, with potentially a nesting structure.
:::
:::::

## dichromat {#sec-dichromat}

::::: my-package
::: my-package-header
Package Profile: dichromat
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Color Schemes for Dichromats](https://cran.r-project.org/package=dichromat) [@dichromat]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**dichromat**}.*)

Collapse red-green or green-blue distinctions to simulate the effects of different types of color-blindness.
:::
:::::

## dplyr {#sec-dplyr}

:::::: my-package
::: my-package-header
Package Profile: dplyr
:::

:::: my-package-container
<center>[A Grammar of Data Manipulation](https://dplyr.tidyverse.org/) [@dplyr]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-dplyr-min.png){width="176"}

{**dplyr**} is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges.
:::

------------------------------------------------------------------------

-   `mutate()` adds new variables that are functions of existing variables.
-   `select()` picks variables based on their names.
-   `filter()` picks cases based on their values.
-   `summarise()` reduces multiple values down to a single summary.
-   `arrange()` changes the ordering of the rows.

These all combine naturally with `group_by()` which allows you to perform any operation “by group”. You can learn more about them in [vignette("dplyr")](https://dplyr.tidyverse.org/articles/dplyr.html). As well as these single-table verbs, {**dplyr**} also provides a variety of two-table verbs, which you can learn about in [vignette("two-table")](https://dplyr.tidyverse.org/articles/two-table.html). [@dplyr]
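
The single-table verbs and `group_by()` can be chained into one pipeline, sketched here with the built-in `mtcars` data (not evaluated when rendering). The threshold `hp > 100` and the new column names are assumptions for illustration.

```{r}
#| label: annex-dplyr-sketch
#| eval: false

by_cyl <- mtcars |>
  dplyr::filter(hp > 100) |>                 # pick cases by their values
  dplyr::mutate(kpl = mpg * 0.4251) |>       # new variable: km per litre
  dplyr::select(cyl, kpl, hp) |>             # pick variables by name
  dplyr::group_by(cyl) |>                    # following steps run "by group"
  dplyr::summarise(
    mean_kpl = mean(kpl),                    # reduce each group to one value
    n = dplyr::n()
  ) |>
  dplyr::arrange(dplyr::desc(mean_kpl))      # order the rows

by_cyl
```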
::::
::::::

## DT {#sec-DT}

::::: my-package
::: my-package-header
Package Profile: DT
:::

::: my-package-container

------------------------------------------------------------------------

<center>[A Wrapper of the JavaScript Library 'DataTables'](https://rstudio.github.io/DT/) [@DT]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**DT**}.*)

Data objects in R can be rendered as HTML tables using the JavaScript library [DataTables](https://datatables.net/) (typically via {**R Markdown**} or {**Shiny**}). The 'DataTables' library has been included in this R package. The package name {**DT**} is an abbreviation of 'DataTables'.
:::
:::::

## dunn.test {#sec-dunn.test}

::::: my-package
::: my-package-header
Package Profile: dunn.test
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Dunn's Test of Multiple Comparisons Using Rank Sums](https://cran.r-project.org/package=dunn.test) [@dunn.test]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**dunn.test**}.*)

Computes Dunn's test [@dunn1964] for stochastic dominance and reports the results among multiple pairwise comparisons after a Kruskal-Wallis test for stochastic dominance among k groups [@kruskal1952]. The interpretation of stochastic dominance requires an assumption that the CDF of one group does not cross the CDF of the other.

{**dunn.test**} makes k(k-1)/2 multiple pairwise comparisons based on Dunn's z-test-statistic approximations to the actual rank statistics. The null hypothesis for each pairwise comparison is that the probability of observing a randomly selected value from the first group that is larger than a randomly selected value from the second group equals one half; this null hypothesis corresponds to that of the `r glossary("Mann-Whitney", "Wilcoxon-Mann-Whitney rank-sum test")`. Like the rank-sum test, if the data can be assumed to be continuous, and the distributions are assumed identical except for a difference in location, Dunn's test may be understood as a test for median difference. {**dunn.test**} accounts for tied ranks.
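
As a small sketch (not evaluated when rendering), the built-in `PlantGrowth` data has k = 3 groups and therefore 3(3-1)/2 = 3 pairwise comparisons. The dataset and the Bonferroni adjustment are assumptions chosen for illustration.

```{r}
#| label: annex-dunn-test-sketch
#| eval: false

# Kruskal-Wallis test followed by Dunn's pairwise comparisons;
# the function prints a results table and returns a list invisibly
res <- dunn.test::dunn.test(
  x = PlantGrowth$weight,
  g = PlantGrowth$group,
  method = "bonferroni"
)

# Labels of the pairwise comparisons and their adjusted p-values
res$comparisons
res$P.adjusted
```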
:::
:::::

## e1071 {#sec-e1071}

::::: my-package
::: my-package-header
Package Profile: e1071
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Misc. functions](https://cran.r-project.org/web/packages/e1071/index.html) [@e1071]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**e1071**}.*)

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, generalized k-nearest neighbor ...
:::
:::::

## effectsize {#sec-effectsize}

:::::: my-package
::: my-package-header
Package Profile: effectsize
:::

:::: my-package-container
<center>[Indices of Effect Size](https://easystats.github.io/effectsize/) [@effectsize]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap06/logoi/logo-effectsize-min.png){width="176"}

The goal of this package is to provide utilities to work with indices of effect size and standardized parameters, allowing computation and conversion of indices such as Cohen’s d, r, odds-ratios, etc.
:::

------------------------------------------------------------------------

Provides utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see the list of supported models using the function `insight::supported_models()`), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc.
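
Computation and conversion can be sketched as follows (not evaluated when rendering). The comparison of `mpg` between the two transmission types in `mtcars` is only an assumption for illustration.

```{r}
#| label: annex-effectsize-sketch
#| eval: false

# Cohen's d for the difference in mpg between automatic and manual cars;
# the result is a data frame with the estimate and its confidence interval
d <- effectsize::cohens_d(mpg ~ am, data = mtcars)

# Convert the estimate into other indices
r <- effectsize::d_to_r(d$Cohens_d)
odds_ratio <- effectsize::d_to_oddsratio(d$Cohens_d)
```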
::::
::::::

## fmsb {#sec-fmsb}

::::: my-package
::: my-package-header
Package Profile: fmsb
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Functions for Medical Statistics Book with some Demographic Data](https://cran.r-project.org/package=fmsb) [@fmsb]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**fmsb**}.*)

Several utility functions for the book entitled "Practices of Medical and Health Data Analysis using R" (Pearson Education Japan, 2007) with Japanese demographic data and some demographic analysis related functions.
:::
:::::

## forcats {#sec-forcats}

:::::: my-package
::: my-package-header
Package Profile: forcats
:::

:::: my-package-container
<center>[Tools for Working with Categorical Variables (Factors)](https://forcats.tidyverse.org/) [@forcats]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-forcats-min.png){width="176"}

{**forcats**} provides a suite of useful tools that solve common problems with factors. "Forcats" is an anagram of "factors" and part of the {**tidyverse**} suite of packages.
:::

The package offers two groups of helpers:

(1) reordering factor levels
    -   moving specified levels to front,
    -   ordering by first appearance,
    -   reversing, and
    -   randomly shuffling
(2) tools for modifying factor levels
    -   collapsing rare levels into other,
    -   'anonymizing', and
    -   manually 'recoding'
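
Both groups of helpers can be sketched with a small toy factor (not evaluated when rendering). The factor values are my own assumption; the comments show the resulting level order.

```{r}
#| label: annex-forcats-sketch
#| eval: false

f <- factor(c("b", "a", "c", "a", "a", "b", "d"))  # levels: a, b, c, d

# (1) Reordering factor levels
lv_front   <- levels(forcats::fct_relevel(f, "c"))  # c, a, b, d
lv_inorder <- levels(forcats::fct_inorder(f))       # b, a, c, d
lv_rev     <- levels(forcats::fct_rev(f))           # d, c, b, a
lv_shuffle <- levels(forcats::fct_shuffle(f))       # random order

# (2) Modifying factor levels
lv_lump   <- levels(forcats::fct_lump_n(f, n = 2))        # a, b, Other
f_anon    <- forcats::fct_anon(f)                         # arbitrary numeric labels
lv_recode <- levels(forcats::fct_recode(f, alpha = "a"))  # alpha, b, c, d
```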
::::
::::::

## GGally {#sec-GGally}

::::: my-package
::: my-package-header
Package Profile: GGally
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Extension to {**ggplot2**}](https://ggobi.github.io/ggally/) [@GGally]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**GGally**}.*)

The R package {**ggplot2**} is a plotting system based on the grammar of graphics. {**GGally**} extends {**ggplot2**} by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include

-   a pairwise plot matrix,
-   a two group pairwise plot matrix,
-   a parallel coordinates plot,
-   a survival plot,
-   and several functions to plot networks.
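
The first item, the pairwise plot matrix, can be sketched with the built-in `iris` data (not evaluated when rendering). The selected columns and the color mapping are assumptions for illustration.

```{r}
#| label: annex-ggally-sketch
#| eval: false

# Pairwise plot matrix of the four numeric columns, colored by species:
# scatterplots below the diagonal, densities on it, correlations above it
pairs_plot <- GGally::ggpairs(
  iris,
  columns = 1:4,
  mapping = ggplot2::aes(colour = Species)
)

# Print `pairs_plot` to draw the matrix
```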
:::
:::::

## ggfortify {#sec-ggfortify}

::::: my-package
::: my-package-header
Package Profile: ggfortify
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Data Visualization Tools for Statistical Analysis Results](https://github.com/sinhrks/ggfortify) [@ggfortify]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**ggfortify**}.*)

Unified plotting tools for statistics commonly used, such as GLM, time series, PCA families, clustering and survival analysis. The package offers a single plotting interface for these analysis results and plots in a unified style using {**ggplot2**}.

This package offers `fortify()` and `autoplot()` functions that let {**ggplot2**} automatically visualize the statistical results of popular R packages. The authors' R Journal paper gives more details on the overall architecture design and a gallery of visualizations created with this package. See also the {**autoplotly**} package, which automatically generates interactive visualizations in plotly.js style based on {**ggfortify**}. The generated visualizations can be extended using {**ggplot2**} syntax while staying interactive.
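
As a minimal sketch of the single plotting interface, `autoplot()` can be called on quite different result objects. The built-in data sets `iris` and `AirPassengers` and the object names are assumptions for this example.

```r
library(ggfortify)  # also attaches ggplot2

# Principal component analysis: autoplot() draws the first two components
pca <- prcomp(iris[, 1:4], scale. = TRUE)
p_pca <- autoplot(pca, data = iris, colour = "Species")

# Time series: the same generic works on a ts object
p_ts <- autoplot(AirPassengers)

p_pca
p_ts
```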
:::
:::::

## ggmosaic {#sec-ggmosaic}

:::::: my-package
::: my-package-header
Package Profile: ggmosaic
:::

:::: my-package-container
<center>[Mosaic Plots in the {**ggplot2**} Framework](https://haleyjeppson.github.io/ggmosaic/) [@ggmosaic]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-ggmosaic-min.png){width="176"}

{**ggmosaic**} is designed to create visualizations of categorical data and is capable of producing bar charts, stacked bar charts, mosaic plots, and double decker plots and therefore offers a wide range of potential plots.
:::

------------------------------------------------------------------------

Furthermore, {**ggmosaic**} allows various features to be customized:

-   the order of the variables,
-   the formula setup of the plot,
-   faceting,
-   the type of partition, and
-   the space between the categories.
::::
::::::

## ggokabeito {#sec-ggokabeito}

::::: my-package
::: my-package-header
Package Profile: ggokabeito
:::

::: my-package-container

------------------------------------------------------------------------

<center>['Okabe-Ito' Scales for {**ggplot2**} and {**ggraph**}](https://malcolmbarrett.github.io/ggokabeito/index.html) [@ggokabeito]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**ggokabeito**}.*)

{**ggokabeito**} provides discrete {**ggplot2**} and {**ggraph**} scales for the colorblind-friendly 'Okabe-Ito' palette, covering the 'color', 'fill', and 'edge_colour' aesthetics, so that the palette is easy to use in your data visualizations.

Currently, {**ggokabeito**} provides the following scales:

-   `scale_color_okabe_ito()`/`scale_colour_okabe_ito()`
-   `scale_fill_okabe_ito()`
-   `scale_edge_color_okabe_ito()`/`scale_edge_colour_okabe_ito()`
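
A minimal sketch of the colour and fill scales, assuming the `mpg` data set that ships with {**ggplot2**}:

```r
library(ggplot2)
library(ggokabeito)

# Discrete colour scale for points
p_points <- ggplot(mpg, aes(cty, hwy, colour = class)) +
  geom_point() +
  scale_colour_okabe_ito()

# Discrete fill scale for bars
p_bars <- ggplot(mpg, aes(class, fill = class)) +
  geom_bar() +
  scale_fill_okabe_ito()

p_points
p_bars
```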
:::
:::::

## ggplot2 {#sec-ggplot2}

:::::: my-package
::: my-package-header
Package Profile: ggplot2
:::

:::: my-package-container
<center>[Create Elegant Data Visualisations Using the Grammar of Graphics](https://ggplot2.tidyverse.org/) [@ggplot2]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-ggplot2-min.png){width="176"}

{**ggplot2**} is a system for declaratively creating graphics, based on [The Grammar of Graphics](https://link.springer.com/book/10.1007/0-387-28695-0). You provide the data, tell {**ggplot2**} how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details. [@ggplot2]
:::

------------------------------------------------------------------------

It’s hard to succinctly describe how {**ggplot2**} works because it embodies a deep philosophy of visualization. However, in most cases you start with `ggplot()`, supply a dataset and aesthetic mapping (with `aes()`). You then add on layers (like `geom_point()` or `geom_histogram()`), scales (like `scale_colour_brewer()`), faceting specifications (like `facet_wrap()`) and coordinate systems (like `coord_flip()`).
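
The sequence described above could look like the following sketch. The `mpg` data set and the chosen variables are assumptions for illustration only.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +  # data + aesthetic mapping
  geom_point() +                                           # layer
  scale_colour_brewer(palette = "Set1") +                  # scale
  facet_wrap(~ year) +                                     # faceting specification
  coord_flip()                                             # coordinate system

p
```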
::::
::::::

## gplots {#sec-gplots}

::::: my-package
::: my-package-header
Package Profile: gplots
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Various R Programming Tools for Plotting Data](https://github.com/talgalili/gplots) [@gplots]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**gplots**}.*)

Various R programming tools for plotting data, including:

-   calculating and plotting locally smoothed summary functions ('bandplot', 'wapply'),
-   enhanced versions of standard plots ('barplot2', 'boxplot2', 'heatmap.2', 'smartlegend'),
-   manipulating colors ('col2hex', 'colorpanel', 'redgreen', 'greenred', 'bluered', 'redblue', 'rich.colors'),
-   calculating and plotting two-dimensional data summaries ('ci2d', 'hist2d'),
-   enhanced regression diagnostic plots ('lmplot2', 'residplot'),
-   formula-enabled interface to 'stats::lowess' function ('lowess'),
-   displaying textual data in plots ('textplot', 'sinkplot'),
-   plotting a matrix where each cell contains a dot whose size reflects the relative magnitude of the elements ('balloonplot'),
-   plotting "Venn" diagrams ('venn'),
-   displaying Open-Office style plots ('ooplot'),
-   plotting multiple data on same region, with separate axes ('overplot'),
-   plotting means and confidence intervals ('plotCI', 'plotmeans'),
-   spacing points in an x-y plot so they don't overlap ('space').
:::
:::::

## gridExtra {#sec-gridExtra}

::::: my-package
::: my-package-header
Package Profile: gridExtra
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Miscellaneous Functions for "Grid" Graphics](https://cran.r-project.org/package=gridExtra) [@gridExtra]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**gridExtra**}.*)

Provides a number of user-level functions to work with "grid" graphics, notably to arrange multiple grid-based plots on a page, and draw tables.

The {**grid**} package (part of the base R distribution) provides low-level functions to create graphical objects (`grobs`), and position them on a page in specific viewports. The {**gtable**} package introduced a higher-level layout scheme, arguably more amenable to user-level interaction. With the `gridExtra::arrangeGrob()` / `gridExtra::grid.arrange()` pair of functions, {**gridExtra**} builds upon {**gtable**} to arrange multiple `grobs` on a page.
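
The difference between the two functions can be sketched as follows. The plots and object names are assumptions made for this example: `arrangeGrob()` only builds the layout object, whereas `grid.arrange()` also draws it.

```r
library(ggplot2)
library(gridExtra)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

# Build a gtable with both plots side by side (nothing is drawn yet)
combined <- arrangeGrob(p1, p2, ncol = 2)

# Build the same layout and draw it on the current device
grid.arrange(p1, p2, ncol = 2)

# Tables can be turned into grobs as well
tbl <- tableGrob(head(mtcars[, 1:4]))
```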
:::
:::::

## ggrepel {#sec-ggrepel}

:::::: my-package
::: my-package-header
Package Profile: ggrepel
:::

:::: my-package-container
<center>[Automatically Position Non-Overlapping Text Labels with 'ggplot2'](https://ggrepel.slowkow.com/) [@ggrepel]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-ggrepel-min.png){width="176"}

Provides text and label geoms for 'ggplot2' that help to avoid overlapping text labels. Labels repel away from each other and away from the data points.
:::

------------------------------------------------------------------------

{**ggrepel**} provides two geoms for {**ggplot2**} to repel overlapping text labels:

-   `ggrepel::geom_text_repel()`
-   `ggrepel::geom_label_repel()`
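
A minimal sketch of the first geom, assuming a small subset of `mtcars` with the row names stored in a hypothetical `model` column:

```r
library(ggplot2)
library(ggrepel)

cars_small <- mtcars[1:12, ]
cars_small$model <- rownames(cars_small)  # label column used for this example

p <- ggplot(cars_small, aes(wt, mpg, label = model)) +
  geom_point() +
  geom_text_repel()  # labels are pushed away from points and from each other

p
```

Replacing `geom_text_repel()` with `geom_label_repel()` draws the same labels inside boxes.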
::::
::::::

## ggtext {#sec-ggtext}

::::: my-package
::: my-package-header
Package Profile: ggtext
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Improved Text Rendering Support for 'ggplot2'](https://wilkelab.org/ggtext/) [@ggtext]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**ggtext**}.*)

The {**ggtext**} package provides simple Markdown and HTML rendering for {**ggplot2**}. Under the hood, the package uses the {**gridtext**} package for the actual rendering, and consequently it is limited to the [feature set provided by gridtext](https://wilkelab.org/gridtext/).

Support is provided for Markdown both in theme elements (plot titles, subtitles, captions, axis labels, legends, etc.) and in geoms (similar to `ggplot2::geom_text()`). In both cases, there are two alternatives, one for creating simple text labels and one for creating text boxes with word wrapping.
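
For theme elements, a minimal sketch could look like this. The plot and its title text are assumptions for illustration.

```r
library(ggplot2)
library(ggtext)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(
    title = "**Fuel economy** falls with *weight*",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  ) +
  theme(plot.title = element_markdown())  # render the title as Markdown

p
```

For text boxes with word wrapping, `element_textbox_simple()` can be used in place of `element_markdown()`.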
:::
:::::

## glue {#sec-glue}

:::::: my-package
::: my-package-header
Package Profile: glue
:::

:::: my-package-container
<center>[Interpreted String Literals](https://glue.tidyverse.org/) [@glue]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap02/logoi/logo-glue-min.png){width="176"}

An implementation of interpreted string literals, inspired by Python's Literal String Interpolation.
:::

------------------------------------------------------------------------

Glue offers interpreted string literals that are small, fast, and dependency-free. Glue does this by embedding R expressions in curly braces which are then evaluated and inserted into the argument string.
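
A minimal sketch of the curly-brace mechanism. The variables `name` and `n_pkgs` are made up for this example.

```r
library(glue)

name <- "Ada"
n_pkgs <- 3

# Expressions inside {} are evaluated and inserted into the string
msg <- glue("{name} loaded {n_pkgs} packages and plans to load {n_pkgs + 1}.")
msg
```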
::::
::::::

## gssr {#sec-gssr}

::::: my-package
::: my-package-header
Package Profile: gssr
:::

::: my-package-container

------------------------------------------------------------------------

<center>[US General Social Survey (GSS) Data for R](https://kjhealy.github.io/gssr/) [@gssr]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**gssr**}.*)

[GSSR Package](https://kjhealy.github.io/gssr/): The General Social Survey Cumulative Data (1972-2022) and Panel Data files packaged for easy use in R. {**gssr**} is a data package, developed and maintained by [Kieran Healy](https://kieranhealy.org/), the author of [Data Visualization](https://kieranhealy.org/publications/dataviz/). The package bundles several datasets into a convenient format. Because of its large size, {**gssr**} is not hosted on CRAN but in a [GitHub repository](https://github.com/kjhealy/gssr/).

Instead of browsing and examining the complex dataset with the [GSS Data Explorer](https://gssdataexplorer.norc.org/) or [downloading datasets directly](https://gss.norc.org/Get-The-Data) from the National Opinion Research Center ([NORC](http://norc.org/)), you can now just work inside R. The current package version 0.4 (see: [gssr Update](https://kieranhealy.org/blog/archives/2023/12/02/gssr-update/)) provides the GSS Cumulative Data File (1972-2022), three GSS Three Wave Panel Data Files (for panels beginning in 2006, 2008, and 2010, respectively), and the 2020 panel file.

Version 0.4 also integrates survey code book information about variables directly into R’s help system, allowing them to be accessed via the help browser or from the console with `?`, as if they were functions or other documented objects.
:::
:::::

## gt {#sec-gt}

:::::: my-package
::: my-package-header
Package Profile: gt
:::

:::: my-package-container
<center>[Easily Create Presentation-Ready Display Tables](https://gt.rstudio.com) [@gt]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap02/logoi/logo-gt-min.png){width="176"}

With the {**gt**} package, anyone can make wonderful-looking tables using the R programming language. The gt philosophy: we can construct a wide variety of useful tables with a cohesive set of table parts. These include the table header, the stub, the column labels and spanner column labels, the table body, and the table footer.
:::

------------------------------------------------------------------------

It all begins with table data (be it a tibble or a data frame). You then decide how to compose your {**gt**} table with the elements and formatting you need for the task at hand. Finally, the table is rendered by printing it at the console, including it in an R Markdown document, or exporting to a file using `gtsave()`. Currently, {**gt**} supports the HTML, LaTeX, and RTF output formats.
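
The workflow from table data to a rendered table might look like the following sketch. The data subset, the labels and the file name are assumptions for illustration.

```r
library(gt)

tbl <- head(mtcars[, 1:4]) |>
  gt(rownames_to_stub = TRUE) |>                                # table body + stub
  tab_header(title = "Motor Trend cars",
             subtitle = "First six rows") |>                    # table header
  tab_spanner(label = "Engine", columns = c(cyl, disp, hp)) |>  # spanner column label
  tab_source_note("Source: datasets::mtcars")                   # table footer

tbl  # printing renders the table

# Export to a file; the file name is only an example
# gtsave(tbl, "cars.html")
```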
::::
::::::

## gtsummary {#sec-gtsummary}

:::::: my-package
::: my-package-header
Package Profile: gtsummary
:::

:::: my-package-container
<center>[Presentation-Ready Data Summary and Analytic Result Tables](https://www.danieldsjoberg.com/gtsummary/) [@gtsummary]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-gtsummary-min.png){width="176"}

Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.
:::

------------------------------------------------------------------------

-   Summarize data frames or tibbles easily in R. Perfect for creating a `r glossary("Table 1")`.
-   Summarize regression models in R and include reference rows for categorical variables.
-   Customize {**gtsummary**} tables using a growing list of formatting/styling functions.
-   Report statistics inline from summary tables and regression summary tables in R markdown. Make your reports completely reproducible!

By leveraging {**broom**}, {**gt**}, and {**labelled**} packages, {**gtsummary**} creates beautifully formatted, ready-to-share summary and result tables in a single line of R code!
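
A minimal sketch of a summary table and a regression table, assuming the `trial` example data set that ships with {**gtsummary**}:

```r
library(gtsummary)

# Descriptive summary ("Table 1"), split by treatment group
tbl_1 <- trial |>
  tbl_summary(by = trt, include = c(age, grade, response))

# Logistic regression summarized with odds ratios and reference rows
fit <- glm(response ~ age + grade, data = trial, family = binomial)
tbl_fit <- tbl_regression(fit, exponentiate = TRUE)

tbl_1
tbl_fit
```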
::::
::::::

## haven {#sec-haven}

::::::::: my-package
::: my-package-header
Package Profile: haven
:::

::::::: my-package-container
<center>[Import and Export 'SPSS', 'Stata' and 'SAS' Files](https://haven.tidyverse.org/index.html) [@haven]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-haven-min.png){width="176"}

{**haven**} enables R to read and write various data formats used by other statistical packages. Currently it supports [SAS](https://www.sas.com/en_us/home.html), [SPSS](https://www.ibm.com/spss) and [STATA](https://www.stata.com/). {**haven**} output objects have four important features:
:::

------------------------------------------------------------------------

(1) It creates tibbles (`tibble::tibble()`), which have a better print method for very long and very wide files.
(2) Dates and times are converted to R date/time classes.
(3) Character vectors are not converted to factors.
(4) Value labels are translated into a new `haven::labelled()` class, which preserves the original semantics and can easily be coerced to factors with `haven::as_factor()`. Special missing values are preserved. See details in the vignette [Conversion semantics](https://haven.tidyverse.org/articles/semantics.html).
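
The first and fourth features can be sketched as follows. The example uses the `iris.sav` file bundled with {**haven**}; the labelled vector `x` is made up for illustration.

```r
library(haven)

# Read an SPSS file that ships with haven; the result is a tibble
path <- system.file("examples", "iris.sav", package = "haven")
iris_spss <- read_sav(path)

# A labelled vector keeps the numeric codes together with their value labels
x <- labelled(c(1, 2, 1, 2), labels = c(yes = 1, no = 2))

# Coerce to a factor; the labels become the factor levels
x_fct <- as_factor(x)
levels(x_fct)
```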

::::: my-remark
::: my-remark-header
{**haven**}: Personal Evaluation
:::

::: my-remark-container
In this book I am especially interested in the fourth feature.
:::
:::::
:::::::
:::::::::

## Hmisc {#sec-Hmisc}

:::::::: my-package
::: my-package-header
Package Profile: Hmisc
:::

:::::: my-package-container

------------------------------------------------------------------------

<center>[Harrell Miscellaneous](https://hbiostat.org/r/hmisc/) [@Hmisc]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**Hmisc**}.*)

{**Hmisc**} ("Harrell Miscellaneous") takes its name from its author, Frank Harrell Jr. It contains many functions useful for

-   data analysis,
-   high-level graphics,
-   utility operations,
-   computing sample size and power,
-   simulation,
-   importing and annotating datasets,
-   imputing missing values,
-   advanced table making,
-   variable clustering,
-   character string manipulation,
-   conversion of R objects to {{< latex >}} and HTML code,
-   recoding variables,
-   caching,
-   simplified parallel computing,
-   encrypting and decrypting data using a safe workflow,
-   general moving window statistical estimation,
-   assistance in interpreting principal component analysis [@Hmisc]

::::: my-remark
::: my-remark-header
{Hmisc}: Personal Evaluation
:::

::: my-remark-container
This is a big variety of functions. In contrast to other packages that are specifically directed at solving one problem, {**Hmisc**} seems to be an all-in-one solution.

To learn more, I should visit Frank E. Harrell Jr.'s [Hmisc start page](https://hbiostat.org/r/hmisc/). Especially his [online book on R Workflow for Reproducible Data Analysis and Reporting](https://hbiostat.org/rflow/) seems very interesting to me!
:::
:::::
::::::
::::::::

## httr2 {#sec-httr2}

:::::: my-package
::: my-package-header
Package Profile: httr2
:::

:::: my-package-container
<center>[Perform HTTP Requests and Process the Responses](https://httr2.r-lib.org/) [@httr2]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-httr2-min.png){width="176"}

Tools for creating and modifying HTTP requests, then performing them and processing the results.
:::

------------------------------------------------------------------------

{**httr2**} (pronounced hitter2) is a ground-up rewrite of {**httr**} that provides a pipeable `r glossary("APIx", "API")` with an explicit request object that solves more problems felt by packages that wrap APIs (e.g. built-in rate-limiting, retries, OAuth, secure secrets, and more). {**httr2**} is designed to map closely to the underlying `r glossary("HTTP")` `r glossary("protocol")`. For more details, read [An overview of HTTP](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview) from `r glossary("MDN")`.
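
The explicit request object can be built step by step before anything is sent. A minimal sketch follows; the URL `https://example.com/api` and the `items` endpoint are hypothetical and stand in for a real API.

```r
library(httr2)

# Build the request; no network traffic happens here
req <- request("https://example.com/api") |>  # hypothetical base URL
  req_url_path_append("items") |>
  req_url_query(page = 1) |>
  req_headers(Accept = "application/json") |>
  req_retry(max_tries = 3)                    # retry transient failures

req  # printing shows the request that would be sent

# Performing the request and parsing a JSON body (not run)
# resp <- req_perform(req)
# resp_body_json(resp)
```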
::::
::::::

## janitor {#sec-janitor}

::::::::: my-package
::: my-package-header
Package Profile: janitor
:::

::::::: my-package-container
<center>[Simple Tools for Examining and Cleaning Dirty Data](https://sfirke.github.io/janitor/) [@janitor]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-janitor-min.png){width="176"}

{**janitor**} has simple functions for examining and cleaning dirty data. It was built with beginning and intermediate R users in mind and is optimized for user-friendliness. Advanced R users can perform many of these tasks already, but with janitor they can do it faster and save their thinking for the fun stuff.
:::

------------------------------------------------------------------------

**The main janitor functions:**

-   perfectly format data.frame column names;
-   create and format frequency tables of one, two, or three variables - think an improved `base::table()`; and
-   provide other tools for cleaning and examining data.frames.

The tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel.

{**janitor**} is a {**tidyverse**}-oriented package. Specifically, it plays nicely with the `%>%` pipe and is optimized for cleaning data brought in with the {**readr**} and {**readxl**} packages.

::::: my-remark
::: my-remark-header
{janitor}: Personal Evaluation
:::

::: my-remark-container
I am using {**janitor**} mostly in two ways:

1.  as a better `base::table()` function, using `janitor::tabyl()`
    -   `base::table()` doesn't accept data.frames and is therefore not compatible with the pipe
    -   `base::table()` doesn't output data.frames
    -   `base::table()` results are hard to format (the most annoying "feature" for me)

2.  to add information and formatting to the table with the `janitor::adorn_*` functions
    -   `janitor::adorn_totals()`
    -   `janitor::adorn_percentages()`
    -   `janitor::adorn_pct_formatting()`
    -   `janitor::adorn_rounding()`
    -   `janitor::adorn_ns()` (adding Ns = number of counts)
    -   `janitor::adorn_title()`

You could also use {**tidyverse**} commands (for instance, for a two-way table, `dplyr::count()` followed by `tidyr::pivot_wider()`), but the many `adorn_*` functions make it easy to enhance the results. BTW: The prefix `adorn` comes from 'adornment' (ornament, decoration).
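
A minimal sketch of both uses, assuming the built-in `mtcars` data (the object names are made up for this example):

```r
library(janitor)

# Two-way frequency table as a data.frame: cylinders by gears
counts <- tabyl(mtcars, cyl, gear)

# Add a totals row, row percentages, formatting, the Ns and a combined title
report <- counts |>
  adorn_totals("row") |>
  adorn_percentages("row") |>
  adorn_pct_formatting(digits = 1) |>
  adorn_ns() |>
  adorn_title("combined")

report
```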
:::
:::::
:::::::
:::::::::

## kableExtra {#sec-kableExtra}

:::::: my-package
::: my-package-header
Package Profile: kableExtra
:::

:::: my-package-container
<center>[Construct Complex Table with 'kable' and Pipe Syntax](http://haozhu233.github.io/kableExtra/) [@kableExtra]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap02/logoi/logo-kableExtra-min.png){width="176"}

Build complex HTML or {{< latex >}} tables using `kable()` from {**knitr**} and the piping syntax from {**magrittr**}. The function `kable()` is a lightweight table generator coming from {**knitr**}. This package simplifies the way to manipulate the HTML or {{< latex >}} codes generated by `kable()` and allows users to construct complex tables and customize styles using a readable syntax.
:::

------------------------------------------------------------------------

{**kableExtra**} is NOT a table generating package. It is a package that can "add features" to a `kable()` output using a syntax that every useR loves: the pipe `%>%`. We see similar approaches to deal with plots in packages like ggvis and plotly. There is no reason why we cannot use it with tables.

Most functionalities in {**kableExtra**} can work in both HTML and PDF. In fact, as long as you specify the format in `kable()` (which can be set globally through the option `knitr.table.format`), functions in this package will pick the right way to manipulate the table by themselves. As a result, if users want to left align the table, `kable(...) %>% kable_styling(position = "left")` will work in both HTML and PDF. More recently, the package also introduced a new `kbl()` function that acts as an alternative to `kable()` and provides better documentation and format detection.
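
A minimal sketch of the left-aligned table mentioned above, using `kbl()`. The data subset and the caption are assumptions for illustration.

```r
library(kableExtra)

tab <- kbl(head(mtcars[, 1:4]), caption = "First rows of mtcars") |>
  kable_styling(full_width = FALSE, position = "left")  # narrow, left-aligned table

tab
```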
::::
::::::

## knitr {#sec-knitr}

::::: my-package
::: my-package-header
Package Profile: knitr
:::

::: my-package-container

------------------------------------------------------------------------

<center>[A General-Purpose Package for Dynamic Report Generation in R](https://yihui.org/knitr/) [@xie-2014; @xie-2024; @xie2015]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**knitr**}.*)

Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.
:::
:::::

## labelled {#sec-labelled}

:::::: my-package
::: my-package-header
Package Profile: labelled
:::

:::: my-package-container
<center>[Manipulating Labelled Data](https://larmarange.github.io/labelled/) [@labelled]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-labelled-min.png){width="176"}

Work with labelled data imported from [IBM SPSS](https://www.ibm.com/spss) or [STATA](https://www.stata.com/) with {**haven**} or {**foreign**}. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by the {**haven**} package. (The free [GNU PSPP](https://www.gnu.org/software/pspp/) is an SPSS-like open source alternative.)
:::

------------------------------------------------------------------------

See details in the vignette [Introduction to labelled](https://larmarange.github.io/labelled/articles/intro_labelled.html) and the [GitHub website for labelled](https://larmarange.github.io/labelled/). There are other vignettes as well and a [cheat sheet as PDF for download](https://github.com/larmarange/labelled/raw/main/cheatsheet/labelled_cheatsheet.pdf).
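
A minimal sketch of variable labels and value labels. The data frame `df` and its labels are made up for this example.

```r
library(labelled)

df <- data.frame(
  smoker = c(1, 0, 1, NA),
  age    = c(34, 51, 28, 40)
)

# Variable label: describes the column as a whole
var_label(df$smoker) <- "Current smoker"

# Value labels: describe the individual codes
val_labels(df$smoker) <- c(no = 0, yes = 1)

var_label(df$smoker)
val_labels(df$smoker)

# Convert the labelled vector to a factor using the value labels
df$smoker_f <- to_factor(df$smoker)
```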
::::
::::::

## lmtest {#sec-lmtest}

::::: my-package
::: my-package-header
Package Profile: lmtest
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Testing Linear Regression Models](https://cran.r-project.org/package=lmtest) [@lmtest]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**lmtest**}.*)

A collection of tests, data sets, and examples for diagnostic checking in linear regression models. Furthermore, some generic tools for inference in parametric models are provided.

See the vignette [Diagnostic Checking in Regression Relationships](https://cran.r-project.org/web/packages/lmtest/vignettes/lmtest-intro.pdf).
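
A few of the diagnostic tests might be applied as in this sketch. The simple model on the built-in `cars` data is an assumption chosen for illustration.

```r
library(lmtest)

fit <- lm(dist ~ speed, data = cars)

bp <- bptest(fit)    # Breusch-Pagan test for heteroskedasticity
dw <- dwtest(fit)    # Durbin-Watson test for autocorrelation
ct <- coeftest(fit)  # coefficient table with test statistics

bp
dw
ct
```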
:::
:::::

## lsr {#sec-lsr}

::::: my-package
::: my-package-header
Package Profile: lsr
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Companion to "Learning Statistics with R"](https://learningstatisticswithr.com/) [@lsr]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**lsr**}.*)

A collection of tools intended to make introductory statistics easier to teach, including wrappers for common hypothesis tests and basic data manipulation. It accompanies Navarro, D. J. (2015). Learning Statistics with R: A Tutorial for Psychology Students and Other Beginners, Version 0.6.
:::
:::::

## MASS {#sec-MASS}

::::: my-package
::: my-package-header
Package Profile: MASS
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Package ‘MASS’](http://www.stats.ox.ac.uk/pub/MASS4/) [@MASS]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**MASS**}.*)

Functions and datasets to support Venables and Ripley, [Modern Applied Statistics with S](http://www.stats.ox.ac.uk/pub/MASS4/) [@venables2002]
:::
:::::

## mlogit {#sec-mlogit}

::::: my-package
::: my-package-header
Package Profile: mlogit
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Multinomial Logit Models](https://cran.r-project.org/package=mlogit) [@mlogit]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**mlogit**}.*)

Maximum likelihood estimation of random utility discrete choice models. The software is described in Croissant [-@croissant2020] and the underlying methods in Train [-@train2009].
:::
:::::

## modeest {#sec-modeest}

::::: my-package
::: my-package-header
Package Profile: modeest
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Mode Estimation](https://github.com/paulponcet/modeest) [@modeest]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**modeest**}.*)

The {**modeest**} package provides estimators of the mode of univariate unimodal (and sometimes multimodal) data and values of the modes of usual probability distributions.

{**modeest**} is a package specialized for mode estimation. It implements many different mode estimators reported in scientific articles. There is a long [list of references](https://www.rdocumentation.org/packages/modeest/versions/2.4.0/topics/modeest) on different methods of mode estimation.
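
A minimal sketch with two estimators. The data are made up for this example, and the half-sample mode is just one of the available methods.

```r
library(modeest)

# Discrete data: the most frequent value
x <- c(1, 2, 2, 3, 3, 3, 4)
mode_x <- mfv(x)
mode_x

# Continuous data: half-sample mode estimator
set.seed(1)
y <- rnorm(500, mean = 10)
mode_y <- mlv(y, method = "hsm")
mode_y
```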
:::
:::::

## moments {#sec-moments}

::::: my-package
::: my-package-header
Package Profile: moments
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Moments, Cumulants, Skewness, Kurtosis and Related Tests](https://cran.r-project.org/package=moments) [@moments]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**moments**}.*)

Functions to calculate: moments, Pearson's kurtosis, Geary's kurtosis and skewness; tests related to them.
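
A minimal sketch of the basic functions and two of the related tests. The vectors `x` and `y` are made up for this example. Note that `kurtosis()` returns Pearson's kurtosis, which is 3 for a normal distribution (not the excess kurtosis).

```r
library(moments)

x <- c(1, 2, 3, 4, 5)
skew_x <- skewness(x)  # symmetric data, so the skewness is 0
kurt_x <- kurtosis(x)  # Pearson's kurtosis: m4 / m2^2

# Tests on a clearly skewed sample
set.seed(42)
y <- rexp(200)
agostino.test(y)  # D'Agostino test of skewness
jarque.test(y)    # Jarque-Bera test of normality
```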
:::
:::::

## misty {#sec-misty}

::::: my-package
::: my-package-header
Package Profile: misty
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Miscellaneous Functions 'T. Yanagida'](https://cran.r-project.org/package=misty) [@misty]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**misty**}.*)

Miscellaneous functions for

(1) data management (e.g., grand-mean and group-mean centering, coding variables and reverse coding items, scale and cluster scores, reading and writing Excel and SPSS files),
(2) descriptive statistics (e.g., frequency table, cross tabulation, effect size measures),
(3) missing data (e.g., descriptive statistics for missing data, missing data pattern, Little's test of Missing Completely at Random, and auxiliary variable analysis),
(4) multilevel data (e.g., multilevel descriptive statistics, within-group and between-group correlation matrix, multilevel confirmatory factor analysis, level-specific fit indices, cross-level measurement equivalence evaluation, multilevel composite reliability, and multilevel R-squared measures),
(5) item analysis (e.g., confirmatory factor analysis, coefficient alpha and omega, between-group and longitudinal measurement equivalence evaluation), and
(6) statistical analysis (e.g., confidence intervals, collinearity and residual diagnostics, dominance analysis, between- and within-subject analysis of variance, latent class analysis, t-test, z-test, sample size determination).
:::
:::::

## naniar {#sec-naniar}

:::::: my-package
::: my-package-header
Package Profile: naniar
:::

:::: my-package-container
<center>[Data Structures, Summaries, and Visualisations for Missing Data](https://naniar.njtierney.com/) [@naniar]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap05/logoi/logo-naniar-min.png){width="176"}

{**naniar**} provides principled, tidy ways to summarise, visualise, and manipulate missing data with minimal deviations from the workflows in ggplot2 and tidy data.
:::

------------------------------------------------------------------------

Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. {**naniar**} provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of {**ggplot2**} and tidy data. The work is fully discussed in Tierney & Cook [-@tierney2023].
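
A minimal sketch of summarising and plotting missing values, assuming the built-in `airquality` data:

```r
library(naniar)
library(ggplot2)

# Total number of missing values and a per-variable summary
total_missing <- n_miss(airquality)
miss_var_summary(airquality)

# Scatter plot that also shows the observations with missing values
p <- ggplot(airquality, aes(Ozone, Solar.R)) +
  geom_miss_point()

p
```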
::::
::::::

## nhanesA {#sec-nhanesA}

:::::::: my-package
::: my-package-header
Package Profile: nhanesA
:::

:::::: my-package-container

------------------------------------------------------------------------

<center>[NHANES Data Retrieval](https://github.com/cjendres1/nhanes/) [@nhanesA]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**nhanesA**}.*)

{**nhanesA**} is an R package for browsing and retrieving data from the National Health And Nutrition Examination Survey [NHANES website](https://www.cdc.gov/nchs/nhanes/index.htm). This package is designed to be useful for research and instructional purposes.

The functions in the {**nhanesA**} package allow for fully customizable selection and import of data directly from the NHANES website; it is therefore essential to have an active network connection.

::::: my-remark
::: my-remark-header
{nhanesA}: Personal Evaluation
:::

::: my-remark-container
There are other similar packages available, but they are more restricted because data newer than 2014 can't be downloaded:

-   {**NHANES**}: For the years 2009-2012
-   {**RNHANES**}: For the years 1999-2014

For my other reflections on packages for downloading NHANES data, see @imp-chap01-nhanesA-pkg and @sec-chap03-rnhanes.
:::
:::::
::::::
::::::::

## nhstplot {#sec-nhstplot}

::::: my-package
::: my-package-header
Package Profile: nhstplot
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Plot Null Hypothesis Significance Tests](https://cran.r-project.org/package=nhstplot) [@nhstplot]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**nhstplot**}.*)

Illustrate graphically the most common Null Hypothesis Significance Testing procedures. More specifically, this package provides functions to plot

-   Chi-Squared,
-   F,
-   t (one- and two-tailed) and
-   z (one- and two-tailed) tests,

by plotting the probability density under the null hypothesis as a function of the different test statistic values.

Although highly flexible (color theme, fonts, etc.), only the minimal number of arguments (observed test statistic, degrees of freedom) are necessary for a clear and useful graph to be plotted, with the observed test statistic and the p value, as well as their corresponding value labels. The axes are automatically scaled to present the relevant part and the overall shape of the probability density function. This package is especially intended for education purposes, as it provides a helpful support to help explain the Null Hypothesis Significance Testing process, its use and/or shortcomings.
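
A short sketch of the minimal calls described above. The test statistics and degrees of freedom are made-up values for illustration, and the object names are assumptions.

```r
# Hypothetical observed values, used only to draw the plots
p_t <- nhstplot::plotttest(t = 2.1, df = 30)            # two-tailed t test by default
p_chisq <- nhstplot::plotchisqtest(chisq = 8.4, df = 3)

p_t
p_chisq
```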
:::
:::::

## nnet {#sec-nnet}

::::: my-package
::: my-package-header
Package Profile: nnet
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Feed-Forward Neural Networks and Multinomial Log-Linear Models](http://www.stats.ox.ac.uk/pub/MASS4/) [@nnet]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**nnet**}.*)

Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.
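
Both model types can be tried in a few lines. This is only a sketch: the `iris` data, the chosen predictors, and the hidden layer size are assumptions for illustration.

```r
# Multinomial log-linear model (trace = FALSE suppresses the iteration log)
multinom_fit <- nnet::multinom(Species ~ Sepal.Length + Sepal.Width,
                               data = iris, trace = FALSE)
predicted <- predict(multinom_fit, newdata = iris)
mean(predicted == iris$Species)   # share of correctly classified flowers

# Feed-forward network with a single hidden layer of two units;
# the starting weights are random, hence the seed
set.seed(123)
nnet_fit <- nnet::nnet(Species ~ ., data = iris, size = 2, trace = FALSE)
```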
:::
:::::

## nortest {#sec-nortest}

::::: my-package
::: my-package-header
Package Profile: nortest
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Tests for Normality](https://cran.r-project.org/package=nortest) [@nortest]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**nortest**}.*)

Five omnibus tests for testing the composite hypothesis of normality.
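
The five tests can be run side by side as in the following sketch. The simulated sample and the names in the list are assumptions for illustration.

```r
set.seed(42)
x <- rnorm(100)   # simulated sample, so normality should usually not be rejected

results <- list(
  anderson_darling = nortest::ad.test(x),
  cramer_von_mises = nortest::cvm.test(x),
  lilliefors       = nortest::lillie.test(x),
  pearson_chi_sq   = nortest::pearson.test(x),
  shapiro_francia  = nortest::sf.test(x)
)

# Collect the p-values of all five tests
p_values <- sapply(results, function(res) res$p.value)
p_values
```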
:::
:::::

## odds.n.ends {#sec-odds.n.ends}

:::::: my-package
::: my-package-header
Package Profile: odds.n.ends
:::

:::: my-package-container
<center>[Odds Ratios, Contingency Table, and Model Significance from a Generalized Linear Model](https://cran.r-project.org/package=odds.n.ends) [@odds.n.ends]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap10/logoi/logo-odds-n-ends-min.png){width="176"}

Computes odds ratios and 95% confidence intervals from a generalized linear model object. It also computes model significance with the chi-squared statistic and p-value and it computes model fit using a contingency table to determine the percent of observations for which the model correctly predicts the value of the outcome. Calculates model sensitivity and specificity.
:::

------------------------------------------------------------------------

{**odds.n.ends**} was created in order to take the results from a binary logistic regression model estimated using the `glm()` function and compute model significance, model fit, and the odds ratios and 95% confidence intervals typically reported from binary logistic regression analyses.
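
A minimal sketch of this workflow. The model `vs ~ mpg` on the built-in `mtcars` data is an assumed toy example, not a model from the book.

```r
# Toy binary logistic regression, for illustration only
model <- stats::glm(vs ~ mpg, data = mtcars,
                    family = stats::binomial(link = "logit"))

# Model significance, contingency table, odds ratios with 95% CIs,
# sensitivity and specificity in one call
result <- odds.n.ends::odds.n.ends(model)
result
```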
::::
::::::

## onewaytests {#sec-onewaytests}

::::: my-package
::: my-package-header
Package Profile: onewaytests
:::

::: my-package-container

------------------------------------------------------------------------

<center>[One-Way Tests in Independent Groups Designs](https://cran.r-project.org/package=onewaytests) [@onewaytests]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**onewaytests**}.*)

Performs one-way tests in independent groups designs including homoscedastic and heteroscedastic tests. These are

-   one-way analysis of variance (ANOVA),
-   Welch's heteroscedastic F test,
-   Welch's heteroscedastic F test with trimmed means and Winsorized variances,
-   Brown-Forsythe test,
-   Alexander-Govern test,
-   James second order test,
-   Kruskal-Wallis test,
-   Scott-Smith test,
-   Box F test,
-   Johansen F test,
-   Generalized tests equivalent to Parametric Bootstrap and Fiducial tests,
-   Alvandi's F test,
-   Alvandi's generalized p-value,
-   approximate F test,
-   B square test,
-   Cochran test,
-   Weerahandi's generalized F test,
-   modified Brown-Forsythe test,
-   adjusted Welch's heteroscedastic F test,
-   Welch-Aspin test,
-   Permutation F test.

The package performs pairwise comparisons and graphical approaches.

Also, the package includes

-   Student's t test,
-   Welch's t test and
-   Mann-Whitney U test for two samples.

Moreover, it assesses variance homogeneity and normality of data in each group via tests and plots (Dag et al., 2018, <https://journal.r-project.org/archive/2018/RJ-2018-022/RJ-2018-022.pdf>).
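
Two of the listed tests as a brief sketch. The `iris` data and the chosen variables are assumptions for illustration.

```r
# Welch's heteroscedastic F test and the Kruskal-Wallis test
welch <- onewaytests::welch.test(Sepal.Length ~ Species, data = iris)
kruskal <- onewaytests::kw.test(Sepal.Length ~ Species, data = iris)

c(welch = welch$p.value, kruskal_wallis = kruskal$p.value)
```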
:::
:::::

## openintro {#sec-openintro}

:::::: my-package
::: my-package-header
Package Profile: openintro
:::

:::: my-package-container
<center>[Data Sets and Supplemental Functions from 'OpenIntro' Textbooks and Labs](https://openintrostat.github.io/openintro/) [@openintro]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-openintro-min.png){width="176"}

The package contains data sets used in our open-source textbooks along with custom plotting functions for reproducing book figures. It also contains the datasets used in {**OpenIntro**} labs.
:::

------------------------------------------------------------------------

Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of Windows operating system.
::::
::::::

## ordinal {#sec-ordinal}

::::: my-package
::: my-package-header
Package Profile: ordinal
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Regression Models for Ordinal Data](https://github.com/runehaubo/ordinal) [@ordinal]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**ordinal**}.*)

Implementation of cumulative link (mixed) models also known as ordered regression models, proportional odds models, proportional hazards models for grouped survival times and ordered logit/probit/... models. Estimation is via maximum likelihood and mixed models are fitted with the Laplace approximation and adaptive Gauss-Hermite quadrature. Multiple random effect terms are allowed and they may be nested, crossed or partially nested/crossed. Restrictions of symmetry and equidistance can be imposed on the thresholds (cut-points/intercepts). Standard model methods are available (summary, anova, drop-methods, step, confint, predict etc.) in addition to profile methods and slice methods for visualizing the likelihood function and checking convergence.
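
A minimal sketch of a cumulative link model, assuming the `wine` data set shipped with {**ordinal**} as example data:

```r
data("wine", package = "ordinal")

# Ordered response `rating` explained by temperature and contact
fit <- ordinal::clm(rating ~ temp + contact, data = wine)
summary(fit)
```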
:::
:::::

## paletteer {#sec-paletteer}

:::::: my-package
::: my-package-header
Package Profile: paletteer
:::

:::: my-package-container
<center>[Comprehensive Collection of Color Palettes](https://emilhvitfeldt.github.io/paletteer/index.html) [@paletteer]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-paletteer-min.png){width="176"}

The choices of color palettes in R can be quite overwhelming with palettes spread over many packages with many different APIs. This package aims to collect all color palettes across the R ecosystem under the same package with a streamlined API.
:::

------------------------------------------------------------------------

The palettes are divided into two groups: discrete and continuous. For discrete palettes you have the choice between fixed width palettes and dynamic palettes.

1.  discrete
    -   fixed width palettes: These are the most common discrete palettes. They have a set number of colors which doesn’t change when the number of colors requested varies.
    -   dynamic palettes: The colors of dynamic palettes depend on the number of colors you need.
2.  continuous: These palettes provide as many colors as you need for a smooth transition of color.

This package includes 2759 palettes from 75 different packages and information about these can be found in the following data.frames: `palettes_c_names`, `palettes_d_names` and `palettes_dynamic_names`. Additionally this [github repo](https://github.com/EmilHvitfeldt/r-color-palettes/blob/main/README.md) showcases all the palettes included in the package and more.
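
The three palette types can be requested as in this sketch. The palette names are taken as examples and assumed to be available in the installed version.

```r
# Fixed width discrete palette: returns all colors of the palette
fixed <- paletteer::paletteer_d("nord::frost")

# Dynamic discrete palette: the colors depend on the requested number
dynamic <- paletteer::paletteer_dynamic("cartography::green.pal", n = 5)

# Continuous palette: any number of colors along a smooth gradient
continuous <- paletteer::paletteer_c("scico::berlin", n = 10)

# Look up the available discrete palettes
head(paletteer::palettes_d_names)
```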
::::
::::::

## patchwork {#sec-patchwork}

::::::::::::: my-package
::: my-package-header
Package Profile: patchwork
:::

::::::: my-package-container
<center>[The Composer of Plots](https://patchwork.data-imaginist.com/) [@patchwork]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-patchwork-min.png){width="176"}

The goal of {**patchwork**} is to make it ridiculously simple to combine separate `ggplots` into the same graphic. As such it tries to solve the same problem as `gridExtra::grid.arrange()` and `cowplot::plot_grid()` but using an `r glossary("APIx", "API")` that incites exploration and iteration, and scales to arbitrarily complex layouts.
:::

------------------------------------------------------------------------

The {**ggplot2**} package provides a strong API for sequentially building up a plot, but does not concern itself with composition of multiple plots. {**patchwork**} is a package that expands the API to allow for arbitrarily complex composition of plots by, among others, providing mathematical operators for combining multiple plots. Other packages that try to address this need (but with a different approach) are {**gridExtra**} and {**cowplot**} (see @sec-gridExtra and @sec-cowplot).

Before plots can be laid out, they have to be assembled. Arguably one of patchwork’s biggest selling points is that it expands on the use of `+` in ggplot2 to allow plots to be added together and composed, creating a natural extension of the {**ggplot2**} API.

While quite complex compositions can be achieved using `+`, `|`, and `/`, it may be necessary to take even more control over the layout. All of this can be controlled using the `patchwork::plot_layout()` function along with a couple of special placeholder objects.
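
A small sketch of such a composition written entirely with `::` calls. The plots `p1`, `p2`, `p3` and the `mtcars` data are assumptions for illustration. With {**patchwork**} attached, the same layout would be `(p1 | p2) / p3 + plot_layout(heights = c(2, 1))`.

```r
p1 <- ggplot2::ggplot(mtcars, ggplot2::aes(mpg, wt)) +
  ggplot2::geom_point()
p2 <- ggplot2::ggplot(mtcars, ggplot2::aes(factor(cyl), mpg)) +
  ggplot2::geom_boxplot()
p3 <- ggplot2::ggplot(mtcars, ggplot2::aes(hp)) +
  ggplot2::geom_histogram(bins = 10)

# wrap_plots() composes plots without the overloaded operators
top_row <- patchwork::wrap_plots(p1, p2, ncol = 2)

# Stack the row on top of p3 and give it twice the height
combined <- patchwork::wrap_plots(top_row, p3, ncol = 1) +
  patchwork::plot_layout(heights = c(2, 1))
combined
```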

::::: my-remark
::: my-remark-header
{patchwork}: Personal Evaluation
:::

::: my-remark-container
In this book I am using the double colon notation instead of a `library()` call. Without this call it is more difficult to use the {**patchwork**} package.

See [Using plot arithmetic functions with `::` syntax](https://github.com/thomasp85/patchwork/issues/351#issuecomment-1931140157)

| operator | function                    | effect       |
|----------|-----------------------------|--------------|
| \+       | `ggplot2:::"+.gg"()`        | side by side |
| \-       | `patchwork:::"-.ggplot"()`  |              |
| \|       | `patchwork:::"\|.ggplot"()` |              |
| /        | `patchwork:::"/.ggplot"()`  | stacked      |
| \*       | `patchwork:::"*.gg"()`      |              |
| &        | `patchwork:::"&.gg"()`      |              |
:::
:::::
:::::::
:::::::::::::

## performance {#sec-performance}

:::::: my-package
::: my-package-header
Package Profile: performance
:::

:::: my-package-container
<center>[Assessment of Regression Models Performance](https://easystats.github.io/performance/) [@performance]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap10/logoi/logo-performance-min.png){width="176"}

Utilities for computing measures to assess model quality, which are not directly provided by R's 'base' or 'stats' packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models. References: Lüdecke et al. (2021) <doi:10.21105/joss.03139>.
:::

------------------------------------------------------------------------

A crucial aspect when building regression models is to evaluate the quality of model fit. It is important to investigate how well models fit the data and which fit indices to report. Functions to create diagnostic plots or to compute fit measures do exist, but they are mostly spread over different packages. There is no unique and consistent approach to assess the model quality for different kinds of models.

The primary goal of the performance package is to fill this gap and to provide utilities for computing indices of model quality and goodness of fit. These include measures like r-squared ($R^2$), root mean squared error (RMSE) or intraclass correlation coefficient (ICC), but also functions to check (mixed) models for overdispersion, zero-inflation, convergence or singularity.
::::
::::::

## plotly {#sec-plotly}

:::::: my-package
::: my-package-header
Package Profile: plotly
:::

:::: my-package-container
<center>[Create Interactive Web Graphics via 'plotly.js'](https://plotly.com/r/) [@plotly]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap09/logoi/logo-plotly-min.png){width="176"}

Create interactive web graphics from 'ggplot2' graphs and/or a custom interface to the (MIT-licensed) JavaScript library 'plotly.js' inspired by the grammar of graphics.
:::

------------------------------------------------------------------------

Plotly.js is a standalone JavaScript data visualization library, and it also powers the Python and R modules named plotly in those respective ecosystems (referred to as Plotly.py and Plotly.R).

Plotly.js can be used to produce dozens of chart types and visualizations, including statistical charts, 3D graphs, scientific charts, SVG and tile maps, financial charts and more.
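
Both routes into {**plotly**} can be sketched as follows. The `mtcars` data and the object names are assumptions for illustration.

```r
# Route 1: convert an existing ggplot2 graph into an interactive widget
p <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) +
  ggplot2::geom_point()
interactive_gg <- plotly::ggplotly(p)

# Route 2: use the custom interface to plotly.js directly
interactive_native <- plotly::plot_ly(
  data = mtcars, x = ~wt, y = ~mpg,
  type = "scatter", mode = "markers"
)
```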
::::
::::::

## ppcor {#sec-ppcor}

::::: my-package
::: my-package-header
Package Profile: ppcor
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Partial and Semi-Partial (Part) Correlation](https://cran.r-project.org/package=ppcor) [@ppcor]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**ppcor**}.*)

Calculates partial and semi-partial (part) correlations along with their p-values [@kim2015].
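
A minimal sketch, assuming three columns of the built-in `mtcars` data as example variables:

```r
# Correlation of mpg and wt, controlling for hp
partial <- ppcor::pcor.test(x = mtcars$mpg, y = mtcars$wt, z = mtcars$hp)

# Semi-partial (part) correlation: hp is removed from wt only
semi_partial <- ppcor::spcor.test(x = mtcars$mpg, y = mtcars$wt, z = mtcars$hp)

partial
semi_partial
```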
:::
:::::

## pscl {#sec-pscl}

::::: my-package
::: my-package-header
Package Profile: pscl
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Political Science Computational Laboratory](https://github.com/atahk/pscl) [@pscl]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**pscl**}.*)

Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching; seats-votes curves.
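
Two of these features in a short sketch. It assumes the `bioChemists` data set shipped with {**pscl**}, and the model formulas are chosen only for illustration.

```r
data("bioChemists", package = "pscl")

# Zero-inflated Poisson model: count part | zero-inflation part
zip_fit <- pscl::zeroinfl(art ~ fem + ment | 1, data = bioChemists)

# Pseudo R-squared measures for an ordinary Poisson GLM
pois_fit <- stats::glm(art ~ fem + ment, data = bioChemists,
                       family = stats::poisson)
pseudo_r2 <- pscl::pR2(pois_fit)
pseudo_r2
```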
:::
:::::

## psych {#sec-psych}

::::: my-package
::: my-package-header
Package Profile: psych
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Procedures for Psychological, Psychometric, and Personality Research](https://personality-project.org/r/psych/) [@psych]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**psych**}.*)

A general purpose toolbox developed originally for personality, psychometric theory and experimental psychology.

-   Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics.
-   Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations.
-   Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis.
-   Validation and cross validation of scales developed using basic machine learning algorithms are provided, as are functions for simulating and testing particular item and test structures.
-   Several functions serve as a useful front end for structural equation modeling.
-   Graphical displays of path diagrams, including mediation models, factor analysis and structural equation models are created using basic graphics.
-   Some of the functions are written to support a book on psychometric theory as well as publications in personality research.
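
Two of the listed features as a brief sketch. The selected `mtcars` columns are assumptions for illustration.

```r
# Basic descriptive statistics for a few numeric variables
descriptives <- psych::describe(mtcars[, c("mpg", "hp", "wt")])
descriptives

# Principal component analysis with one unrotated component
pca <- psych::principal(mtcars[, c("mpg", "disp", "hp", "wt")],
                        nfactors = 1, rotate = "none")
pca$loadings
```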
:::
:::::

## pubh {#sec-pubh}

::::: my-package
::: my-package-header
Package Profile: pubh
:::

::: my-package-container

------------------------------------------------------------------------

<center>[A Toolbox for Public Health and Epidemiology](https://cran.r-project.org/package=pubh) [@pubh]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**pubh**}.*)

A toolbox for making R functions and capabilities more accessible to students and professionals from Epidemiology and Public Health related disciplines. Includes a function to report coefficients and confidence intervals from models using robust standard errors (when available), functions that expand {**ggplot2**} plots and functions relevant for introductory papers in Epidemiology or Public Health. Please note that use of the provided data sets is for educational purposes only.
:::
:::::

## purrr {#sec-purrr}

:::::: my-package
::: my-package-header
Package Profile: purrr
:::

:::: my-package-container
<center>[Functional Programming Tools](https://purrr.tidyverse.org/) [@purrr]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap02/logoi/logo-purrr-min.png){width="176"}

{**purrr**} enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
:::

------------------------------------------------------------------------

If you’ve never heard of FP before, the best place to start is the family of `purrr::map()` functions which allow you to replace many for loops with code that is both more succinct and easier to read. The best place to learn about the `purrr::map()` functions is the [iteration chapter](https://r4ds.had.co.nz/iteration.html) in R for data science.
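
A small sketch of how a `for` loop can be replaced. The column means of `mtcars` are an assumed example task.

```r
# With a for loop: allocate the result, then fill it step by step
means_loop <- numeric(ncol(mtcars))
for (i in seq_along(mtcars)) {
  means_loop[i] <- mean(mtcars[[i]])
}

# With purrr: one call that always returns a double vector
means_map <- purrr::map_dbl(mtcars, mean)
means_map
```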
::::
::::::

## rcompanion {#sec-rcompanion}

:::::::: my-package
::: my-package-header
Package Profile: rcompanion
:::

:::::: my-package-container

------------------------------------------------------------------------

<center>[Functions to Support Extension Education Program Evaluation](https://rcompanion.org/handbook/) [@rcompanion]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**rcompanion**}.*)

Functions and datasets to support [Summary and Analysis of Extension Program Evaluation in R](https://rcompanion.org/handbook/), and [An R Companion for the Handbook of Biological Statistics](https://rcompanion.org/rcompanion/). Vignettes are available at <https://rcompanion.org>. (See also the [PDF book](https://rcompanion.org/documents/RHandbookProgramEvaluation.pdf) [@mangiafico2023].)

Both books provide example programs for nearly all of the statistical tests that are described in the Handbook of Biological Statistics [@mcdonald-2009].

::::: my-remark
::: my-remark-header
{rcompanion}: Personal Evaluation
:::

::: my-remark-container
Although all three books are now 15 years old, it seems to me that they cover valuable material that is still important. The book by McDonald explains many of the tests that are used in the two books by Mangiafico. I add the complete table of contents here to remind me that I should look into these three books:

**TABLE OF CONTENTS**

**Introduction**

-   Purpose of this Book
-   Author of this Book
-   Using R
-   [Statistics Textbooks and Other Resources](https://rcompanion.org/handbook/A_04.html)

**Statistics for Educational Program Evaluation**

-   Why Statistics?
-   Evaluation Tools and Surveys

**Variables, Descriptive Statistics, and Plots**

-   Types of Variables
-   Descriptive Statistics
-   Confidence Intervals
-   Basic Plots

**Understanding Statistics and Hypothesis Testing**

-   Hypothesis Testing and p-values
-   Reporting Results of Data and Analyses
-   [Choosing a Statistical Test](https://rcompanion.org/handbook/D_03.html)
-   Independent and Paired Values

**Likert Data**

-   Introduction to Likert Data
-   Descriptive Statistics for Likert Item Data
-   Descriptive Statistics with the likert Package
-   Confidence Intervals for Medians
-   Converting Numeric Data to Categories

**Traditional Nonparametric Tests**

-   Introduction to Traditional Nonparametric Tests
-   One-sample Wilcoxon Signed-rank Test
-   Sign Test for One-sample Data
-   Two-sample Mann–Whitney U Test
-   Mood’s Median Test for Two-sample Data
-   Two-sample Paired Signed-rank Test
-   Sign Test for Two-sample Paired Data
-   Kruskal–Wallis Test
-   Mood’s Median Test
-   Friedman Test
-   Quade Test
-   Scheirer–Ray–Hare Test
-   Aligned Ranks Transformation ANOVA
-   Nonparametric Regression and Local Regression
-   Nonparametric Regression for Time Series

**Permutation Tests**

-   Introduction to Permutation Tests
-   One-way Permutation Test for Ordinal Data
-   One-way Permutation Test for Paired Ordinal Data
-   Permutation Tests for Medians and Percentiles

**Tests for Ordinal Data in Tables**

-   Association Tests for Ordinal Tables
-   Measures of Association for Ordinal Tables

**Concepts for Linear Models**

-   Introduction to Linear Models
-   Using Random Effects in Models
-   What are Estimated Marginal Means?
-   Estimated Marginal Means for Multiple Comparisons
-   Factorial ANOVA: Main Effects, Interaction Effects, and Interaction Plots
-   p-values and R-square Values for Models
-   Accuracy and Errors for Models

**Ordinal Tests with Cumulative Link Models**

-   Introduction to Cumulative Link Models (CLM) for Ordinal Data
-   Two-sample Ordinal Test with CLM
-   Two-sample Paired Ordinal Test with CLMM
-   One-way Ordinal Regression with CLM
-   One-way Repeated Ordinal Regression with CLMM
-   Two-way Ordinal Regression with CLM
-   Two-way Repeated Ordinal Regression with CLMM

**Tests for Nominal Data**

-   Introduction to Tests for Nominal Variables
-   Confidence Intervals for Proportions
-   Goodness-of-Fit Tests for Nominal Variables
-   Association Tests for Nominal Variables
-   Measures of Association for Nominal Variables
-   Tests for Paired Nominal Data
-   Cochran–Mantel–Haenszel Test for 3-Dimensional Tables
-   Cochran’s Q Test for Paired Nominal Data
-   Models for Nominal Data

**Parametric Tests**

-   Introduction to Parametric Tests
-   One-sample t-test
-   Two-sample t-test
-   Paired t-test
-   One-way ANOVA
-   One-way ANOVA with Blocks
-   One-way ANOVA with Random Blocks
-   Two-way ANOVA
-   Repeated Measures ANOVA
-   Correlation and Linear Regression
-   Advanced Parametric Methods
-   Transforming Data
-   Normal Scores Transformation

**Analysis of Count Data and Percentage Data**

-   Regression for Count Data
-   Beta Regression for Percent and Proportion Data

**Other Books**
:::
:::::
::::::
::::::::

## readr {#sec-readr}

:::::: my-package
::: my-package-header
Package Profile: readr
:::

:::: my-package-container
<center>[Read Rectangular Text Data](https://readr.tidyverse.org/) [@readr]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-readr-min.png){width="176"}

The goal of readr is to provide a fast and friendly way to read rectangular data from delimited files, such as comma-separated values (CSV) and tab-separated values (TSV). It is designed to parse many types of data found in the wild, while providing an informative problem report when parsing leads to unexpected results. [@readr]
:::

------------------------------------------------------------------------

{**readr**} supports the following formats:

-   `read_csv()`: comma-separated values (CSV)
-   `read_tsv()`: tab-separated values (TSV)
-   `read_csv2()`: semicolon-separated values with `,` as the decimal mark
-   `read_delim()`: delimited files (CSV and TSV are important special cases)
-   `read_fwf()`: fixed-width files
-   `read_table()`: whitespace-separated files
-   `read_log()`: web log files
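
Three of these readers in a minimal sketch. The inline strings wrapped in `I()` stand in for real files and are made-up data.

```r
# Comma-separated values with "." as the decimal mark
csv_data <- readr::read_csv(I("id,value\n1,2.5\n2,3.5"),
                            show_col_types = FALSE)

# Semicolon-separated values with "," as the decimal mark
csv2_data <- readr::read_csv2(I("id;value\n1;2,5\n2;3,5"),
                              show_col_types = FALSE)

# Any other single-character delimiter
delim_data <- readr::read_delim(I("id|value\n1|2.5\n2|3.5"), delim = "|",
                                show_col_types = FALSE)
```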
::::
::::::

## readxl {#sec-readxl}

:::::: my-package
::: my-package-header
Package Profile: readxl
:::

:::: my-package-container
<center>[Read Excel Files](https://readxl.tidyverse.org/) [@readxl]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-readxl-min.png){width="176"}

The readxl package makes it easy to get data out of Excel and into R. Compared to many of the existing packages (e.g. {**gdata**}, {**xlsx**}, {**xlsReadWrite**}), {**readxl**} has no external dependencies, so it’s easy to install and use on all operating systems (Windows, Mac and Linux). It is designed to work with *tabular* data.
:::

------------------------------------------------------------------------

{**readxl**} supports both the legacy `.xls` format and the modern xml-based `.xlsx` format. The embedded [libxls](https://github.com/libxls/libxls) C library is used to support `.xls`, which abstracts away many of the complexities of the underlying binary format. To parse `.xlsx`, we use the [RapidXML](https://rapidxml.sourceforge.net/) C++ library.
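
A minimal sketch using one of the example workbooks bundled with {**readxl**}, so no Excel file of your own is needed:

```r
# Path to an example workbook that ships with the package
path <- readxl::readxl_example("datasets.xlsx")

# List the sheets, then read one of them by name
sheets <- readxl::excel_sheets(path)
iris_sheet <- readxl::read_excel(path, sheet = "iris")

sheets
head(iris_sheet)
```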
::::
::::::

## rms {#sec-rms}

::::: my-package
::: my-package-header
Package Profile: rms
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Regression Modeling Strategies](https://hbiostat.org/r/rms/) [@rms]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**rms**}.*)

Regression modeling, testing, estimation, validation, graphics, prediction, and typesetting by storing enhanced model design attributes in the fit. {**rms**} is a collection of functions that assist with and streamline modeling.

It also contains functions for binary and ordinal logistic regression models, ordinal models for continuous Y with a variety of distribution families, and the Buckley-James multiple regression model for right-censored responses, and implements penalized maximum likelihood estimation for logistic and ordinary linear models.

{**rms**} works with almost any regression model, but it was especially written to work with binary or ordinal regression models, Cox regression, accelerated failure time models, ordinary linear models, the Buckley-James model, generalized least squares for serially or spatially correlated observations, generalized linear models, and quantile regression.
:::
:::::

## report {#sec-report}

:::::: my-package
::: my-package-header
Package Profile: report
:::

:::: my-package-container
<center>[Automated Reporting of Results and Statistical Models](https://easystats.github.io/report/) [@report]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap02/logoi/logo-report-min.png){width="176"}

The primary goal of {**report**} is to bridge the gap between R’s output and the formatted results contained in your manuscript. This package converts statistical models and data frames into textual reports suited for publication, ensuring standardization and quality in results reporting.
:::

------------------------------------------------------------------------

{**report**} automatically produces reports of models and data frames according to best practices guidelines (e.g., APA’s style).
::::
::::::

## RNHANES {#sec-RNHANES}

:::::::: my-package
::: my-package-header
Package Profile: RNHANES
:::

:::::: my-package-container

------------------------------------------------------------------------

<center>[Facilitates Analysis of CDC NHANES Data](https://www.silentspring.org/RNHANES/index.html) [@RNHANES]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**RNHANES**}.*)

RNHANES is an R package for accessing and analyzing `r glossary("CDC")` `r glossary("NHANES")` (National Health and Nutrition Examination Survey) data that was developed by [Silent Spring Institute](https://silentspring.org/).

::::: my-remark
::: my-remark-header
{**RNHANES**}: Personal Evaluation
:::

::: my-remark-container
The CRAN version of {**RNHANES**} only works with data before 2015. The book says that for the years 2015-2016 you can use the GitHub development version, but this didn't work for me.

The problem is that the function `RNHANES::validate_year()` is not up to date. It contains the valid years as fixed strings, which I consider bad programming. (One could generate these pairs of years programmatically: check the start year with the modulo operator `%%` and subtract four years from the current year, because the data has to be prepared before it is made publicly available.)

I therefore used the code in @lst-chap03-get-NHANES-2018-data to download data directly from the website. I have since learned that there is another, more up-to-date package, {**nhanesA**}, which I am going to test in chapter 6, where I need NHANES data again.
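
The idea in the parenthesis above could look roughly like the following sketch. The function `nhanes_cycles()` and its arguments are hypothetical and not part of {**RNHANES**}. It assumes regular two-year cycles starting in odd years from 1999 and a release lag of four years; real survey cycles can deviate from this pattern, so the result would still need checking against the NHANES website.

```r
# Hypothetical helper (not part of RNHANES): build the survey cycles
# instead of hard-coding them as fixed strings
nhanes_cycles <- function(current_year = as.integer(format(Sys.Date(), "%Y")),
                          first_year = 1999L,
                          release_lag = 4L) {
  # Assumed: data become public about `release_lag` years after collection
  last_start <- current_year - release_lag
  # Assumed: cycles start in odd years, so step back from an even year
  if (last_start %% 2 == 0) {
    last_start <- last_start - 1L
  }
  if (last_start < first_year) {
    return(character(0))
  }
  starts <- seq(first_year, last_start, by = 2L)
  paste0(starts, "-", starts + 1L)
}

cycles <- nhanes_cycles(current_year = 2024L)
cycles
```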
:::
:::::
::::::
::::::::

## rstatix {#sec-rstatix}

::::: my-package
::: my-package-header
Package Profile: rstatix
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Pipe-Friendly Framework for Basic Statistical Tests](https://rpkgs.datanovia.com/rstatix/) [@rstatix]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**rstatix**}.*)

Provides a simple and intuitive pipe-friendly framework, coherent with the {**tidyverse**} design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses.

The output of each test is automatically transformed into a tidy data frame to facilitate visualization.

Additional functions are available for reshaping, reordering, manipulating and visualizing correlation matrix. Functions are also included to facilitate the analysis of factorial experiments, including purely ‘within-Ss’ designs (repeated measures), purely ‘between-Ss’ designs, and mixed ‘within-and-between-Ss’ designs.

It’s also possible to compute several effect size metrics, including “eta squared” for ANOVA, “Cohen’s d” for t-test and “Cramer’s V” for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances.
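
As a minimal sketch of the tidy output described above, the following lines run a t-test and compute Cohen's d. The choice of the built-in `ToothGrowth` data set is mine and serves only as an illustration.

```r
# Welch two-sample t-test of tooth length by supplement type
res <- rstatix::t_test(ToothGrowth, len ~ supp)
res

# Effect size (Cohen's d) for the same comparison
eff <- rstatix::cohens_d(ToothGrowth, len ~ supp)
eff
```

Both results are data frames with one row per comparison, so they can be filtered, joined or passed on to a plotting function like any other table.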
:::
:::::

## rvest {#sec-rvest}

::::::::: my-package
::: my-package-header
Package Profile: rvest
:::

::::::: my-package-container
<center>[Easily Harvest (Scrape) Web Pages](https://rvest.tidyverse.org/) [@rvest]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap04/logoi/logo-rvest-min.png){width="176"}

Wrappers around the {**xml2**} and {**httr**} packages to make it easy to download, then manipulate, `r glossary("HTML")` and `r glossary("XML")`.

------------------------------------------------------------------------

{**rvest**} helps you scrape (or harvest) data from web pages. It is designed to work with {**magrittr**} to make it easy to express common web scraping tasks, inspired by libraries like [beautiful soup](https://www.crummy.com/software/BeautifulSoup/) and [RoboBrowser](https://robobrowser.readthedocs.io/en/latest/readme.html).
:::

::::: my-remark
::: my-remark-header
{rvest}: Personal Evaluation
:::

::: my-remark-container
If you’re scraping multiple pages, Hadley Wickham highly recommends using {**rvest**} in concert with [{**polite**}](https://dmi3kno.github.io/polite/). The {**polite**} package ensures that you’re respecting the [robots.txt](https://en.wikipedia.org/wiki/Robots_exclusion_standard) and not hammering the site with too many requests.

It is important to read the introductory vignette [Web scraping 101](https://rvest.tidyverse.org/articles/rvest.html). It introduces you to the basics of web scraping with {**rvest**}. You’ll first learn the basics of HTML and how to use `r glossary("CSS")` selectors to refer to specific elements, then you’ll learn how to use {**rvest**} functions to get data out of `r glossary("HTML")` and into R.

A very important tool for finding the appropriate CSS selector is SelectorGadget. To learn how to install and use it, read the [SelectorGadget help page](https://rvest.tidyverse.org/articles/selectorgadget.html) of {**rvest**}.
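
To show how a CSS selector picks elements, here is a small sketch that needs no internet connection. The HTML snippet and the class name `pkg` are invented for this illustration.

```r
# Build a tiny HTML document in memory instead of downloading a page
page <- rvest::minimal_html('
  <h1>Packages</h1>
  <ul>
    <li class="pkg">rvest</li>
    <li class="pkg">polite</li>
    <li>not a package</li>
  </ul>
')

# The CSS selector "li.pkg" matches only <li> elements with class "pkg"
pkgs <- page |>
  rvest::html_elements("li.pkg") |>
  rvest::html_text2()
pkgs
```

For a real page one would replace `rvest::minimal_html()` with `rvest::read_html()` and the URL of the page.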
:::
:::::
:::::::
:::::::::

## scales {#sec-scales}

:::::: my-package
::: my-package-header
Package Profile: scales
:::

:::: my-package-container
<center>[Scale Functions for Visualization](https://scales.r-lib.org/) [@scales]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap02/logoi/logo-scales-min.png){width="176"}

Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.
:::

------------------------------------------------------------------------

One of the most difficult parts of any graphics package is scaling, converting from data values to perceptual properties. The inverse of scaling, making guides (legends and axes) that can be used to read the graph, is often even harder! The {**scales**} package provides the internal scaling infrastructure used by {**ggplot2**}, and gives you tools to override the default breaks, labels, transformations and palettes.
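
A short sketch of how the label and break helpers work on their own, outside of a plot. The numbers are arbitrary example values.

```r
# Label functions return a function that formats numbers as strings
pct <- scales::label_percent(accuracy = 1)
pct(c(0.25, 0.5))

big <- scales::label_comma()
big(1234567)

# Break functions return a function that computes breaks for a data range
brk <- scales::breaks_pretty(n = 5)
brk(c(0, 97))
```

In a plot the same functions are passed to a scale, for instance `ggplot2::scale_y_continuous(labels = scales::label_percent())`.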
::::
::::::

## scico {#sec-scico}

:::::: my-package
::: my-package-header
Package Profile: scico
:::

:::: my-package-container
<center>[Colour Palettes Based on the Scientific Colour-Maps](https://github.com/thomasp85/scico) [@scico]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-scico-min.png){width="176"}

Color choice in information visualization is important in order to avoid being misled by inherent bias in the used color palette. The {**scico**} package provides access to the perceptually uniform and color-blindness friendly palettes developed by [Fabio Crameri](https://www.fabiocrameri.ch/colourmaps/) and released under the "Scientific Color-Maps" moniker. The package contains 39 different palettes and includes both diverging and sequential types. It uses more or less the same `r glossary("APIx", "API")` as {**viridis**} and provides scales for {**ggplot2**} without requiring {**ggplot2**} to be installed.
:::

------------------------------------------------------------------------

**Features of {scico}**

-   Perceptually uniform
-   Perceptually ordered
-   Color-vision-deficiency (`r glossary("CVD")`) friendly
-   Readable in black & white prints
-   All color map types & classes in all major formats
-   Citable & reproducible
::::
::::::

## semTools {#sec-semTools}

:::::::: my-package
::: my-package-header
Package Profile: semTools
:::

:::::: my-package-container

------------------------------------------------------------------------

<center>[Useful Tools for Structural Equation Modeling](https://github.com/simsem/semTools/wiki) [@semTools]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**semTools**}.*)

Provides tools for structural equation modeling, many of which extend the {**lavaan**} package; for example, to pool results from multiple imputations, probe latent interactions, or test measurement invariance.

::::: my-remark
::: my-remark-header
{semTools}: Personal Evaluation
:::

::: my-remark-container
This is a very specialized package. For now, I expect to use only the functions `semTools::skew()` and `semTools::kurtosis()`.
:::
:::::
::::::
::::::::

## sjlabelled {#sec-sjlabelled}

::::::::: my-package
::: my-package-header
Package Profile: sjlabelled
:::

::::::: my-package-container
<center>[Labelled Data Utility Functions](https://strengejacke.github.io/sjlabelled/) [@sjlabelled]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-sjlabelled-min.png){width="176"}

Basically, this package covers reading and writing data between other statistical packages (like ‘SPSS’) and R, based on the haven and foreign packages; hence, this package also includes functions to make working with labelled data easier.
:::

------------------------------------------------------------------------

{**sjlabelled**} includes easy ways to get, set or change value and variable label attributes, to convert labelled vectors into factors or numeric (and vice versa), or to deal with multiple declared missing values.

::::: my-remark
::: my-remark-header
{sjlabelled}: Personal Evaluation
:::

::: my-remark-container
The prefix `sj` in {**sjlabelled**} stands for "Strenge Jacke" (German for "strict jacket") and links the package to other work by [Daniel Lüdecke](https://www.uke.de/kliniken-institute/institute/medizinische-soziologie/team/profil_daniel_luedecke_ims.html), who has developed many R packages. All the `sj` packages support labelled data.

His packages fall into two groups:

1.  Most packages are part of the project [EasyStats](https://easystats.github.io/easystats/), which provides, with 11 packages, "An R Framework for Easy Statistical Modeling, Visualization, and Reporting", similar to the {**tidyverse**} collection. The {**easystats**} collection is oriented more towards statistics, whereas the {**tidyverse**} is directed more towards data science.
2.  The other line of package development supports labelled data in combination with different R tasks, such as
    -   [Data and Variable Transformation Functions](https://cran.r-project.org/web/packages/sjmisc/index.html) {**sjmisc**},
    -   [Data Visualization for Statistics in Social Science](https://cran.r-project.org/web/packages/sjPlot/index.html) {**sjPlot**} and
    -   a [Collection of Convenient Functions for Common Statistical Computations](https://cran.r-project.org/web/packages/sjstats/index.html) {**sjstats**}.
    -   Additionally, there is {**sjtable2df**}, a package to [Convert 'sjPlot' HTML-Tables to R 'data.frame'](https://cran.r-project.org/web/packages/sjtable2df/index.html).
:::
:::::
:::::::
:::::::::

## sjPlot {#sec-sjPlot}

::::::::: my-package
::: my-package-header
Package Profile: sjPlot
:::

::::::: my-package-container
<center>[Data Visualization for Statistics in Social Science](https://strengejacke.github.io/sjPlot/) [@sjPlot]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap05/logoi/logo-sjPlot-min.png){width="176"}

Collection of plotting and table output functions for data visualization.
:::

------------------------------------------------------------------------

Results of various statistical analyses (that are commonly used in social sciences) can be visualized using this package, including simple and cross tabulated frequencies, histograms, box plots, (generalized) linear models, mixed effects models, PCA and correlation matrices, cluster analyses, scatter plots, Likert scales, effects plots of interaction terms in regression models, constructing index or score variables and much more.

::::: my-remark
::: my-remark-header
{sjPlot}: Personal Evaluation
:::

::: my-remark-container
The standard plots are easy to create, but adapting the resulting graph is another matter. Although {**sjPlot**} uses {**ggplot2**} in the background, I could not change the plot with {**ggplot2**} commands: when I tried, I got two different plots.

To customize the plot appearance you have to learn the many arguments of `sjPlot::set_theme()` and `sjPlot::plot_grpfrq()`. See the documentation of the [many specialized functions](https://strengejacke.github.io/sjPlot/reference/index.html#plot-customization) to tweak the default values.
:::
:::::
:::::::
:::::::::

## sjstats {#sec-sjstats}

:::::: my-package
::: my-package-header
Package Profile: sjstats
:::

:::: my-package-container
<center>[Collection of Convenient Functions for Common Statistical Computations](https://strengejacke.github.io/sjstats/) [@sjstats]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap04/logoi/logo-sjstats-min.png){width="176"}

Collection of convenient functions for common statistical computations, which are not directly provided by R’s base or stats packages.
:::

------------------------------------------------------------------------

This package aims at providing:

1.  Shortcuts for statistical measures, which otherwise could only be calculated with additional effort (like Cramer’s V, Phi, or effect size statistics like Eta or Omega squared), or for which no functions are currently available.
2.  Another focus lies on weighted variants of common statistical measures and tests like weighted standard error, mean, t-test, correlation, and more.

The included tools cover:

-   Especially for mixed models: design effect, sample size calculation
-   Weighted statistics and tests for: mean, median, standard error, standard deviation, correlation, Chi-squared test, t-test, Mann-Whitney-U-test
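
To illustrate the kind of "additional effort" such a shortcut saves, here is Cramér's V computed by hand in base R. This is a sketch of the textbook formula with the built-in `mtcars` data as an arbitrary example; it is not the implementation used in {**sjstats**}.

```r
# Cross-tabulate two categorical variables
tab <- table(mtcars$cyl, mtcars$gear)

# Chi-squared statistic without continuity correction
# (warnings about small expected counts are suppressed for this demo)
chi2 <- suppressWarnings(stats::chisq.test(tab, correct = FALSE))$statistic

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n <- sum(tab)
cramers_v <- sqrt(unname(chi2) / (n * (min(dim(tab)) - 1)))
cramers_v
```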
::::
::::::

## skimr {#sec-skimr}

::::::::: my-package
::: my-package-header
Package Profile: skimr
:::

::::::: my-package-container
<center>[Compact and Flexible Summaries of Data](https://docs.ropensci.org/skimr/) [@skimr]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-skimr-min.png){width="176"}

A simple to use summary function that can be used with pipes and displays nicely in the console.
:::

------------------------------------------------------------------------

The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the [Using skimr](https://docs.ropensci.org/skimr/articles/skimr.html) vignette and the [README](https://github.com/ropensci/skimr/#skimr-).

::::: my-remark
::: my-remark-header
{skimr}: Personal Evaluation
:::

::: my-remark-container
At the moment I am only using the `skimr::skim()` function. I believe most of the many other functions for customization are aimed at developers. Still, I need to take a closer look at this package.
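
A minimal sketch of this basic usage, with the built-in data sets `iris` and `mtcars` as arbitrary examples:

```r
# Summarise every column of a data frame
iris_summary <- skimr::skim(iris)
iris_summary

# skim() also works at the end of a pipe and accepts a selection of columns
mtcars |> skimr::skim(mpg, cyl)
```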
:::
:::::
:::::::
:::::::::

## statpsych {#sec-statpsych}

::::: my-package
::: my-package-header
Package Profile: statpsych
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Statistical Methods for Psychologists](https://cran.r-project.org/web/packages/statpsych/index.html) [@statpsych]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**statpsych**}.*)

{**statpsych**} implements confidence interval and sample size methods that are especially useful in psychological research.

The methods can be applied in 1-group, 2-group, paired-samples, and multiple-group designs and to a variety of parameters including means, medians, proportions, slopes, standardized mean differences, standardized linear contrasts of means, plus several measures of correlation and association.

The confidence intervals and sample size functions are applicable to single parameters as well as differences, ratios, and linear contrasts of parameters.

The sample size functions can be used to approximate the sample size needed to estimate a parameter or function of parameters with desired confidence interval precision or to perform a variety of hypothesis tests (directional two-sided, equivalence, superiority, noninferiority) with desired power. For details, see: <https://dgbonett.sites.ucsc.edu/>.
:::
:::::

## stringi {#sec-stringi}

:::::: my-package
::: my-package-header
Package Profile: stringi
:::

:::: my-package-container
<center>[Fast and Portable Character String Processing Facilities](https://stringi.gagolewski.com/) [@stringi]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-stringi-min.png){width="176"}

A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalization, date-time formatting and parsing, and many more.
:::

------------------------------------------------------------------------

The {**stringi**} tools are fast, consistent, convenient, and - thanks to `r glossary("ICU")` ([International Components for Unicode](https://icu.unicode.org/home)) - portable across all locales and platforms. Documentation about {**stringi**} is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski [-@gagolewski2022].
::::
::::::

## stringr {#sec-stringr}

::::::::: my-package
::: my-package-header
Package Profile: stringr
:::

::::::: my-package-container
<center>[Simple, Consistent Wrappers for Common String Operations](https://stringr.tidyverse.org) [@stringr]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-stringr-min.png){width="176"}

A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.
:::

------------------------------------------------------------------------

Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The {**stringr**} package provides a cohesive set of functions designed to make working with strings as easy as possible. If you’re not familiar with strings, the best place to start is the [chapter on strings](https://r4ds.hadley.nz/strings) in R for Data Science.
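
A few lines sketch the consistency described above: every function takes the string as first argument and treats `NA` in the same way. The example vector is invented.

```r
x <- c("apple pie", NA, "banana split")

stringr::str_detect(x, "an")           # does each string contain "an"?
stringr::str_to_upper(x)               # NA stays NA
stringr::str_replace_all(x, " ", "_")  # replace every space

# The output of one function feeds directly into the next
x |>
  stringr::str_to_title() |>
  stringr::str_pad(width = 15, side = "right", pad = ".")
```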

::::: my-remark
::: my-remark-header
{stringr}: Personal Evaluation
:::

::: my-remark-container
Although I have not used {**stringi**} directly in this book, I have added its package profile in @sec-stringi, because {**stringr**} is a set of wrappers around it.
:::
:::::
:::::::
:::::::::

## tableone {#sec-tableone}

:::::::: my-package
::: my-package-header
Package Profile: tableone
:::

:::::: my-package-container

------------------------------------------------------------------------

<center>[Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights](https://github.com/kaz-yos/tableone) [@tableone]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**tableone**}.*)

Creates 'Table 1', i.e., description of baseline patient characteristics, which is essential in every medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the {**survey**} package.

::::: my-remark
::: my-remark-header
{tableone}: Personal Evaluation
:::

::: my-remark-container
Instead of using {**tableone**} I will use {**gtsummary**} in conjunction with {**gt**}.
:::
:::::
::::::
::::::::

## tabulizer {#sec-tabulizer}

:::::::::: my-package
::: my-package-header
Package Profile: tabulizer
:::

:::::::: my-package-container

------------------------------------------------------------------------

<center>[Bindings for 'Tabula' PDF Table Extractor Library](https://docs.ropensci.org/tabulizer) [@tabulizer]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**tabulizer**}.*)

{**tabulizer**} provides R bindings to the [Tabula Java library](https://github.com/tabulapdf/tabula-java/), which can be used to computationally extract tables from PDF documents. The {**tabulizerjars**} package <https://github.com/ropensci/tabulizerjars> provides versioned 'Java' .jar files, including all dependencies, aligned to releases of 'Tabula'.

{**tabulizer**} depends on {**rJava**}, which implies a system requirement for Java. This can be frustrating, especially on Windows.

::::::: my-remark
::: my-remark-header
{tabulizer}: Personal Evaluation
:::

::::: my-remark-container
I just noticed that {**tabulizer**} has been removed from the CRAN repository. You can still install it from the [CRAN archive](https://cran.r-project.org/src/contrib/Archive/tabulizer/) or, even better, from the [GitHub site](https://github.com/ropensci/tabulizer/). I installed it several years ago (version 0.2.3) and it works smoothly.

I have looked at and tested alternatives, but none of them worked satisfactorily:

[{**pdftools**}](https://docs.ropensci.org/pdftools/): A great tool to scrape text from PDFs, but not so good with tables: "It is possible to use {**pdftools**} with some creativity to parse tables from PDF documents, which does not require Java to be installed."

An example of how to do that is explained in "How to extract data tables from PDF in r Tutorial", a video by Data Centrics Inc. Another approach can be found on [StackOverflow](https://stackoverflow.com/questions/60127375/using-the-pdf-data-function-from-the-pdftools-package-efficiently?rq=2). But both procedures are way too complex, and I have to say that the result does not repay the effort, especially with the small example table in the video tutorial. It would be much easier to use other tools, for instance the macOS app [TextSniper](https://www.textsniper.app/en), or even to enter the figures manually.

[{**PDE**}](https://cran.r-project.org/web/packages/PDE/vignettes/PDE.html): The PDF Data Extractor (PDE) seems the right tool for the task because it should "Extract Tables and Sentences from PDFs with User Interface". I couldn't work with the interactive user interface because it has many different options and I didn't have time to study them thoroughly. But I succeeded with the programming interface, although the result had some errors: parts of some columns appeared in extra columns. These errors were easy to detect and to repair.

With the following code I could extract all 13 tables from the ATF document, and I could also extract the pictures in the PDF and convert them to PNGs.

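```r
# Extract all tables from the ATF report with the programming interface of PDE
atf_tables <- PDE::PDE_pdfs2table(
    pdfs = "data/chap03/firearms_commerce_2019.pdf", # PDF file to analyse
    out = "data/chap03/test/",                       # folder for the results
    table.heading.words = "Exhibit",                 # table headings start with "Exhibit"
    out.table.format = ".csv (macintosh)"            # format of the exported tables
)
```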

It took me about 20-30 seconds and I got the following message:

> Following file is processing: 'firearms_commerce_2019.pdf' No filter words chosen for analysis. 13 table(s) found in 'firearms_commerce_2019.pdf'. Analysis of 'firearms_commerce_2019.pdf' complete.

Maybe the interactive UI would also work, but as I am very content with {**tabulizer**} I did not delve deeply into {**PDE**}.

:::: my-important
::: my-important-header
My recommendation: As the first choice try to install and use {**tabulizer**}. If this does not work for you, try {**PDE**}.
:::
::::
:::::
:::::::
::::::::
::::::::::

## tibble {#sec-tibble}

:::::: my-package
::: my-package-header
Package Profile: tibble
:::

:::: my-package-container
<center>[Simple Data Frames](https://tibble.tidyverse.org/) [@tibble]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-tibble-min.png){width="176"}

A tibble, or `tbl_df`, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not.
:::

------------------------------------------------------------------------

Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code. Tibbles also have an enhanced `print()` method which makes them easier to use with large datasets containing complex objects.
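
The "lazy and surly" behaviour can be seen in a small comparison with a base R data frame. The column name `my value` is invented for this sketch.

```r
df <- data.frame(`my value` = 1:3)
tb <- tibble::tibble(`my value` = 1:3)

names(df)  # data.frame() rewrites the name to "my.value"
names(tb)  # tibble() keeps "my value"

df$my      # partial matching silently returns the column
tb$my      # NULL, with a warning about an unknown column
```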
::::
::::::

## tidyr {#sec-tidyr}

:::::: my-package
::: my-package-header
Package Profile: tidyr
:::

:::: my-package-container
<center>[Tidy Messy Data](https://tidyr.tidyverse.org/) [@tidyr]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-tidyr-min.png){width="176"}

The goal of {**tidyr**} is to help you create tidy data. Tidy data describes a standard way of storing data that is used wherever possible throughout the {**tidyverse**}. If you ensure that your data is tidy, you’ll spend less time fighting with the tools and more time working on your analysis.
:::

------------------------------------------------------------------------

Tidy data is data where:

-   Every column is a variable.
-   Every row is an observation.
-   Every cell is a single value.
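
The following sketch turns an untidy table into a tidy one. The data are invented: the variable "year" is hidden in the column names, so `tidyr::pivot_longer()` moves it into a column of its own.

```r
# Untidy: one column per year
wide <- tibble::tibble(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(15, 25)
)

# Tidy: every row is one observation (country and year), every column a variable
long <- tidyr::pivot_longer(
  wide,
  cols      = c(`2019`, `2020`),
  names_to  = "year",
  values_to = "cases"
)
long
```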
::::
::::::

## tidyselect {#sec-tidyselect}

::::: my-package
::: my-package-header
Package Profile: tidyselect
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Select from a Set of Strings](https://tidyselect.r-lib.org/index.html) [@tidyselect]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**tidyselect**}.*)

The {**tidyselect**} package is the backend of functions like `dplyr::select()` or `dplyr::pull()` as well as several {**tidyr**} verbs. It allows you to create selecting verbs that are consistent with other {**tidyverse**} packages.

To learn about the selection syntax as a user of {**dplyr**} or {**tidyr**}, read the user-friendly [?language](https://tidyselect.r-lib.org/reference/language.html) reference.

To learn how to implement tidyselect in your own functions, read [vignette("tidyselect")](https://tidyselect.r-lib.org/articles/tidyselect.html).

To learn exactly how the {**tidyselect**} syntax is interpreted, read the technical description in [vignette("syntax")](https://tidyselect.r-lib.org/articles/syntax.html).
:::
:::::

## tidyverse {#sec-tidyverse}

::::::::: my-package
::: my-package-header
Package Profile: tidyverse
:::

::::::: my-package-container
<center>[Easily Install and Load the 'Tidyverse'](https://www.tidyverse.org/) [@tidyverse]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap01/logoi/logo-tidyverse-min.png){width="176"}

The {**tidyverse**} is an opinionated [collection of R packages](https://www.tidyverse.org/packages/) designed for data science.
:::

------------------------------------------------------------------------

All packages share an underlying design philosophy, grammar, and data structures [@tidyverse]. Read more about the philosophy and purpose: [The tidy tools manifesto](https://tidyverse.tidyverse.org/articles/manifesto.html) and [Welcome to the {**tidyverse**}](https://tidyverse.tidyverse.org/articles/paper.html).

::::: my-remark
::: my-remark-header
{tidyverse}: Personal Evaluation
:::

::: my-remark-container
In this book I am not going to load {**tidyverse**} with all its packages. Instead I am using the `<package>::<function>` format to access the commands. Explicitly naming the package with every function call helps me to learn which package is responsible for which function.
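
A small sketch shows why the prefix is useful. Both {**dplyr**} and {**stats**} export a function called `filter()`, and the prefix makes clear which one runs. The example data are arbitrary.

```r
# dplyr::filter() keeps the rows of a data frame that meet a condition
four_cyl <- dplyr::filter(mtcars, cyl == 4)
nrow(four_cyl)

# stats::filter() applies a linear filter (here a moving average) to a series
smoothed <- stats::filter(1:10, rep(1 / 3, 3))
smoothed
```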
:::
:::::
:::::::
:::::::::

## vcd {#sec-vcd}

::::: my-package
::: my-package-header
Package Profile: vcd
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Visualizing Categorical Data](https://cran.r-project.org/web/packages/vcd/index.html) [@vcd]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**vcd**}.*)

Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book "Visualizing Categorical Data" by Michael Friendly and is now the main support package for a new book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer [-@friendly2015].
:::
:::::

## vcdExtra {#sec-vcdExtra}

:::::: my-package
::: my-package-header
Package Profile: vcdExtra
:::

:::: my-package-container
<center>['vcd' Extensions and Additions](https://friendly.github.io/vcdExtra/) [@vcdExtra]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-vcdExtra-min.png){width="176"}

This package provides additional data sets, documentation, and many functions designed to extend the vcd package for Visualizing Categorical Data and the {**gnm**} package for Generalized Nonlinear Models. In particular, {**vcdExtra**} extends mosaic, assoc and sieve plots from {**vcd**} to handle `stats::glm()` and `gnm::gnm()` models and adds a 3D version in `vcdExtra::mosaic3d()`.
:::

------------------------------------------------------------------------

{**vcdExtra**} is a support package for the book Discrete Data Analysis with R (DDAR) by Michael Friendly and David Meyer [-@friendly2015]. There is also a [web site for DDAR](http://ddar.datavis.ca/) with all figures and code samples from the book. It is also used in Friendly's graduate course, [Psy 6136: Categorical Data Analysis](https://friendly.github.io/psy6136/).
::::
::::::

## viridis {#sec-viridis}

:::::: my-package
::: my-package-header
Package Profile: viridis
:::

:::: my-package-container
<center>[Colorblind-Friendly Color Maps for R](https://sjmgarnier.github.io/viridis/) [@viridis]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-viridis-min.png){width="176"}

{**viridis**} and its companion package {**viridisLite**} provide a series of color maps that are designed to improve graph readability for readers with common forms of color blindness and/or color vision deficiency. The color maps are also perceptually-uniform, both in regular form and also when converted to black-and-white for printing.
:::

------------------------------------------------------------------------

{**viridisLite**} provides the base functions for generating the color maps in base R. The package is meant to be as lightweight and dependency-free as possible for maximum compatibility with all the R ecosystem. {**viridis**} provides additional functionalities, in particular bindings for {**ggplot2**}.
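
A minimal sketch of the base R side: `viridisLite::viridis()` returns a character vector of colours that any graphics function can use. The bar plot is only a quick way to look at the palette.

```r
# Five colours from the default "viridis" map, as hexadecimal strings
pal <- viridisLite::viridis(5)
pal

# Base R graphics can use the vector directly
graphics::barplot(rep(1, 5), col = pal, border = NA, axes = FALSE)
```

With {**ggplot2**} one would instead add a scale from {**viridis**}, for instance `viridis::scale_fill_viridis()`.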
::::
::::::

## waffle {#sec-waffle}

::::: my-package
::: my-package-header
Package Profile: waffle
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Create Waffle Chart Visualizations](https://cinc.rud.is/web/packages/waffle/index.html) [@waffle]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**waffle**}.*)

Square pie charts (a.k.a. waffle charts) can be used to communicate parts of a whole for categorical quantities. To emulate the percentage view of a pie chart, a 10x10 grid should be used with each square representing 1% of the total.

Modern uses of waffle charts do not necessarily adhere to this rule and can be created with a grid of any rectangular shape.

Best practices suggest keeping the number of categories small, just as should be done when creating pie charts.

Tools are provided to create waffle charts as well as stitch them together, and to use glyphs for making isotype pictograms.

It uses {**ggplot2**} and returns a `ggplot2` object.
:::
:::::

## withr {#sec-withr}

:::::: my-package
::: my-package-header
Package Profile: withr
:::

:::: my-package-container
<center>[Run Code 'With' Temporarily Modified Global State](https://withr.r-lib.org/) [@withr]</center>

------------------------------------------------------------------------

::: {layout="[10, 30]" layout-valign="center"}
![](img/chap03/logoi/logo-withr-min.png){width="176"}

A set of functions to run code with safely and temporarily modified global state, {**withr**} makes working with the global state, i.e. side effects, less error-prone.
:::

------------------------------------------------------------------------

Pure functions, such as the `sum()` function, are easy to understand and reason about: they always map the same input to the same output and have no other impact on the workspace. In other words, pure functions have no *side effects*: they are not affected by, nor do they affect, the global state in any way apart from the value they return.

The purpose of the {**withr**} package is to help you manage side effects in your code. You may want to run code with secret information, such as an API key, that you store as an environment variable. You may also want to run code with certain options, with a given random-seed, or with a particular working-directory.

The {**withr**} package helps you manage these situations, and more, by providing functions to modify the global state temporarily and safely. These functions modify one of the global settings for the duration of a block of code, then automatically reset it after the block is completed.
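
The three situations mentioned above can be sketched as follows. The environment variable `DEMO_API_KEY` and its value are invented for this illustration.

```r
# The option is changed only while print() runs
old_digits <- getOption("digits")
withr::with_options(list(digits = 3), print(pi))
getOption("digits")  # same value as before the call

# Same seed, same random numbers; the global random seed is restored afterwards
a <- withr::with_seed(42, stats::rnorm(3))
b <- withr::with_seed(42, stats::rnorm(3))
identical(a, b)

# The environment variable exists only inside the call
key_inside <- withr::with_envvar(
  c(DEMO_API_KEY = "not-a-real-key"),
  Sys.getenv("DEMO_API_KEY")
)
key_inside
Sys.getenv("DEMO_API_KEY")  # empty string, if the variable was unset before
```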
::::
::::::

## qualvar {#sec-qualvar}

::::: my-package
::: my-package-header
Package Profile: qualvar
:::

::: my-package-container

------------------------------------------------------------------------

<center>[Implements Indices of Qualitative Variation Proposed by Wilcox (1973)](http://joelgombin.github.io/qualvar/) [@qualvar; @wilcox1973]</center>

------------------------------------------------------------------------

(*There is no hexagon sticker available for {**qualvar**}.*)

In 1973, Wilcox published a paper presenting various indices of qualitative variation for social scientists. The problem is to find relevant statistical indices to measure the variation in nominal-scale (i.e. qualitative or categorical) data. Please see the Wilcox paper for more details on the rationale [@wilcox1973].

Wilcox presents six indices that can be used to measure qualitative variation. The {**qualvar**} package implements these indices so that R users can easily use them.
:::
:::::