Commit

Permalink
big changes to workshop
siobhanschenk committed May 24, 2024
1 parent be14d44 commit 0313c5f
Showing 5 changed files with 65 additions and 180 deletions.
53 changes: 15 additions & 38 deletions content/columns.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,15 +20,16 @@ Input
```r
# Sample code from the dplyr cheat sheet
## 3.1 Select columns
pull(mtcars, wt)
select(mtcars, mpg, wt)
pullmt=pull(mtcars, wt)
selectmt=select(mtcars, mpg, wt)

## 3.2 Arrange columns
relocate(mtcars, mpg, cyl, .after = last_col())
relocatemt=relocate(mtcars, mpg, cyl, .after = last_col())

## 3.3 Add columns
mutate(mtcars, gpm = 1 / mpg)
transmute(mtcars, gpm = 1 / mpg)
mutmt=mutate(mtcars, gpm = 1 / mpg)
tmt=transmute(mtcars, gpm = 1 / mpg)

### A showcase for bind_cols
x <- data.frame(
A = c('a', 'b', 'c'),
Expand All @@ -37,49 +38,26 @@ x <- data.frame(
x
y <- data.frame(
E = c('a', 'b', 'd'),
F = c('t', 'u', 'w'),
G = c(3, 2, 1))
G = c('t', 'u', 'w'), # skip F: in R, F is shorthand for FALSE, so it is best not to use it as a name
H = c(3, 2, 1))
y
bind_cols(x, y) # you have to make sure the binding is meaningful by yourself
# bind_cols() pairs rows by position, so you have to check yourself that the binding is meaningful.
# This is not the same as a join.
bind_cols(x, y)

## 3.4 Rename columns
rename(cars, distance = dist)
```
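
The difference between binding columns and joining can be made concrete with a small sketch. The data frames `left` and `right` below are hypothetical and not part of the workshop data; they only illustrate that `bind_cols()` pairs rows by position, while `left_join()` pairs them by a key column.

```r
library(dplyr)

# Hypothetical data: the same ids, but stored in a different row order
left  <- data.frame(id = c(1, 2, 3), colour = c("red", "green", "blue"))
right <- data.frame(key = c(3, 2, 1), size = c("L", "M", "S"))

# bind_cols() glues columns side by side: row 1 with row 1, and so on
by_position <- bind_cols(left, right)

# left_join() matches rows where id equals key, regardless of row order
by_key <- left_join(left, right, by = c("id" = "key"))
```

In `by_position`, the first row pairs `id` 1 with `key` 3, which is probably not what you want; in `by_key`, each `id` is paired with the row of `right` that has the same key.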


## 3.1 Select Columns
![implicit](images/select_cols.png)
<div style="text-align: right">
<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
</div>

## 3.2 Arrange Columns
![implicit](images/arrange_cols.png)
<div style="text-align: right">
<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
</div>

## 3.3 Add Columns
![implicit](images/add_cols.png)
![implicit](images/bind_cols.png)
<div style="text-align: right">
<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
</div>

## 3.4 Rename Columns
![implicit](images/rename_cols.png)
<div style="text-align: right">
<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
</div>

## Practice 3
## Practice 2
`iris` is a data frame with 150 cases (rows) and 5 variables (columns) named `Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, and `Species`. Make a new data frame which contains only `Species` and the ratio of `Petal.Width` to `Petal.Length`.
<details>
<summary><u>Click here for solution</u></summary>
<div style="border: thin grey 1px; background-color: #eeebee; padding:15px;">
<p>
my_iris1 <- mutate(iris, Petal.Width.Length.Ratio = Petal.Width/Petal.Length) <br>
select(my_iris1, Species, Petal.Width.Length.Ratio)
my_iris1 <- iris %>% <br>
mutate(Petal.Width.Length.Ratio = Petal.Width/Petal.Length) %>% <br>
select(Species, Petal.Width.Length.Ratio)
</p>
</div>
</details>
Expand All @@ -89,4 +67,3 @@ rename(cars, distance = dist)


This page is meant to introduce functions that help manipulate columns.
A pause here for questions.
48 changes: 27 additions & 21 deletions content/introduction.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,20 +10,8 @@ has_toc: false

## 1.1 What is *dplyr*?

> "A Grammar of Data Manipulation: a fast, consistent tool for working with data frame like objects, both in memory and out of memory."
<div style="text-align: right">
<p><a href="https://cran.r-project.org/web/packages/dplyr/dplyr.pdf">R document (r-project.org)</a></p>
</div>
> "A grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges"
<div style="text-align: right">
<p><a href="https://dplyr.tidyverse.org/">dplyr (tidyverse.org)</a></p>
</div>
> "A part of the tidyverse meta-package and, as a core package, it is among those loaded via `library(tidyverse)`".
<div style="text-align: right">
<p><a href="https://stat545.com/dplyr-intro.html">Introduction to dplyr in STAT 545</a></p>
</div>
dplyr is a part of the tidyverse meta-package that facilitates data manipulation.
As with all other tidyverse packages, dplyr has <a href="https://dplyr.tidyverse.org/" target="_blank">extensive documentation</a> and <a href="https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf" target="_blank">cheat sheets</a> available.


## Practice 1
Expand Down Expand Up @@ -53,14 +41,32 @@ if(!require(tidyverse)) # checks if package can be loaded

## 1.2 Tidy data and pipes

![implicit](images/dplyr_implicit.png)
<div style="text-align: right">
<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a></p>
</div>
&nbsp;
&nbsp;
The pipe operator `%>%` takes the thing on its left side and feeds that to its right side. You can read it as "then".

The pipe operator works well with dplyr because most functions in the package share two patterns:
* The input is a data frame, which is usually the first argument for a function.
* The output is also a data frame.

Therefore, you can pass on the output from a function as input for the next function.
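
As a minimal sketch of this idea, the code below compares a direct call with its piped equivalent and then chains three verbs. The object names `direct`, `piped`, and `chained` are made up for illustration; the verbs themselves are covered in the following pages.

```r
library(dplyr)

# x %>% f(y) is evaluated as f(x, y): the left side becomes the first argument
direct <- filter(mtcars, mpg > 20)
piped  <- mtcars %>% filter(mpg > 20)

# Because each verb takes and returns a data frame, verbs can be chained
chained <- mtcars %>%
  filter(mpg > 20) %>%      # keep cars with mpg above 20
  select(mpg, cyl) %>%      # then keep only two columns
  arrange(desc(mpg))        # then sort from highest to lowest mpg
```

`direct` and `piped` hold the same data frame; the pipe only changes how the call is written, not what it computes.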


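Note: in RStudio you can use CTRL+Shift+M (PC) or CMD+Shift+M (Mac) as a keyboard shortcut for the pipe. Depending on your RStudio settings (the "Use native pipe operator" option under Global Options > Code), the shortcut inserts either `%>%` or the native pipe `|>`.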
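
For the simple chains used in this workshop, the two pipes can be used interchangeably. The sketch below assumes R 4.1 or later, where the native pipe `|>` is available; the object names are made up for illustration.

```r
library(dplyr)

# magrittr pipe, made available when dplyr (or tidyverse) is loaded
with_magrittr <- mtcars %>% select(mpg, cyl)

# native pipe, built into R 4.1 and later
with_native <- mtcars |> select(mpg, cyl)
```

Both objects contain the same two columns of `mtcars`. The two pipes do differ in some advanced uses (for example, placeholders), which this workshop does not rely on.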

Getting grouped summaries is a common data exploration task, and it usually requires multiple steps. For example, the following sample code from the dplyr cheat sheet first groups the cases in the `mtcars` data by the `cyl` variable (that is, it groups the cars by number of cylinders) and then calculates the average `mpg` (miles per gallon) for each group.

Input
{: .label .label-green}
```r
# without using pipes - operations are nested
newmtcars = summarise(group_by(mtcars, cyl), avg = mean(mpg))

# with pipes - easier to read because each command is separated and in order
newmtcars = mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))


```


This page is meant to introduce the dplyr package briefly and get you ready to move on to learning functions from dplyr.
A pause here for questions.
71 changes: 0 additions & 71 deletions content/pipe.md

This file was deleted.

60 changes: 16 additions & 44 deletions content/rows.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ has_toc: false

# 2 Manipulating Rows

| Select rows | `filter()`, `distinct()`, `slice()` family
| Select rows | `filter()`, `distinct()`, `slice()`
| Arrange rows | `arrange()`
| Add rows | `add_row()`, `bind_rows()`

Expand All @@ -26,21 +26,17 @@ mtcars

# Sample code from the dplyr cheat sheet
## 2.1 Select cases
filter(mtcars, mpg > 20)
distinct(mtcars, gear)
slice(mtcars, 10:15)
slice_sample(mtcars, n = 5, replace = TRUE)
slice_min(mtcars, mpg, prop = 0.25)
slice_max(mtcars, mpg, prop = 0.25)
slice_head(mtcars, n = 5)
slice_tail(mtcars, n = 5)
over20=filter(mtcars, mpg > 20)
distinctmt=distinct(mtcars, gear)
slicemt=slice(mtcars, 10:15)
headmt=slice_head(mtcars, n = 5)
tailmt=slice_tail(mtcars, n = 5)

## 2.2 Arrange cases
arrange(mtcars, mpg)
arrange(mtcars, desc(mpg))
## 2.2 Arrange observations
arrmt=arrange(mtcars, mpg)
descmt=arrange(mtcars, desc(mpg))

## 2.3 Add cases
add_row(cars, speed = 1, dist = 1)
## 2.3 Make dataframes and bind them together
### A showcase for bind_rows
x <- data.frame(
A = c('a', 'b'),
Expand All @@ -55,48 +51,24 @@ y
bind_rows(x, y)
```
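
One detail worth knowing about `bind_rows()` is what happens when the inputs do not have the same columns. The sketch below uses two hypothetical data frames, `first` and `second`, which are not part of the workshop data.

```r
library(dplyr)

# Hypothetical data frames that share column A but differ otherwise
first  <- data.frame(A = c("a", "b"), B = c(1, 2))
second <- data.frame(A = c("c", "d"), C = c(TRUE, FALSE))

# bind_rows() matches columns by name and fills the gaps with NA
stacked <- bind_rows(first, second)

# .id adds a column recording which input each row came from
labelled <- bind_rows(first = first, second = second, .id = "source")
```

`stacked` has four rows and the columns `A`, `B`, and `C`; the rows that came from `second` have `NA` in `B`, and the rows from `first` have `NA` in `C`.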

## 2.1 Select Cases
![implicit](images/select_cases.png)
<div style="text-align: right">
<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
</div>

## 2.2 Arrange Cases
![implicit](images/arrange_cases.png)
<div style="text-align: right">
<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
</div>

## 2.3 Add Cases
![implicit](images/add_cases.png)
![implicit](images/bind_rows.png)
<div style="text-align: right">
<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
</div>

## Practice 2
## Practice 1
`iris` is a data frame with 150 cases (rows) and 5 variables (columns) such as `Petal.Width` and `Species`. In the `iris` data set, the cases with the minimum and maximum petal width belong to what species?
<details>
<summary><u>Click here for solutions</u></summary>
<div style="border: thin grey 1px; background-color: #eeebee; padding:15px;">
<p>
# solution 1 <br>
arrange(iris, Petal.Width) <br>
# solution 2 <br>
slice_min(iris, Petal.Width, prop = 0.01) <br>
slice_max(iris, Petal.Width, prop = 0.01) <br>
slice_min(iris, Petal.Width) <br>
slice_max(iris, Petal.Width) <br>
<br>
# The case with the minimum petal width belongs to setosa. <br>
# The case with the maximum petal width belongs to virginica.
</p>
</div>
</details>
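
Note that `slice_min()` and `slice_max()` can return more than one row when several cases are tied. A small sketch using `mtcars` (the object names are made up for illustration):

```r
library(dplyr)

# with_ties = TRUE is the default: every row tied for the minimum is returned
all_ties <- slice_min(mtcars, cyl)

# with_ties = FALSE returns exactly n rows (n defaults to 1 here)
one_row <- slice_min(mtcars, cyl, with_ties = FALSE)
```

Here `all_ties` contains every car with the smallest number of cylinders, while `one_row` contains a single row.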
&nbsp;
&nbsp;



This page is meant to introduce functions that help manipulate rows.
A pause here for questions.


&nbsp;
13 changes: 7 additions & 6 deletions index.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,21 +16,22 @@ Participants are expected to have installed the latest versions of R and RStudio
Install R from [https://cran.rstudio.com/](https://cran.rstudio.com/)
Install RStudio from [https://rstudio.com/products/rstudio/download/#download](https://rstudio.com/products/rstudio/download/#download)

Please note that while we go through many examples during the workshop, we do not cover all of the operations available in dplyr.

## Learning objectives

At the end of this workshop, you will be able to:
1. Understand what the *dplyr* package is used for
2. Manipulate rows to select, arrange, and add cases
3. Manipulate columns to select, arrange, add, and rename variables
4. Use pipe operator to combine multiple manipulations together
2. Use the pipe operator to combine multiple manipulations
3. Manipulate rows to select, arrange, and add cases
4. Manipulate columns to select, arrange, add, and rename variables

## Schedule

| 0:00 - 0:05 | Welcome and using Zoom|
| 0:05 - 0:15 | Intro to *dplyr* |
| 0:15 - 0:30 | Manipulating rows |
| 0:30 - 0:45 | Manipulating columns |
| 0:45 - 0:55 | Combining multiple manipulations |
| 0:15 - 0:35 | Manipulating rows |
| 0:35 - 0:50 | Manipulating columns |
| 0:55 - 1:00 | Recap |
| 1:00 - 1:30 | Optional Q&A |

Expand Down
