big changes to workshop

ubc-library-rc · May 24, 2024 · 0313c5f
1 parent be14d44
commit 0313c5f

Showing 5 changed files with 65 additions and 180 deletions.
diff --git a/content/columns.md b/content/columns.md
@@ -20,15 +20,16 @@ Input
 ```r
 # Sample code from the dplyr cheat sheet
 ## 3.1 Select columns
-pull(mtcars, wt)
-select(mtcars, mpg, wt)
+pullmt=pull(mtcars, wt)
+selectmt=select(mtcars, mpg, wt)
 
 ## 3.2 Arrange columns
-relocate(mtcars, mpg, cyl, .after = last_col())
+relocatemt=relocate(mtcars, mpg, cyl, .after = last_col())
 
 ## 3.3 Add columns
-mutate(mtcars, gpm = 1 / mpg)
-transmute(mtcars, gpm = 1 / mpg)
+mutmt=mutate(mtcars, gpm = 1 / mpg)
+tmt=transmute(mtcars, gpm = 1 / mpg)
+
 ### A showcase for bind_cols
 x <- data.frame(
   A = c('a', 'b', 'c'), 
@@ -37,49 +38,26 @@ x <- data.frame(
 x
 y <- data.frame(
   E = c('a', 'b', 'd'), 
-  F = c('t', 'u', 'w'),
-  G = c(3, 2, 1))
+  G = c('t', 'u', 'w'), # skip F: in R, F is shorthand for FALSE, so it is best not to use it as a name
+  H = c(3, 2, 1))
 y
-bind_cols(x, y) # you have to make sure the binding is meaningful by yourself
+# You have to make sure the binding is meaningful yourself: bind_cols() matches rows by position.
+# This is not the same as a join, which matches rows by key.
+bind_cols(x, y) 
 
 ## 3.4 Rename columns
 rename(cars, distance = dist)
 ```
 
-
-## 3.1 Select Columns 
-![implicit](images/select_cols.png)
-<div style="text-align: right">
-	<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
-</div>
-
-## 3.2 Arrange Columns
-![implicit](images/arrange_cols.png)
-<div style="text-align: right">
-	<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
-</div>
-
-## 3.3 Add Columns
-![implicit](images/add_cols.png)
-![implicit](images/bind_cols.png)
-<div style="text-align: right">
-	<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
-</div>
-
-## 3.4 Rename Columns
-![implicit](images/rename_cols.png)
-<div style="text-align: right">
-	<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
-</div>
-
-## Practice 3
+## Practice 2
 `iris` is a data frame with 150 cases (rows) and 5 variables (columns) named `Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, and `Species`. Make a new data frame which contains only `Species` and the ratio of `Petal.Width` and `Petal.Length`.
 <details>
 	<summary><u>Click here for solution</u></summary>
 	<div style="border: thin grey 1px; background-color: #eeebee; padding:15px;">
 		<p>
-		my_iris1 <- mutate(iris, Petal.Width.Length.Ratio = Petal.Width/Petal.Length) <br>
-		select(my_iris1, Species, Petal.Width.Length.Ratio)
+		my_iris1 <- iris %>% <br>
+		mutate(Petal.Width.Length.Ratio = Petal.Width/Petal.Length) %>% <br>
+		select(Species, Petal.Width.Length.Ratio)
 		</p>
     </div>
 </details>
@@ -89,4 +67,3 @@ rename(cars, distance = dist)
 
 
 This page is meant to introduce functions that help manipulate columns.  
-A pause here for questions.
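
The comment added above `bind_cols(x, y)` says that binding is not the same as a join. A minimal sketch of the difference, using hypothetical data frames `left` and `right` (not the workshop's `x` and `y`) and assuming `right` stores the same keys in a different row order:

```r
library(dplyr)

# Hypothetical example data: the same keys, stored in a different row order
left  <- data.frame(id  = c("a", "b", "c"), score = c(1, 2, 3),
                    stringsAsFactors = FALSE)
right <- data.frame(key = c("c", "a", "b"), grade = c("C", "A", "B"),
                    stringsAsFactors = FALSE)

# bind_cols() pastes columns side by side purely by row position,
# so row 1 of `left` (id "a") ends up next to row 1 of `right` (key "c")
bound <- bind_cols(left, right)

# left_join() matches rows by key value, regardless of row order
joined <- left_join(left, right, by = c("id" = "key"))

bound
joined
```

In `bound` the `id` and `key` columns disagree on every row, while `joined` pairs each `id` with its own `grade`. Use `bind_cols()` only when both data frames are already in the same row order.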
diff --git a/content/introduction.md b/content/introduction.md
@@ -10,20 +10,8 @@ has_toc: false
 
 ## 1.1 What is *dplyr*?
 
-> "A Grammar of Data Manipulation: a fast, consistent tool for working with data frame like objects, both in memory and out of memory."
-<div style="text-align: right">
-	<p><a href="https://cran.r-project.org/web/packages/dplyr/dplyr.pdf">R document (r-project.org)</a></p>
-</div>
-
-> "A grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges"
-<div style="text-align: right">
-	<p><a href="https://dplyr.tidyverse.org/">dplyr (tidyverse.org)</a></p>
-</div>
-
-> "A part of the tidyverse meta-package and, as a core package, it is among those loaded via `library(tidyverse)`".
-<div style="text-align: right">
-	<p><a href="https://stat545.com/dplyr-intro.html">Introduction to dplyr in STAT 545</a></p>
-</div>
+dplyr is a part of the tidyverse meta-package that facilitates data manipulation.
+As with all other tidyverse packages, dplyr has <a href="https://dplyr.tidyverse.org/" target="_blank">extensive documentation</a> and <a href="https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf" target="_blank">cheat sheets</a> available.
 
 
 ## Practice 1
@@ -53,14 +41,32 @@ if(!require(tidyverse)) # checks if package can be loaded
 
 ## 1.2 Tidy data and pipes
 
-![implicit](images/dplyr_implicit.png)
-<div style="text-align: right">
-	<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a></p>
-</div>
-&nbsp;    
-&nbsp;    
+The pipe operator `%>%` takes the thing on its left side and feeds that to its right side. You can read it as "then".
+
+The pipe operator works because most functions in the dplyr package share two patterns:
+* The input is a data frame, which is usually the first argument for a function.
+* The output is also a data frame.  
+
+Therefore, you can pass on the output from a function as input for the next function.
+
+
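+Note: in RStudio, you can use CTRL+Shift+M (PC) or CMD+Shift+M (Mac) as a keyboard shortcut for the pipe. By default the shortcut inserts `%>%`. If the "Use native pipe operator" option is enabled in the RStudio settings, it inserts the native pipe `|>` instead, which requires R 4.1 or later.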
+
+Getting grouped summaries is a common data exploration task, and it usually requires multiple steps. For example, the following sample code from the dplyr cheat sheet first groups the cases in the `mtcars` data by the `cyl` variable, i.e., groups the cars by number of cylinders, and then calculates the average `mpg` (miles per gallon) for each group.
 
+Input
+{: .label .label-green}
+```r
+# without using pipes - operations are nested
+newmtcars = summarise(group_by(mtcars, cyl), avg = mean(mpg))
+
+# with pipes - easier to read because each command is separated and in order
+newmtcars = mtcars %>% 
+  group_by(cyl) %>%
+  summarise(avg = mean(mpg))
+
+
+```
 
 
 This page is meant to introduce the dplyr package briefly and get you ready to move on to learn functions from dplyr.  
-A pause here for questions.
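
The introduction.md hunk above describes the nested and piped forms as two ways of writing the same operation. A small sketch that checks this, and also shows the native pipe `|>` (assuming R 4.1 or later); the variable names `nested`, `piped`, and `native` are illustrative only:

```r
library(dplyr)

# Nested form: read from the inside out
nested <- summarise(group_by(mtcars, cyl), avg = mean(mpg))

# magrittr pipe: read top to bottom as "take mtcars, then group, then summarise"
piped <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))

# Native pipe (assumes R >= 4.1); equivalent for simple chains like this one
native <- mtcars |>
  group_by(cyl) |>
  summarise(avg = mean(mpg))

# All three forms produce the same summary table
same_magrittr <- identical(as.data.frame(nested), as.data.frame(piped))
same_native   <- identical(as.data.frame(nested), as.data.frame(native))
```

Both comparisons should be `TRUE`, with one row per distinct value of `cyl`.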
diff --git a/content/pipe.md b/content/pipe.md
diff --git a/content/rows.md b/content/rows.md
@@ -8,7 +8,7 @@ has_toc: false
 
 # 2 Manipulating Rows
 
-| Select rows | `filter()`, `distinct()`, `slice()` family
+| Select rows | `filter()`, `distinct()`, `slice()` 
 | Arrange rows | `arrange()`
 | Add rows | `add_row()`, `bind_rows()`
 
@@ -26,21 +26,17 @@ mtcars
 
 # Sample code from the dplyr cheat sheet
 ## 2.1 Select cases
-filter(mtcars, mpg > 20)
-distinct(mtcars, gear)
-slice(mtcars, 10:15)
-slice_sample(mtcars, n = 5, replace = TRUE)
-slice_min(mtcars, mpg, prop = 0.25)
-slice_max(mtcars, mpg, prop = 0.25)
-slice_head(mtcars, n = 5)
-slice_tail(mtcars, n = 5)
+over20=filter(mtcars, mpg > 20)
+distinctmt=distinct(mtcars, gear)
+slicemt=slice(mtcars, 10:15)
+headmt=slice_head(mtcars, n = 5)
+tailmt=slice_tail(mtcars, n = 5)
 
-## 2.2 Arrange cases
-arrange(mtcars, mpg)
-arrange(mtcars, desc(mpg))
+## 2.2 Arrange observations
+arrmt=arrange(mtcars, mpg)
+descmt=arrange(mtcars, desc(mpg))
 
-## 2.3 Add cases
-add_row(cars, speed = 1, dist = 1)
+## 2.3 Make dataframes and bind them together
 ### A showcase for bind_rows
 x <- data.frame(
   A = c('a', 'b'), 
@@ -55,48 +51,24 @@ y
 bind_rows(x, y)
 ```
 
-## 2.1 Select Cases
-![implicit](images/select_cases.png)
-<div style="text-align: right">
-	<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
-</div>
-
-## 2.2 Arrange Cases
-![implicit](images/arrange_cases.png)
-<div style="text-align: right">
-	<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
-</div>
-
-## 2.3 Add Cases
-![implicit](images/add_cases.png)
-![implicit](images/bind_rows.png)
-<div style="text-align: right">
-	<p>Figure Source: <a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-transformation.pdf">dplyr Cheat Sheet</a>, CC BY SA Posit Software, PBC</p>
-</div>
-
-## Practice 2
+## Practice 1
 `iris` is a data frame with 150 cases (rows) and 5 variables (columns) such as `Petal.Width` and `Species`. In the `iris` data set, the cases with the minimum and maximum petal width belong to what species?
 <details>
 	<summary><u>Click here for solutions</u></summary>
 	<div style="border: thin grey 1px; background-color: #eeebee; padding:15px;">
 		<p>
 		# solution 1 <br>
 		arrange(iris, Petal.Width) <br>
+		
 		# solution 2 <br>
-		slice_min(iris, Petal.Width, prop = 0.01) <br>
-		slice_max(iris, Petal.Width, prop = 0.01) <br>
+		slice_min(iris, Petal.Width) <br>
+		slice_max(iris, Petal.Width) <br>
+		
 		 <br>
 		# The case with the minimum petal width belongs to setosa. <br>
 		# The case with the maximum petal width belongs to virginica.
 		</p>
     </div>
 </details>
 &nbsp;    
-&nbsp;    
-
-
-
-This page is meant to introduce functions that help manipulate rows.  
-A pause here for questions.
-
-
+&nbsp;  
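
The revised Practice 1 solution calls `slice_min()` and `slice_max()` without `prop`. A sketch of what that does, assuming the dplyr defaults of `n = 1` and `with_ties = TRUE`; the `distinct()` step is an addition for illustration and is not part of the workshop solution:

```r
library(dplyr)

# With the defaults (n = 1, with_ties = TRUE), every row tied for the
# extreme value is returned, so the result can have more than one row
min_rows <- slice_min(iris, Petal.Width)
max_rows <- slice_max(iris, Petal.Width)

# Collapse the tied rows to the species they belong to
min_species <- distinct(min_rows, Species)
max_species <- distinct(max_rows, Species)

# with_ties = FALSE returns exactly one row, even when several are tied
one_min <- slice_min(iris, Petal.Width, n = 1, with_ties = FALSE)
```

Here `min_species` and `max_species` give the answer to the practice question directly, without scanning the sorted output by eye.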
diff --git a/index.md b/index.md
@@ -16,21 +16,22 @@ Participants are expected to have installed the latest versions of R and RStudio
 Install R from [https://cran.rstudio.com/](https://cran.rstudio.com/)   
 Install RStudio from [https://rstudio.com/products/rstudio/download/#download](https://rstudio.com/products/rstudio/download/#download)  
 
+Please note that while we go through many examples during the workshop, we do not cover all of the operations available in dplyr. 
+
 ## Learning objectives
 
 At the end of this workshop, you will be able to:
 1. Understand what the *dplyr* package is used for
-2. Manipulate rows to select, arrange, and add cases
-3. Manipulate columns to select, arrange, add, and rename variables
-4. Use pipe operator to combine multiple manipulations together
+2. Use the pipe operator to combine multiple manipulations together
+3. Manipulate rows to select, arrange, and add cases
+4. Manipulate columns to select, arrange, add, and rename variables
 
 ## Schedule
 
 | 0:00 - 0:05 | Welcome and using Zoom|
 | 0:05 - 0:15 | Intro to *dplyr* |  
-| 0:15 - 0:30 | Manipulating rows |
-| 0:30 - 0:45 | Manipulating columns |  
-| 0:45 - 0:55 | Combining multiple manipulations | 
+| 0:15 - 0:35 | Manipulating rows |
+| 0:35 - 0:50 | Manipulating columns |   
 | 0:55 - 1:00 | Recap |
 | 1:00 - 1:30 | Optional Q&A |