---
title: "deepak 607 tidyverse"
author: "deepak sharma"
date: "4/6/2021"
output:
  html_document:
    theme: default
    highlight: espresso
    toc: yes
    toc_depth: 5
    toc_float:
      collapsed: yes
  pdf_document:
    toc: yes
    toc_depth: '5'
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
library(tidyverse)
library(dplyr)
```
## Getting started
First we need to load these packages:

* tidyverse - attaches the core tidyverse packages, including stringr and dplyr
* stringr - provides the string functions shown below
* dplyr - used for subsetting data in our analysis
* rmdformats - used for styling the html document

We’re going to load a dataset from fivethirtyeight.com to help us show examples of stringr at work. Our data shows murders in cities in America from 2014 to 2015.
For simplicity’s sake, most examples display only the first few rows of output with head().
```{r cars}
url <- 'https://raw.githubusercontent.com/fivethirtyeight/data/master/murder_2016/murder_2015_final.csv'
murder_raw <- read_csv(url)
```
## Ordering Strings
str_sort(character vector, decreasing = X)

Purpose:
Order a character vector alphabetically.

Input:

* character vector - what you want to order
* X - whether to sort in decreasing order (FALSE - alphabetical, A to Z, the default; TRUE - reverse, Z to A)

Output:
An ordered character vector

Example:
We’ll order the column ‘city’ from our dataframe ‘murder_raw’.
```{r}
str_sort(murder_raw$city,decreasing=FALSE)
```
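As a minimal sketch of the decreasing argument, the chunk below compares the two sort directions. The vector `cities_demo` is a made-up example and is not part of the murder data.

```{r}
library(stringr)

# Hypothetical toy vector, used only for illustration
cities_demo <- c("Chicago", "Atlanta", "Boston")

# decreasing = FALSE (the default) sorts A to Z
sorted_az <- str_sort(cities_demo, decreasing = FALSE)
sorted_az

# decreasing = TRUE sorts Z to A
sorted_za <- str_sort(cities_demo, decreasing = TRUE)
sorted_za
```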
## Combining Strings
str_c(String1, String2, …, Stringn)

Purpose:
The function takes in strings or vectors of strings and concatenates them together.

Input:
Strings or vectors of strings, separated by commas

Output:
Single string or vector of combined strings

Example:
You can combine as many strings as you want at once.
Let’s see how we can combine two vectors of strings from our dataframe: the city and the state.
```{r}
head(str_c(murder_raw$city,murder_raw$state))
# separate city and state with a comma via the sep argument
murder_raw$City_State <- str_c(murder_raw$city,murder_raw$state,sep=",")
head(murder_raw$City_State)
```
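The difference between the sep and collapse arguments of str_c() is easy to mix up, so here is a small sketch. The vectors `city_demo` and `state_demo` are hypothetical and only stand in for the real columns.

```{r}
library(stringr)

# Hypothetical toy vectors, used only for illustration
city_demo  <- c("Chicago", "Boston")
state_demo <- c("Illinois", "Massachusetts")

# sep is placed between the pieces of each element pair,
# so the result has the same length as the inputs
pairs <- str_c(city_demo, state_demo, sep = ", ")
pairs

# collapse additionally joins the resulting vector into one single string
one_string <- str_c(city_demo, state_demo, sep = ", ", collapse = "; ")
one_string
```

Use sep alone when you want a new column, and add collapse when you want a single label or sentence.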
## Replacing Strings
str_replace_all(string, pattern, replacement)

Purpose:
This function replaces all instances of a pattern with the given replacement.

Input:

* string - string or vector of strings
* pattern - what to look for; you can use regular expressions here
* replacement - the text to substitute for each match

Output:
String or vector of strings with every match replaced

Example:
Suppose we wanted to replace all appearances of ‘,’ in the column ‘City_State’. We can easily do this with str_replace_all().
```{r}
murder_raw$City_State <- str_replace_all(murder_raw$City_State,'[\\,]','*')
head(murder_raw$City_State)
```
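To show what "all instances" means, the sketch below contrasts str_replace_all() with str_replace(), which changes only the first match. It also uses fixed() to match a character literally instead of as a regular expression. The strings are made up for illustration.

```{r}
library(stringr)

# Hypothetical toy string, used only for illustration
x_demo <- "a,b,c"

# str_replace() changes only the first match
first_only <- str_replace(x_demo, ",", "*")
first_only

# str_replace_all() changes every match
every_match <- str_replace_all(x_demo, ",", "*")
every_match

# fixed() matches the pattern literally, so "." is a dot and not "any character"
dots_replaced <- str_replace_all("a.b.c", fixed("."), "-")
dots_replaced
```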
## Get the Length of a String
str_length(string)

Purpose:
Find out the length of a string or a vector of strings.

Input:
String or vector of strings

Output:
Integer vector with one length per input string

Example:
Let’s find out how long each city name is.
```{r}
str_length(murder_raw$city)
# Let's only view the rows where the city name has more than 9 characters.
# To do this we'll also use the filter() function from the package dplyr.
filter(murder_raw, str_length(city) > 9)
```
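One detail worth seeing in isolation: str_length() counts every character, spaces included, and returns NA for a missing value. A minimal sketch with a made-up vector (`names_len_demo` is hypothetical):

```{r}
library(stringr)

# Hypothetical toy vector with a space and a missing value
names_len_demo <- c("Chicago", "New York", NA)

# The space in "New York" counts as a character; NA stays NA
lengths_demo <- str_length(names_len_demo)
lengths_demo
```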
## Conclusion
The tidyverse offers many more verbs, such as pivot_longer() and pivot_wider() from tidyr, which help reshape data into the specific view a task requires.

--------------------------------------------------------------------------------
\clearpage
## Dmitriy Burtsev add group by example
```{r}
# Sum the change column per state (funs() is deprecated, so use summarise() directly)
df <- murder_raw %>% group_by(state) %>% summarise(change = sum(change))
head(df)
```
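The same group-then-summarise pattern can be sketched on a tiny made-up table, which makes the result easy to check by hand. The data frame `demo_df` and the output column names are assumptions for illustration only.

```{r}
library(dplyr)

# Hypothetical toy data, used only for illustration
demo_df <- tibble(
  state  = c("A", "A", "B"),
  change = c(1, 2, 5)
)

# One output row per state: the summed change and the number of rows in the group
state_totals <- demo_df %>%
  group_by(state) %>%
  summarise(total_change = sum(change), n_cities = n())
state_totals
```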
--------------------------------------------------------------------------------
\clearpage
## Ramnivas Singh TidyVerse EXTEND Assignment
### str_detect()
str_detect() returns TRUE for each string that contains a pattern. Used inside filter(), it takes the column name followed by the pattern (in quotes) to search for, and keeps only the matching rows.
```{r echo=FALSE, message=FALSE, warning=FALSE}
# str_detect() helper function
df<-murder_raw %>% filter(str_detect(city,'New'))
head(df)
```
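Outside of filter(), str_detect() simply returns a logical vector, which the sketch below makes visible. The vector `names_demo` is made up, and the second pattern is just one possible way to require "New" as a separate leading word.

```{r}
library(stringr)

# Hypothetical toy vector, used only for illustration
names_demo <- c("New Orleans", "Newark", "Chicago")

# TRUE wherever the string contains "New" anywhere
has_new <- str_detect(names_demo, "New")
has_new

# The logical vector can be used to subset directly
names_demo[has_new]

# A regular expression narrows the match: "New" at the start, followed by a space
starts_with_new_word <- str_detect(names_demo, "^New ")
starts_with_new_word
```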
### do() function
```{r echo=FALSE, message=FALSE, warning=FALSE}
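# do() computes within groups, but summarise() returns an ungrouped result here,
# so do(head(., 2)) keeps the first 2 rows of the whole summary
df <- murder_raw %>% group_by(state) %>% summarise(change = sum(change)) %>% do(head(., 2))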
head(df)
```
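To see do() actually working within groups, the data must still be grouped when it reaches do(). The sketch below uses a made-up table (`demo_cities` is hypothetical) and keeps the first two rows of each state. Since do() is superseded as of dplyr 1.0.0, the sketch also shows slice_head() as one possible replacement for this particular task.

```{r}
library(dplyr)

# Hypothetical toy data, used only for illustration
demo_cities <- tibble(
  state = c("A", "A", "A", "B"),
  city  = c("a1", "a2", "a3", "b1")
)

# do() runs head(., 2) once per group, so each state keeps at most 2 rows
first_two_do <- demo_cities %>%
  group_by(state) %>%
  do(head(., 2))
first_two_do

# slice_head() selects the same rows without do()
first_two_slice <- demo_cities %>%
  group_by(state) %>%
  slice_head(n = 2)
first_two_slice
```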