# Extracting Tweet Entities
## Problem
You want to extract tweet entities such as `@mentions`, `#hashtags`, and short URLs from Twitter search results or other batches of tweets.
## Solution
Use `rtweet::search_tweets()` or any of the _timeline_ functions in `rtweet`.
## Discussion
Michael W. Kearney's `rtweet` package provides a powerful search interface for Twitter data mining. `rtweet::search_tweets()` retrieves the tweets, parses them, and returns a large set of fields for you to work with. Let's search Twitter for the `#rstats` hashtag and see what is available:
```{r 03_pkgs, message=FALSE, warning=FALSE, cache=TRUE}
library(rtweet)
library(tidyverse)
```
```{r 03_search, message=FALSE, warning=FALSE, cache=TRUE}
(rstats <- search_tweets("#rstats", n=300)) # pull 300 tweets that used the "#rstats" hashtag
glimpse(rstats)
```
From the output, you can see that the URLs (short and expanded), status IDs, user IDs and other hashtags are all available in a [tidy](http://r4ds.had.co.nz/tidy-data.html) data frame.
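
If all you have is raw tweet text rather than `rtweet`'s parsed columns, a rough regular-expression fallback can pull out the same kinds of entities. The sketch below uses only base R. The sample texts, the handles in them, and the helper name `extract_all()` are made up for illustration, and the patterns are deliberately simple, so expect them to miss edge cases that Twitter's own entity parsing handles.

```r
# Hypothetical sample texts; these are not real tweets
tweets <- c(
  "Loving #rstats and #DataViz with @some_user https://t.co/abc123",
  "No entities here",
  "#RStats tip from @another_user: see https://t.co/xyz789 #rstats"
)

# Hypothetical helper: return every match of `pattern` in each element of
# `text`, as a list with one character vector per tweet
extract_all <- function(text, pattern) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))
}

hashtags   <- extract_all(tweets, "#\\w+")                # e.g. "#rstats"
mentions   <- extract_all(tweets, "@\\w+")                # e.g. "@some_user"
short_urls <- extract_all(tweets, "https?://t\\.co/\\w+") # t.co short links only

hashtags
mentions
short_urls
```

Tweets without a given entity yield an empty character vector, so each result keeps one list element per tweet, much like a list-column would.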
What are the top 10 (with ties) other hashtags used in conjunction with `#rstats` (for this search group)?
```{r 03_hash, message=FALSE, warning=FALSE, cache=TRUE}
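select(rstats, hashtags) %>%
  unnest(hashtags) %>%                     # one row per hashtag
  mutate(hashtags = tolower(hashtags)) %>% # count "RStats" and "rstats" together
  count(hashtags, sort=TRUE) %>%
  filter(hashtags != "rstats") %>%         # drop the hashtag we searched for
  top_n(10, n)                             # top 10 by count, keeping ties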
```
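To see what each step of that pipeline does without calling the Twitter API, here is a minimal sketch on a hand-made data frame. The `toy` tibble and its values are invented for illustration. It assumes only that `hashtags` is a list-column of character vectors, as in the search results above, and that `dplyr` and `tidyr` are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for search results: one row per tweet,
# with `hashtags` stored as a list-column of character vectors
toy <- tibble(
  status_id = c("1", "2", "3"),
  hashtags  = list(
    c("rstats", "DataViz"),
    c("RStats", "dataviz", "tidyverse"),
    "rstats"
  )
)

top_tags <- toy %>%
  select(hashtags) %>%
  unnest(hashtags) %>%                     # expand to one row per hashtag
  mutate(hashtags = tolower(hashtags)) %>% # merge case variants
  count(hashtags, sort = TRUE) %>%         # tally, most frequent first
  filter(hashtags != "rstats")             # drop the search term itself

top_tags
```

Lowercasing before counting is what lets `DataViz` and `dataviz` land in the same row. Without it, case variants would be tallied separately.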
## See Also
- Official Twitter [search API](https://developer.twitter.com/en/docs/tweets/search/guides/build-standard-query) documentation
- Twitter [entities](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/entities-object) information
- The [tidyverse](https://www.tidyverse.org/) introduction.