# Looking Up the Trending Topics
## Problem
You want to keep track of the trending topics on Twitter over a period of time.
## Solution
Use `rtweet::trends_available()` to see the available trend areas and `rtweet::get_trends()` to pull trends. You can then set up a task to retrieve and cache the trend data periodically.
## Discussion
Twitter has [extensive information](https://help.twitter.com/en/using-twitter/twitter-trending-faqs) on trending topics, and its API lets you see topics that are trending globally or regionally. Twitter identifies regions with [Yahoo! Where on Earth](https://developer.yahoo.com/geo/geoplanet/guide/concepts.html) identifiers (WOEIDs), which you can obtain from `rtweet::trends_available()`:
```{r 02_pkgs, message=FALSE, warning=FALSE, cache=FALSE}
library(rtweet)
library(tidyverse)
```
```{r echo=FALSE}
readRenviron("~/.Renviron")
```
```{r 02_trends_avail, message=FALSE, warning=FALSE, cache=TRUE}
(trends_avail <- trends_available())
glimpse(trends_avail)
```
The Twitter API is somewhat unforgiving and unfriendly when you use it directly, since it requires a WOEID. Michael Kearney, the author of `rtweet`, has made life much easier for us all by letting you use names or regular expressions when asking for trends from a particular place. That means we don't even need to care about capitalization:
```{r 02_us_trends, message=FALSE, warning=FALSE, cache=TRUE}
(us <- get_trends("united states"))
glimpse(us)
```
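To make the idea of name-based lookup concrete, here is a minimal sketch of how a place name or regular expression could be resolved to a WOEID. It is not how `rtweet` does this internally: `find_woeid()` is a hypothetical helper, the small `places` data frame is a hand-built stand-in for a few rows of `trends_available()` output, and the `name` and `woeid` column names are assumptions about that output.

```r
# Stand-in for a few rows of trends_available() output (illustrative only)
places <- data.frame(
  name  = c("Worldwide", "United States", "United Kingdom", "Canada"),
  woeid = c(1, 23424977, 23424975, 23424775),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive regular-expression match on place names
find_woeid <- function(places, pattern) {
  hits <- grepl(pattern, places$name, ignore.case = TRUE)
  places$woeid[hits]
}

# A plain lowercase name matches despite the capitalized entry in the table
us_woeid <- find_woeid(places, "united states")

# A regular expression can match more than one place
united_woeid <- find_woeid(places, "^united")
```

Because a pattern can match several places, a real lookup would need to decide what to do with multiple hits (for example, take the first one or raise an error).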
Twitter's [documentation](https://developer.twitter.com/en/docs/trends/trends-for-location/api-reference/get-trends-place) states that trends are updated every 5 minutes, so there is nothing to gain from calling the API more frequently than that. Twitter also restricts how often you can call certain API targets: the current rate limit for this one is 75 requests per 15-minute window.
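A quick back-of-the-envelope calculation shows how these two numbers relate. The variable names below are only for illustration; the figures are the ones quoted above.

```r
window_secs <- 15 * 60   # length of one rate-limit window, in seconds
request_limit <- 75      # requests allowed per window
update_secs <- 5 * 60    # how often Twitter refreshes trends, in seconds

# Shortest average gap between requests that stays within the rate limit
min_gap_secs <- window_secs / request_limit

# Requests per window for one place if we poll only as often as trends update
calls_per_window <- window_secs / update_secs

# How many places could be polled at every update without exceeding the limit
max_places <- request_limit / calls_per_window
```

Polling a single place every 5 minutes uses 3 of the 75 requests in each window, so the update interval, not the rate limit, is the practical constraint for one location.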
The `rtweet::get_trends()` function returns a data frame. Our ultimate goal is to retrieve the trends data on a schedule and cache it. There are numerous (and usually complex) ways to schedule jobs. One cross-platform solution is to use R itself to run a task periodically. This means keeping an R console open and running at all times, so it is far from optimal. See the [`taskscheduleR`](https://github.com/bnosac/taskscheduleR) package for other ideas on how to set up more robust scheduled jobs.
In this example, we will:
- use a [SQLite](https://www.sqlite.org/) database to store the trends
- use the `DBI` and `RSQLite` packages to work with this database
- set up a never-ending loop with `Sys.sleep()` providing a pause between requests
```{r 02_sqlite, message=FALSE, warning=FALSE, eval=FALSE}
library(DBI)
library(RSQLite)
library(rtweet) # mkearney/rtweet
repeat {
  message("Retrieving trends...") # optional
  us <- get_trends("united states")
  db_con <- dbConnect(RSQLite::SQLite(), "data/us-trends.db")
  # append=TRUE adds rows to the table rather than overwriting it,
  # and creates the table on the first run if it does not exist
  dbWriteTable(db_con, "us_trends", us, append=TRUE)
  dbDisconnect(db_con)
  Sys.sleep(10 * 60) # sleep for 10 minutes
}
```
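The loop above stops at the first error and leaves a connection open if the write fails. One possible way to harden a single iteration is sketched below. `cache_trends()` and the `retrieved_at` column are hypothetical names introduced here, and the `fake_trends` data frame is a stand-in so the example runs without Twitter credentials.

```r
library(DBI)

# Hypothetical helper: append one snapshot of trends to a SQLite table
cache_trends <- function(trends, db_path, table = "us_trends") {
  # Record when the snapshot was taken, as ISO 8601 text in UTC
  trends$retrieved_at <- format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  db_con <- dbConnect(RSQLite::SQLite(), db_path)
  on.exit(dbDisconnect(db_con), add = TRUE) # runs even if the write fails
  dbWriteTable(db_con, table, trends, append = TRUE)
  invisible(nrow(trends))
}

# Stand-in data (not real trends) written to a temporary database
fake_trends <- data.frame(trend = c("#rstats", "#dataviz"),
                          stringsAsFactors = FALSE)
db_path <- tempfile(fileext = ".db")
cache_trends(fake_trends, db_path)
cache_trends(fake_trends, db_path) # a second snapshot appends to the first

# Read everything back to confirm both snapshots were stored
db_con <- dbConnect(RSQLite::SQLite(), db_path)
stored <- dbReadTable(db_con, "us_trends")
dbDisconnect(db_con)
```

Inside the `repeat` loop, the call could be wrapped as `tryCatch(cache_trends(get_trends("united states"), "data/us-trends.db"), error = function(e) message(conditionMessage(e)))` so that one failed request does not end the collection.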
Later on, we can look at this data with `dplyr`/`dbplyr`:
```{r 02_sql_get, message=FALSE, warning=FALSE, eval=TRUE, cache=FALSE}
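library(dplyr)
library(DBI)

# src_sqlite() is deprecated in current dplyr; connect with DBI instead
trends_db <- dbConnect(RSQLite::SQLite(), "data/us-trends.db")
us <- tbl(trends_db, "us_trends")
select(us, trend)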
```
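Once several snapshots have accumulated, a natural next step is to count how often each trend appears. The sketch below assumes only the `trend` column used above and runs against an in-memory database filled with stand-in rows (not real trends), so it works without the cached file. It requires the `dbplyr` package to be installed.

```r
library(dplyr)
library(DBI)

# In-memory database with stand-in rows in place of data/us-trends.db
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "us_trends",
             data.frame(trend = c("#rstats", "#dataviz", "#rstats"),
                        stringsAsFactors = FALSE))

# count() is translated to SQL and runs inside SQLite;
# collect() brings the result back into R as a local data frame
trend_counts <- tbl(con, "us_trends") %>%
  count(trend, sort = TRUE) %>%
  collect()

dbDisconnect(con)
```

Doing the aggregation in the database keeps memory use low, since only the summarized rows are pulled into R.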
## See Also
- [`RSQLite`](https://www.r-project.org/nosvn/pandoc/RSQLite.html) quick reference
- Introduction to `dbplyr`: <http://dbplyr.tidyverse.org/articles/dbplyr.html>