Using the gtrendsR package and a modified version of Alex Dyachenko’s tutorial, I’ve been trying to query estimated Google Trends daily hits. I noticed that my modified version of Alex’s code doesn’t allow me to stop mid-month. In my modified version, all the days past the last day of the previous month show up as NA. Is there a way to resolve the issue?
In essence, I am trying to replicate the steps in this Medium article, but instead of querying whole months I want to cover an arbitrary date range.
Here's some replication code and the sample.xlsx file.
# daily estimates
library(gtrendsR)
library(tidyverse)
library(lubridate)
library(readxl)
library(here)
library(stringr)
library(curl)
get_daily_gtrend <- function(keyword = c('Taylor Swift', 'Kim Kardashian'), geo = 'US', from = '2004-01-01', to = '2004-11-02') {
if (ymd(to) >= floor_date(Sys.Date(), 'month')) {
to <- floor_date(ymd(to), 'month') - days(1)
if (to < from) {
stop("Specifying \'to\' date in the current month is not allowed")
}
}
aggregated_data <- gtrends(keyword = keyword, geo = geo, time = paste(from, to))
if(is.null(aggregated_data$interest_over_time)) {
print('There is no data in Google Trends!')
return()
}
mult_m <- aggregated_data$interest_over_time %>%
mutate(hits = as.integer(ifelse(hits == '<1', '0', hits))) %>%
group_by(month = floor_date(date, 'month'), keyword) %>%
dplyr::summarise(hits = sum(hits)) %>%
ungroup() %>%
mutate(ym = format(month, '%Y-%m'),
mult = hits / max(hits)) %>%
dplyr::select(month, ym, keyword, mult) %>%
as_tibble()
pm <- tibble(s = seq(ymd(from), ymd(to), by = 'month'),
e = seq(ymd(from), ymd(to), by = 'month') + months(1) - days(1))
raw_trends_m <- tibble()
for (i in seq(1, nrow(pm), 1)) {
curr <- gtrends(keyword, geo = geo, time = paste(pm$s[i], pm$e[i]))
if(is.null(curr$interest_over_time)) next
print(paste('for', pm$s[i], pm$e[i], 'retrieved', count(curr$interest_over_time), 'days of data (all keywords)'))
raw_trends_m <- rbind(raw_trends_m,
curr$interest_over_time)
}
trend_m <- raw_trends_m %>%
dplyr::select(date, keyword, hits) %>%
mutate(ym = format(date, '%Y-%m'),
hits = as.integer(ifelse(hits == '<1', '0', hits))) %>%
as_tibble()
trend_res <- trend_m %>%
left_join(mult_m) %>%
mutate(est_hits = hits * mult) %>%
dplyr::select(date, keyword, est_hits) %>%
as_tibble() %>%
mutate(date = as.Date(date))
return(trend_res)
}
all <- read_excel("sample.xlsx")
all$Name <- trimws(all$Name)
all <- distinct(all)
all$surname <- str_extract(all$Name, '[^ ]+$')
all$surname <- trimws(all$surname)
all_j <- all %>%
dplyr::select(Year, Folder) %>%
distinct()
#####
cand2004 <- all %>%
arrange(Folder, str_count(Name, "\\w+"), nchar(Name)) %>%
group_by(Folder, Year) %>%
mutate(order = row_number()) %>%
ungroup()
cand2004 <- cand2004 %>%
dplyr::select(Year, Folder, Name, order) %>%
distinct() %>%
separate(Folder, c("state", "name"), sep="\\-", extra = "merge")
cand2004_grp1 <- cand2004 %>%
filter(Year == 2004, order == 1)
cand2004_grp1a <- split(cand2004_grp1,rep(1:20,each=5))
l <- cand2004_grp1a$`1` %>% dplyr::pull(Name)
l <- as.list(unique(l))
r <- tibble()
for(k in l) {
r <- r %>%
rbind(get_daily_gtrend(keyword = k, geo = 'US', from = '2004-01-01', to = '2004-11-02'))
}
r %>% view()
The problem is in how you loop over the dates. You can only download daily data for at most 270 days.
The code you have builds one query for each full month:
pm <- tibble(s = seq(ymd(from), ymd(to), by = 'month'),
e = seq(ymd(from), ymd(to), by = 'month') + months(1) - days(1))
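One possible way to let the last window stop mid-month is to cap each window's end date at `to` with `pmin()`. This is only a sketch: the helper name `month_windows` is hypothetical (it is not part of gtrendsR), and it assumes `from` is the first day of a month, as in the example call.

```r
library(lubridate)
library(tibble)

# Hypothetical helper: one row per month window, with the final
# window's end date capped at `to` instead of running to month end.
# Assumes `from` falls on the first day of a month.
month_windows <- function(from, to) {
  s <- seq(ymd(from), ymd(to), by = "month")
  e <- pmin(s + months(1) - days(1), ymd(to))
  tibble(s = s, e = e)
}

pm <- month_windows("2004-01-01", "2004-11-02")
tail(pm, 1)  # last window runs from 2004-11-01 to 2004-11-02
```

The resulting `pm` has the same `s` and `e` columns as before, so the loop over `pm$s[i]` and `pm$e[i]` should work unchanged.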
Also note that what you are doing makes the resulting time series hardly useful, since the queries are not comparable over time. You are stitching together daily hits that are each standardized within the time frame for which they were downloaded.
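A toy illustration of why separately downloaded windows are not directly comparable. The numbers are made up, and `scale_to_window` is a hypothetical, simplified stand-in for the way each query is scaled relative to its own peak; it is not a gtrendsR function.

```r
# Hypothetical helper: scale a window so that its own peak equals 100.
scale_to_window <- function(x) round(100 * x / max(x))

quiet_month <- c(10, 20, 5)     # made-up raw search volumes
busy_month  <- c(100, 200, 50)  # ten times larger in absolute terms

# Both calls return 50 100 25: the tenfold difference in level is lost.
scale_to_window(quiet_month)
scale_to_window(busy_month)
```

This is why the daily values need some common reference, such as the multiplier taken from the single long-range query in the code above, before they are joined into one series.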