Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: replace {crul} with {httr2} #157

Draft
wants to merge 20 commits into
base: main
Choose a base branch
from
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -25,14 +25,14 @@ BugReports: https://github.com/ropensci/opencage/issues
Depends:
R (>= 3.4.0)
Imports:
crul (>= 0.5.2),
dplyr (>= 0.7.4),
httr2 (>= 0.2.0),
jsonlite (>= 1.5),
lifecycle,
magrittr,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might as well take the dependency on magrittr v2.0 then?
I.e. does it make sense to more generally bump all versions roughly to the one before the most recent necessary dependency (i.e. httr2) came out? Or is that a mistake? I wouldn't know what more ancient versions really mean practically?

Suggested change
magrittr,
magrittr (>= 2.0.0),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a specific reason for depending on that magrittr version?

memoise (>= 1.1.0),
progress (>= 1.1.2),
purrr (>= 0.2.4),
ratelimitr (>= 0.4.0),
rlang,
tibble (>= 1.4.2),
tidyr (>= 0.8.0),
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -26,4 +26,5 @@ export(oc_reverse_df)
export(opencage_forward)
export(opencage_key)
export(opencage_reverse)
importFrom(magrittr,`%>%`)
importFrom(rlang,.data)
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# opencage (development version)


* http requests are now handled by {[httr2](https://httr2.r-lib.org/)}, not {[crul](https://docs.ropensci.org/crul/)}; and rate limitation / throttling by httr2, not [ratelimitr](https://github.com/tarakc02/ratelimitr) ([#156](https://github.com/ropensci/opencage/issues/156)).
* opencage now supports an `address_only` parameter, see "[New optional API parameter 'address_only'](https://blog.opencagedata.com/post/new-optional-parameter-addressonly)", ([#151](https://github.com/ropensci/opencage/pull/151)).
* The geocoding functions will not send a query to the API anymore if no API key is present ([#133](https://github.com/ropensci/opencage/issues/133)).
* `NA`s are allowed again for the `placename` or `latitude`/`longitude` arguments (also empty strings for `placename`).
Expand Down
8 changes: 7 additions & 1 deletion R/oc_api_ok.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,5 +10,11 @@
#' @keywords internal

oc_api_ok <- function(url = "https://api.opencagedata.com") {
crul::ok(url, useragent = "https://github.com/ropensci/opencage")
resp <- httr2::request(url) %>%
httr2::req_method("HEAD") %>%
httr2::req_user_agent(oc_ua_string) %>%
httr2::req_error(is_error = function(resp) FALSE) %>%
httr2::req_perform()

!httr2::resp_is_error(resp)
}
12 changes: 6 additions & 6 deletions R/oc_config.R
Original file line number Diff line number Diff line change
Expand Up @@ -119,16 +119,16 @@ oc_config <-

Sys.setenv(OPENCAGE_KEY = pat)

# set rate limit
ratelimitr::UPDATE_RATE(
oc_get_limited,
ratelimitr::rate(n = rate_sec, period = 1L)
)

# set no_record
oc_check_logical(no_record, check_length_one = TRUE)
options("oc_no_record" = no_record)

# set show_key
options("oc_show_key" = show_key)

# set rate
if (!is.numeric(rate_sec)) {
cli::cli_abort("Must use a numeric {.code rate_sec}")
}
options("oc_rate_sec" = rate_sec)
}
72 changes: 34 additions & 38 deletions R/oc_process.R
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@ oc_process <-
}

# build url
oc_url <- oc_build_url(
oc_url_parts <- oc_build_url(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current naming confused me a little, I am afraid.

I think there are two to three steps until the request is made

1a. format the query parameters, i.e. turn booleans into integers, delete unused parameters, concatenate bounds coordinates, etc. This is now done in oc_build_url() except this does not yet build the url, so maybe oc_format_query_parameters() and the resulting list is query_parameters? I would move the endpoint as a separate input to the next step/function.
1b. build the httr2 request object (or rather just its url part). Basically what build_query_with_req() does now (maybe just name it build_request()?). The inputs are the query_parameters and endpoint, the output is query_req.
2. Make the request with oc_get(). Pretty much like it is now, but with request or similar as an argument name. (Note to self: This has to be a separate step, so we can return the url_only before).

🤔 À propos naming: Maybe I overdid it with the oc_ prefix for internal functions... 😉

query_par = list(
q = query,
bounds = bounds,
Expand All @@ -150,6 +150,8 @@ oc_process <-
),
endpoint = endpoint
)
query_req <- build_query_with_req(oc_url_parts)
oc_url <- query_req[["url"]]

# return url only
if (return == "url_only") {
Expand All @@ -166,13 +168,11 @@ oc_process <-
# send query to API if not NA, else return (fake) empty res_text
if (!is.na(query) && nchar(query) >= 2) {
# get response
res_env <- oc_get_memoise(oc_url)
res_env <- oc_get_memoise(oc_url_parts)

# parse response
res_text <- oc_parse_text(res_env)

# check status message
oc_check_status(res_env, res_text)
dpprdan marked this conversation as resolved.
Show resolved Hide resolved
} else {
# Fake 0 results response

Expand Down Expand Up @@ -236,8 +236,7 @@ oc_build_url <- function(query_par, endpoint) {

oc_path <- paste0("geocode/v1/", endpoint)

crul::url_build(
url = "https://api.opencagedata.com",
list(
path = oc_path,
query = query_par
)
Expand All @@ -246,18 +245,38 @@ oc_build_url <- function(query_par, endpoint) {

#' GET request from OpenCage
#'
#' @param oc_url character string URL with query parameters, built with
#' @param oc_url_parts list with path and query, built with
#' `oc_build_url()`
#'
#' @return crul::HttpResponse object
#' @return httr2 response
#' @noRd

oc_get <- function(oc_url) {
client <- crul::HttpClient$new(
url = oc_url,
headers = list(`User-Agent` = oc_ua_string)
)
client$get()
oc_get <- function(oc_url_parts = NULL) {

query_req <- build_query_with_req(oc_url_parts)

Comment on lines +254 to +257
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
oc_get <- function(oc_url_parts = NULL) {
query_req <- build_query_with_req(oc_url_parts)
oc_get <- function(query_req = NULL) {

The build_query_with_req() in .oc_process() ought to suffice?

query_req %>%
httr2::req_throttle(rate = getOption("oc_rate_sec", default = 1L) / 1L) %>%
httr2::req_user_agent(oc_ua_string) %>%
httr2::req_perform() # will error if API error :-)
}

build_query_with_req <- function(oc_url_parts) {
initial_req <- httr2::request("https://api.opencagedata.com")

req_with_url <- if (!is.null(oc_url_parts[["path"]])) {
httr2::req_url_path_append(initial_req, oc_url_parts[["path"]])
} else {
initial_req
}

query_req <- if (!is.null(oc_url_parts[["query"]])) {
httr2::req_url_query(req_with_url, !!!oc_url_parts[["query"]])
} else {
req_with_url
}

query_req
Comment on lines +264 to +279
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only use this within oc_process() and tie it to the OpenCage API domain, I think the following should be just as safe. I.e. I cannot think of a scenario where we would make queries to api.opencagedata.com without an endpoint or query_parameters.

Suggested change
build_query_with_req <- function(oc_url_parts) {
initial_req <- httr2::request("https://api.opencagedata.com")
req_with_url <- if (!is.null(oc_url_parts[["path"]])) {
httr2::req_url_path_append(initial_req, oc_url_parts[["path"]])
} else {
initial_req
}
query_req <- if (!is.null(oc_url_parts[["query"]])) {
httr2::req_url_query(req_with_url, !!!oc_url_parts[["query"]])
} else {
req_with_url
}
query_req
build_request <- function(endpoint, query_parameters) {
httr2::request("https://api.opencagedata.com") %>%
httr2::req_url_path_append(endpoint) %>%
httr2::req_url_query(req_with_url, !!!query_parameters)

But if we want to keep it generic (e.g. for the oc_get_memoise() tests), we could instead do

Suggested change
build_query_with_req <- function(oc_url_parts) {
initial_req <- httr2::request("https://api.opencagedata.com")
req_with_url <- if (!is.null(oc_url_parts[["path"]])) {
httr2::req_url_path_append(initial_req, oc_url_parts[["path"]])
} else {
initial_req
}
query_req <- if (!is.null(oc_url_parts[["query"]])) {
httr2::req_url_query(req_with_url, !!!oc_url_parts[["query"]])
} else {
req_with_url
}
query_req
build_request <- function(base_url = "https://api.opencagedata.com", endpoint, query_parameters) {
query_req <- httr2::request(base_url)
query_req <- if (!is.null(endpoint)) {
httr2::req_url_path_append(query_req , endpoint)
}
query_req <- if (!is.null(query_parameters)) {
httr2::req_url_query(query_req, !!!query_parameters)
}
query_req

}

# user-agent string: this is set at build-time, but that should be okay,
Expand All @@ -277,36 +296,13 @@ oc_ua_string <-
#' @noRd

oc_parse_text <- function(res) {
text <- res$parse(encoding = "UTF-8")
text <- httr2::resp_body_string(res)
if (identical(text, "")) {
stop("OpenCage returned an empty response.", call. = FALSE)
}
text
}


#' Check the status of the HttpResponse
#'
#' @param res_env crul::HttpResponse object
#' @param res_text parsed HttpResponse
#'
#' @return NULL if status code less than or equal to 201, else `stop()`
#' @noRd

oc_check_status <- function(res_env, res_text) {
if (res_env$success()) {
return(invisible())
}
message <-
jsonlite::fromJSON(
res_text,
simplifyVector = TRUE,
flatten = TRUE
)$status$message
stop("HTTP failure: ", res_env$status_code, "\n", message, call. = FALSE)
}


#' Format the result string
#'
#' @param res_text parsed HttpResponse
Expand Down
22 changes: 6 additions & 16 deletions R/zzz.R
Original file line number Diff line number Diff line change
@@ -1,27 +1,17 @@
#' @importFrom magrittr `%>%`

# We use `<<-` below to modify the package's namespace.
# It doesn't modify the global environment.
# We do this to prevent build time dependencies on {memoise} and {ratelimitr},
# We do this to prevent build time dependencies on {memoise},
# as recommended in <http://memoise.r-lib.org/reference/memoise.html#details>.
# Cf. <https://github.com/r-lib/memoise/issues/76> for further details.

# First make sure that the functions are defined at build time
oc_get_limited <- oc_get
oc_get_memoise <- oc_get_limited
# First make sure that the function is defined at build time
oc_get_memoise <- oc_get

# Then modify them at load-time
# nocov start
.onLoad <- function(libname, pkgname) {
# limit requests per second
oc_get_limited <<-
ratelimitr::limit_rate(
oc_get,
# rate can be changed via oc_config()/ratelimitr::UPDATE_RATE()
ratelimitr::rate(
n = 1L,
period = 1L
)
)

oc_get_memoise <<- memoise::memoise(oc_get_limited)
oc_get_memoise <<- memoise::memoise(oc_get)
}
# nocov end
4 changes: 2 additions & 2 deletions tests/testthat/test-oc_build_url.R
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
test_that("oc_build_url returns a string", {
test_that("oc_build_url returns a list", {
expect_type(
oc_build_url(
query_par = list(placename = "Haarlem"),
endpoint = "json"
),
"character"
"list"
)
})
12 changes: 6 additions & 6 deletions tests/testthat/test-oc_check_status.R
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ test_that("oc_check_status returns 400 error if request is invalid", {
longitude = 0,
return = "json_list"
),
"HTTP failure: 400"
"HTTP 400 Bad Request"
)

# We don't send queries with nchar(query) <= 1 to the API, see .oc_process()
Expand All @@ -31,7 +31,7 @@ test_that("oc_check_status returns 400 error if request is invalid", {
placename = " ",
return = "json_list"
),
"HTTP failure: 400"
"HTTP 400 Bad Request"
)
})

Expand All @@ -42,7 +42,7 @@ test_that("oc_check_status returns 401 error if key is invalid", {
withr::local_envvar(c("OPENCAGE_KEY" = "32charactersandnumbers1234567890"))
expect_error(
oc_reverse(latitude = 0, longitude = 0),
"HTTP failure: 401"
"HTTP 401 Unauthorized"
)
})

Expand All @@ -53,7 +53,7 @@ test_that("oc_check_status returns 402 error if quota exceeded", {
withr::local_envvar(c("OPENCAGE_KEY" = key_402))
expect_error(
oc_reverse(latitude = 0, longitude = 0),
"HTTP failure: 402"
"HTTP 402"
)
})

Expand All @@ -64,7 +64,7 @@ test_that("oc_check_status returns 403 error if key is blocked", {
withr::local_envvar(c("OPENCAGE_KEY" = key_403))
expect_error(
oc_reverse(latitude = 0, longitude = 0),
"HTTP failure: 403"
"HTTP 403"
)
})

Expand All @@ -75,6 +75,6 @@ test_that("oc_check_status returns 429 error if rate limit is exceeded", {
withr::local_envvar(c("OPENCAGE_KEY" = key_429))
expect_error(
oc_reverse(latitude = 0, longitude = 0),
"HTTP failure: 429"
"HTTP 429"
)
})
9 changes: 3 additions & 6 deletions tests/testthat/test-oc_clear_cache.R
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,8 @@ test_that("oc_clear_cache clears cache", {
skip_on_cran()
skip_if_offline("httpbin.org")

# until a memoise >v.1.1 is released, we need to run oc_get_memoise() twice to
# have it really cache results
# https://github.com/ropensci/opencage/pull/87#issuecomment-573573183
replicate(2, oc_get_memoise("https://httpbin.org/get"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure these edits were the right ones. However now one cannot "get" stuff from an URL that's not the OpenCage API, with the refactoring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure either whether this actually tests what we want to.
Could we build oc_get() in such a way that it accepts a httr2 request object and just performs the request (i.e. just wrapper around httr2::req_perform())? Then we could just build a simple query request here (I'd rather not hit the OpenCage API for this).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now wondering why we are testing memoization at all, given it's "outsourced" to {memoise}?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not convinced we need to test oc_clear_cache() as it only wraps memoise that we trust?

By the way here's another caching method: https://deploy-preview-537--ropensci.netlify.app/blog/2023/04/21/ropensci-news-digest-april-2023/#caching-the-results-of-functions-of-your-r-package

expect_true(memoise::has_cache(oc_get_memoise)("https://httpbin.org/get"))
replicate(2, oc_get_memoise())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
replicate(2, oc_get_memoise())
oc_get_memoise()

expect_true(memoise::has_cache(oc_get_memoise)())
oc_clear_cache()
expect_false(memoise::has_cache(oc_get_memoise)("https://httpbin.org/get"))
expect_false(memoise::has_cache(oc_get_memoise)())
})
28 changes: 9 additions & 19 deletions tests/testthat/test-oc_config.R
Original file line number Diff line number Diff line change
Expand Up @@ -48,25 +48,6 @@ test_that("oc_config throws error with faulty OpenCage key", {
)
})

# test rate_sec argument --------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a little test whether oc_config() can update the oc_rate_limit option?


test_that("oc_config updates rate limit of oc_get_limit", {
# make sure there is a key present
withr::local_envvar(c("OPENCAGE_KEY" = key_200))

rps <- 5L
oc_config(rate_sec = rps)
expect_identical(ratelimitr::get_rates(oc_get_limited)[[1]][["n"]], rps)

rps <- 3L
oc_config(rate_sec = rps)
expect_identical(ratelimitr::get_rates(oc_get_limited)[[1]][["n"]], rps)

# set rate_sec back to default: 1L/sec
oc_config()
expect_identical(ratelimitr::get_rates(oc_get_limited)[[1]][["n"]], 1L)
})

# test no_record argument -------------------------------------------------

test_that("oc_config sets no_record option", {
Expand Down Expand Up @@ -112,3 +93,12 @@ test_that("oc_config sets show_key option", {
oc_config(show_key = TRUE)
expect_true(getOption("oc_show_key"))
})

test_that("rate_sec checks/sets oc_rate_sec option", {
withr::local_envvar(c("OPENCAGE_KEY" = key_200))
withr::local_options(oc_rate_sec = 123)
oc_config(rate_sec = 42)
expect_equal(getOption("oc_rate_sec"), 42)

expect_error(oc_config(rate_sec = "blablabla"), "Must")
})
17 changes: 3 additions & 14 deletions tests/testthat/test-oc_get.R
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ test_that("oc_get returns a response object", {
endpoint = "json"
)
),
"HttpResponse"
"httr2_response"
)
})

Expand All @@ -33,7 +33,7 @@ test_that("oc_get returns a response object for Namibia NA countrycode", {
endpoint = "json"
)
),
"HttpResponse"
"httr2_response"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests are actually failing with "HTTP 400 Bad Request". I'm not sure why as the same oc_forward() query works. I see the first test in this file, that doesn't fail, uses a key for always getting 200. I'm a bit lost.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All queries/tests in test-oc_get.R return HTTP 400 errors, also on main, because the query parameter placename ought to be q here 🙈. Well, the first one passes, because it uses the key_200 which always returns a HTTP 200 response, no matter what nonsense you throw at the API.
I don't know whether this has always been the case or whether there has been a related change in the OpenCage API. Regardless, it never surfaced, because all queries only test whether a HttpResponse object is returned by oc_get(), and the status wasn't checked until later. But the status check is now as part of oc_get() so, the error now surfaces here already.

I am not sure about the usefulness of the Namibia and vector countrycode tests here. I think that these should be tested with oc_forward() if they aren't already?

Here, I am more interested in whether memoisation and rate limiting works as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted in another comment, I wouldn't test memoisation nor rate limiting as we outsource those.

What we should test is as you noted, whether the configuration function can configure the rate. BUT since in req_throttle() I call getOption("oc_rate_sec", default = 1L), it only means retrieving the option before and after calling the configuration function.

Would you agree?

)
})

Expand All @@ -52,21 +52,10 @@ test_that("oc_get returns a response object for vector countrycode", {
endpoint = "json"
)
),
"HttpResponse"
"httr2_response"
)
})

test_that("oc_get_limited is rate limited", {
skip_on_cran()
skip_if_offline("httpbin.org")

tm <- system.time({
replicate(2, oc_get_limited("https://httpbin.org/get"))
})
rate <- ratelimitr::get_rates(oc_get_limited)
expect_gte(tm[["elapsed"]], rate[[1]][["period"]] / rate[[1]][["n"]])
})

test_that("oc_get_memoise memoises", {
skip_on_cran()
skip_if_offline("httpbin.org")
Expand Down