Is it possible to sample only a subset of works when using oa_snowball? #279
Comments
Hi @adrientaudiere, could you elaborate on the motivation to perform such an analysis? Why would you want a subset of the cited/citing work?
Hi @trangdata,
The idea is not completely out of scope, but I'd also like to hear more before considering implementing this (though it seems like sampling, specifically, is not the point?). If the point is to test a pipeline with a small
library(openalexR)
x <- oa_fetch(doi = "10.1038/s41586-022-05258-z")
# >200 connections
x$cited_by_count + length(x$referenced_works[[1]])
#> [1] 221
# Search the range of 1 year forward, 10 years back
pub_date <- as.Date(x$publication_date)
snowball <- oa_snowball(
identifier = x$id,
citing_params = list(to_publication_date = pub_date + 365),
cited_by_params = list(from_publication_date = pub_date - 3650)
)
# Returns a subset
nrow(snowball$nodes)
#> [1] 112
hist(as.Date(snowball$nodes$publication_date), "years", freq = TRUE)
abline(v = pub_date, col = "red", lwd = 3)
Oh yes, it's a perfect trick. Maybe it deserves to be present in the example of the function. Thank you.
We had actually planned for a full snowballing vignette but haven't got around to it. We'll keep this example in mind!
Update: no longer necessary to do date conversion of
library(openalexR)
x <- oa_fetch(doi = "10.1038/s41586-022-05258-z")
# >200 connections
x$cited_by_count + length(x$referenced_works[[1]])
#> [1] 223
# Search the range of 1 year forward, 10 years back
snowball <- oa_snowball(
identifier = x$id,
citing_params = list(to_publication_date = x$publication_date + 365),
cited_by_params = list(from_publication_date = x$publication_date - 3650)
)
# Returns a subset
nrow(snowball$nodes)
#> [1] 114
hist(snowball$nodes$publication_date, "years", freq = TRUE)
abline(v = x$publication_date, col = "red", lwd = 3)
Closing to track vignette discussion in #284
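The snippets above approximate "1 year forward, 10 years back" with 365 and 3650 days, which drifts by a few days across leap years. If calendar-exact bounds matter, something like the following base R sketch might help. The helper name `date_window` and the example date are hypothetical and not part of openalexR.

```r
# Hypothetical helper (not part of openalexR): compute a calendar-based
# window around a publication date using base R's seq() for Date objects.
date_window <- function(pub_date, years_forward = 1, years_back = 10) {
  pub_date <- as.Date(pub_date)
  list(
    # Step backward by whole calendar years and take the second element
    from = seq(pub_date, by = paste0("-", years_back, " years"), length.out = 2)[2],
    # Step forward by whole calendar years and take the second element
    to = seq(pub_date, by = paste0(years_forward, " years"), length.out = 2)[2]
  )
}

# Arbitrary example date, chosen only for illustration
w <- date_window("2022-10-12")
w$from
w$to
```

The resulting `w$to` and `w$from` could presumably be passed as `to_publication_date` and `from_publication_date` in the same way as `pub_date + 365` and `pub_date - 3650` above, since those are also `Date` values.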
Hi all,
Thanks for your very useful package.
I wonder if it is possible to add an equivalent of
options = list(sample = 10, seed = 1)
to each snowball query? For the moment, if I add options = list(sample = 10, seed = 1), it only applies to the first oa_fetch query. And the options citing_params and cited_by_params add a filter to the API request, so the API parameter sample is not usable.
Best,
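As a possible workaround for the request above, one could subsample the snowball result locally after it has been fetched. The sketch below is only an illustration: `sample_snowball` is a hypothetical helper, not an openalexR function, and it assumes the result is a list with a `nodes` data frame (columns `id` and a logical `oa_input`) and an `edges` data frame (columns `from` and `to`). A small mock object stands in for a real `oa_snowball()` result so the example runs offline.

```r
# Mock object imitating the ASSUMED shape of an oa_snowball() result:
# one input work (W1), 10 works citing it, and 10 works it cites.
snowball <- list(
  nodes = data.frame(
    id = paste0("W", 1:21),
    oa_input = c(TRUE, rep(FALSE, 20)),
    stringsAsFactors = FALSE
  ),
  edges = data.frame(
    from = c(paste0("W", 2:11), rep("W1", 10)),
    to = c(rep("W1", 10), paste0("W", 12:21)),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper (not part of openalexR): keep all input works plus a
# reproducible random sample of n other nodes, and drop dangling edges.
sample_snowball <- function(snowball, n, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  nodes <- snowball$nodes
  candidates <- nodes$id[!nodes$oa_input]
  sampled <- sample(candidates, min(n, length(candidates)))
  keep <- c(nodes$id[nodes$oa_input], sampled)
  edges <- snowball$edges
  list(
    nodes = nodes[nodes$id %in% keep, , drop = FALSE],
    edges = edges[edges$from %in% keep & edges$to %in% keep, , drop = FALSE]
  )
}

small <- sample_snowball(snowball, n = 10, seed = 1)
nrow(small$nodes)  # 1 input work + 10 sampled works
```

This does not reduce the number of API requests, since sampling happens only after the full snowball has been downloaded. It may still be useful for testing a downstream pipeline on a smaller graph.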