DSL for SPARQL in R. ✨
glitter
glitter produces SPARQL! ✨
This package aims to help you write and send SPARQL queries without advanced knowledge of SPARQL syntax. It makes exploring and using Linked Open Data (Wikidata in particular) easier for those who do not know SPARQL well.
Compared with SPARQL queries written by hand, glitter code should be easier to write, and easier to read for peers who do not know SPARQL. The glitter package provides a “domain-specific language” (DSL) whose function names and syntax are closer to the tidyverse and base R than to SPARQL.
For instance, to find 5 scholarly articles whose English title contains “wikidata” (in any letter case), instead of writing SPARQL by hand you can run:
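library("glitter")
query <- spq_init() %>% # start an empty query
  spq_add("?item wdt:P31 wd:Q13442814") %>% # ?item is an instance of (P31) scholarly article (Q13442814)
  spq_label(item) %>% # also retrieve ?item_label, the English label of ?item
  spq_filter(str_detect(str_to_lower(item_label), 'wikidata')) %>% # keep labels containing "wikidata"
  spq_head(n = 5) # keep only 5 results (LIMIT 5)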
query
#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
#> SELECT ?item ?item_label
#> WHERE {
#>
#> ?item wdt:P31 wd:Q13442814.
#> OPTIONAL {
#> ?item rdfs:label ?item_labell.
#> FILTER(lang(?item_labell) IN ('en'))
#> }
#>
#> BIND(COALESCE(?item_labell,'') AS ?item_label)
#> FILTER(REGEX(LCASE(?item_label),"wikidata"))
#> }
#>
#> LIMIT 5
Note how we were able to use str_detect() and str_to_lower() (as in the stringr package) instead of SPARQL’s functions REGEX and LCASE.
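Because the filter is written with stringr functions, the same expression can be checked on a local character vector before any query is sent. The sketch below assumes stringr is installed and uses made-up labels; it also shows a base-R equivalent, which for a plain pattern like "wikidata" gives the same result.

```r
library(stringr)

# Made-up labels, for illustration only.
labels <- c("Wikidata as a knowledge base", "A study of Wikipedia edits")

# The filter expression used in spq_filter() above, evaluated locally with stringr.
keep_stringr <- str_detect(str_to_lower(labels), "wikidata")

# The same test in base R: lowercase first, then match the pattern.
keep_base <- grepl("wikidata", tolower(labels))

keep_stringr
```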
To perform the query, pass it to spq_perform():
spq_perform(query)
#> # A tibble: 5 × 2
#> item item_label
#> <chr> <chr>
#> 1 http://www.wikidata.org/entity/Q18507561 Wikidata: A Free Collaborative Knowl…
#> 2 http://www.wikidata.org/entity/Q21503276 Utilizing the Wikidata system to imp…
#> 3 http://www.wikidata.org/entity/Q21503284 Wikidata: A platform for data integr…
#> 4 http://www.wikidata.org/entity/Q23712646 Wikidata as a semantic framework for…
#> 5 http://www.wikidata.org/entity/Q24074986 From Freebase to Wikidata: The Great…
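spq_perform() returns a tibble, so results can be post-processed with ordinary R tools. For example, the item column holds full entity URIs. This minimal sketch uses a local copy of two of the URIs printed above so it runs offline (in practice you would start from the tibble returned by spq_perform(query)) and strips the namespace to keep the bare Wikidata identifiers; the qid column name is only an illustrative choice.

```r
library(dplyr)

# Local copy of two URIs printed above; in practice `articles` would be
# the tibble returned by spq_perform(query).
articles <- tibble(
  item = c("http://www.wikidata.org/entity/Q18507561",
           "http://www.wikidata.org/entity/Q21503276")
)

# Strip the entity namespace to keep the bare Wikidata identifier (QID).
articles <- articles %>%
  mutate(qid = sub("^http://www\\.wikidata\\.org/entity/", "", item))

articles$qid
```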
To get 10 films (whichever the endpoint returns first, not a random sample) along with the year they were released, you could use:
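spq_init() %>%
  spq_add("?film wdt:P31 wd:Q11424") %>% # ?film is an instance of (P31) film (Q11424)
  spq_label(film) %>% # also retrieve ?film_label
  spq_add("?film wdt:P577 ?date") %>% # ?date is the film's publication date (P577)
  spq_mutate(date = year(date)) %>% # replace the full date with its year
  spq_head(10) %>% # keep only 10 results
  spq_perform() # send the query and return a tibble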
#> # A tibble: 10 × 3
#> film date film_label
#> <chr> <dbl> <chr>
#> 1 http://www.wikidata.org/entity/Q372 2009 We Live in Public
#> 2 http://www.wikidata.org/entity/Q595 2011 The Intouchables
#> 3 http://www.wikidata.org/entity/Q595 2011 The Intouchables
#> 4 http://www.wikidata.org/entity/Q595 2012 The Intouchables
#> 5 http://www.wikidata.org/entity/Q595 2012 The Intouchables
#> 6 http://www.wikidata.org/entity/Q593 2011 A Gang Story
#> 7 http://www.wikidata.org/entity/Q1365 1974 Swept Away
#> 8 http://www.wikidata.org/entity/Q1365 1974 Swept Away
#> 9 http://www.wikidata.org/entity/Q1365 1975 Swept Away
#> 10 http://www.wikidata.org/entity/Q1365 1975 Swept Away
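Some films appear several times, likely because Wikidata records more than one publication date for them and these become identical once reduced to a year. Since spq_perform() returns a tibble, one option is to drop repeated rows with dplyr::distinct(). The sketch below rebuilds the ten printed rows locally so it runs offline; in practice you would pipe the result of spq_perform() straight into distinct().

```r
library(dplyr)

# Local copy of the ten rows printed above; in practice `films` would be
# the tibble returned by spq_perform().
films <- tibble(
  film = paste0("http://www.wikidata.org/entity/",
                c("Q372", rep("Q595", 4), "Q593", rep("Q1365", 4))),
  date = c(2009, 2011, 2011, 2012, 2012, 2011, 1974, 1974, 1975, 1975),
  film_label = c("We Live in Public", rep("The Intouchables", 4),
                 "A Gang Story", rep("Swept Away", 4))
)

# Keep a single row per distinct (film, year, label) combination.
films_unique <- distinct(films)
films_unique
```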
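Note that we were able to “overwrite” the date variable (replacing the full date with its year). This is straightforward in dplyr but not in SPARQL, where BIND cannot assign a value to a variable that is already used in the pattern.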
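For comparison, here is the same kind of overwrite on a local tibble with dplyr, using lubridate's year(), whose name matches the year() in the spq_mutate() call above. This is a minimal sketch with made-up dates, assuming dplyr and lubridate are installed.

```r
library(dplyr)
library(lubridate)

# Made-up release dates, for illustration only.
films <- tibble(
  film = c("film_a", "film_b"),
  date = as.Date(c("2011-11-02", "1974-12-13"))
)

# mutate() can reuse an existing column name: the full date is replaced by its year.
films_years <- films %>%
  mutate(date = year(date))

films_years
```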
Install this package from R-universe:
install.packages("glitter", repos = "https://lvaudor.r-universe.dev")
Or through GitHub:
install.packages("remotes") # if remotes is not already installed
remotes::install_github("lvaudor/glitter")
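If you prefer to install remotes only when it is missing, the two lines above can be wrapped in a small helper. install_glitter_from_github() is a hypothetical convenience function, not part of glitter; it is only defined here, so nothing is installed until you call it.

```r
# Hypothetical helper (not part of glitter): install remotes only if it is
# missing, then install glitter from GitHub.
install_glitter_from_github <- function() {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("lvaudor/glitter")
}
```

Calling install_glitter_from_github() then performs both steps.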
You can access the documentation for the glitter package on its pkgdown website.