# R Client for Dataverse 4 Repositories #
[![Dataverse Project logo](http://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png "Dataverse Project")](http://dataverse.org)
The **dataverse** package provides access to [Dataverse 4](http://dataverse.org/) APIs, enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](http://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting.
Some features of the Dataverse 4 API are public and require no authentication. This means that in many cases you can search for and retrieve data without an account on a specific Dataverse installation. Other features, however, require an account on the specific server installation of the Dataverse software, and an API key linked to that account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](http://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, store it in an environment variable called `DATAVERSE_KEY`. It can be set within R using:
`Sys.setenv("DATAVERSE_KEY" = "examplekey12345")`
Because [there are many Dataverse installations](http://dataverse.org/), all functions in the R client require specifying what server installation you are interacting with. This can be set by default with an environment variable, `DATAVERSE_SERVER`. This should be the Dataverse server, without the "https" prefix or the "/api" URL path, etc. For example, the Harvard Dataverse can be used by setting:
`Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")`
Note: the package attempts to compensate for malformed values of this variable.
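To illustrate what cleaning up a malformed server value could involve, the sketch below strips a URL scheme and a trailing "/api" path from a string. The helper `normalize_server()` is hypothetical and is not part of the **dataverse** package; the package's own handling may differ.

```R
# Hypothetical helper (not part of the dataverse package): reduce a server
# value to a bare hostname such as "dataverse.harvard.edu"
normalize_server <- function(server) {
  server <- trimws(server)
  server <- sub("^https?://", "", server, ignore.case = TRUE)  # drop the scheme
  server <- sub("/api(/.*)?$", "", server)                     # drop an "/api" path
  sub("/+$", "", server)                                       # drop trailing slashes
}

normalize_server("https://dataverse.harvard.edu/api/")
# returns "dataverse.harvard.edu"
```

A value cleaned this way can then be passed to `Sys.setenv()` as shown above.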
### Data Discovery ###
Dataverse supplies a robust search API to discover Dataverses, datasets, and files. The simplest searches consist of a query string:
```R
dataverse_search("Gary King")
```
More complicated searches might specify metadata fields:
```R
dataverse_search(author = "Gary King", title = "Ecological Inference")
```
And searches can be restricted to specific types of objects (Dataverse, dataset, or file):
```R
dataverse_search(author = "Gary King", type = "dataset")
```
Results are paginated using the `per_page` argument. To retrieve subsequent pages, specify `start`.
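A minimal sketch of paging through results follows. It assumes that `start` is a zero-based offset into the result set, as in the Dataverse Search API, and that the total number of hits is already known. The helpers `page_starts()` and `search_all_pages()` are hypothetical and not part of the package.

```R
# Hypothetical helper: offsets at which each page of results begins,
# assuming `start` is a zero-based offset into the result set
page_starts <- function(total, per_page = 10) {
  if (total <= 0) return(numeric(0))
  seq(from = 0, to = total - 1, by = per_page)
}

# Hypothetical wrapper: issue one dataverse_search() call per page
search_all_pages <- function(query, total, per_page = 10) {
  lapply(page_starts(total, per_page), function(s) {
    dataverse::dataverse_search(query, per_page = per_page, start = s)
  })
}

# 25 results at 10 per page means three requests
page_starts(total = 25, per_page = 10)

# pages <- search_all_pages("Gary King", total = 25)  # requires network access
```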
### Data and Metadata Retrieval ###
The easiest way to access data from Dataverse is to use a persistent identifier (typically a DOI). You can retrieve the contents of a Dataverse dataset:
```R
get_dataset("doi:10.7910/DVN/ARKOTI")
```
retrieve metadata:
```R
dataset_metadata("doi:10.7910/DVN/ARKOTI")
```
and even access files directly in R using the DOI and a filename:
```R
f <- get_file("constructionData.tab", "doi:10.7910/DVN/ARKOTI")
# load it into memory
tmp <- tempfile(fileext = ".dta")
writeBin(f, tmp)
dat <- rio::import(tmp, haven = FALSE)
```
If you don't know the file name in advance, you can parse the available files returned by `get_dataset()`:
```R
d1 <- get_dataset("doi:10.7910/DVN/ARKOTI")
f <- get_file(d1$files$datafile$id[3])
```
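Selecting a file by position is fragile if the order of files changes. The sketch below looks up an id by file name instead. It runs against a simplified mock of the structure implied by `d1$files$datafile$id`; the `name` column, the id values, and the helper `file_id_by_name()` are assumptions for illustration, so inspect the real object with `str(d1$files)` before relying on them.

```R
# Simplified stand-in for the object returned by get_dataset();
# the `name` column and the id values are assumptions
d_mock <- list(
  files = list(
    datafile = data.frame(
      id   = c(101L, 102L, 103L),
      name = c("a.tab", "b.tab", "constructionData.tab"),
      stringsAsFactors = FALSE
    )
  )
)

# Hypothetical helper: return the id of the single file matching `filename`
file_id_by_name <- function(dataset, filename) {
  datafile <- dataset$files$datafile
  id <- datafile$id[datafile$name == filename]
  if (length(id) != 1L) {
    stop("expected exactly one file named '", filename, "', found ", length(id))
  }
  id
}

file_id_by_name(d_mock, "constructionData.tab")
```

With a real dataset, the returned id could then be passed to `get_file()` in place of `d1$files$datafile$id[3]`.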
### Data Deposit ###
The data deposit workflow is built on [SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a new dataset listing, you will first have to initialize a dataset entry with some metadata, add one or more files to the dataset, and then publish it. This looks something like the following:
```R
# retrieve your service document
d <- service_document()
# create a list of metadata
metadat <- list(title = "My Study",
                creator = "Doe, John",
                description = "An example study")
# create the dataset
dat <- initiate_dataset("mydataverse", body = metadat)
# add files to dataset
tmp <- tempfile()
write.csv(iris, file = tmp)
f <- add_file(dat, file = tmp)
# publish new dataset
publish_dataset(dat)
# dataset will now be published
list_datasets(dat)
```
Dataverse actually implements two ways to release datasets: the SWORD API and the "native" API. Documentation of the latter is forthcoming.
### Native API Features ###
Coming soon...
## Installation ##
[![CRAN Version](http://www.r-pkg.org/badges/version/dataverse)](http://cran.r-project.org/package=dataverse)
![Downloads](http://cranlogs.r-pkg.org/badges/dataverse)
[![Travis-CI Build Status](https://travis-ci.org/IQSS/dataverse-client-r.png?branch=master)](https://travis-ci.org/IQSS/dataverse-client-r)
[![codecov.io](http://codecov.io/github/IQSS/dataverse-client-r/coverage.svg?branch=master)](http://codecov.io/github/IQSS/dataverse-client-r?branch=master)
You can (eventually) find a stable release on [CRAN](http://cran.r-project.org/web/packages/dataverse/index.html), or install the latest development version from GitHub:
```R
if (!require("ghit")) {
    install.packages("ghit")
}
ghit::install_github("iqss/dataverse-client-r")
library("dataverse")
```
Users who want to download metadata from archives other than Dataverse may be interested in Kurt Hornik's [OAIHarvester](http://cran.r-project.org/web/packages/OAIHarvester/index.html) and Scott Chamberlain's [oai](https://cran.fhcrc.org/web/packages/oai/index.html), which offer metadata download from any web repository that is compliant with the [Open Archives Initiative](http://www.openarchives.org/) standards. Additionally, [rdryad](http://cran.fhcrc.org/web/packages/rdryad/index.html) uses OAIHarvester to interface with [Dryad](http://datadryad.org/). The [rfigshare](http://cran.r-project.org/web/packages/rfigshare/) package works in a similar spirit to **dataverse** with [figshare](http://figshare.com/).
---
[![](http://ropensci.org/public_images/github_footer.png)](http://ropensci.org)