sbtools get #272
Changes from all commits
5ecf0e5
bc12df5
1fccc36
50b1433
c610b1c
0a24ae9
50ee859
6dbc173
2b4aa6e
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,174 @@ | ||
--- | ||
title: "sbtools - Download Data" | ||
date: "9999-07-01" | ||
author: "Lindsay R. Carr" | ||
slug: "sbtools-get" | ||
image: "img/main/intro-icons-300px/r-logo.png" | ||
output: USGSmarkdowntemplates::hugoTraining | ||
parent: Introduction to USGS R Packages | ||
weight: 2 | ||
draft: true | ||
--- | ||
|
||
```{r setup, include=FALSE, warning=FALSE, message=FALSE} | ||
library(knitr) | ||
|
||
knit_hooks$set(plot=function(x, options) { | ||
sprintf("<img src='../%s%s-%d.%s' title='%s'/>", | ||
options$fig.path, options$label, options$fig.cur, options$fig.ext, options$fig.cap) | ||
|
||
}) | ||
|
||
opts_chunk$set( | ||
echo=TRUE, | ||
fig.path="static/sbtools-get/", | ||
fig.width = 6, | ||
fig.height = 6, | ||
fig.cap = "TODO" | ||
) | ||
|
||
set.seed(1) | ||
``` | ||
|
||
This lesson describes the basic functions for managing authenticated ScienceBase sessions and for viewing or downloading ScienceBase items. If you aren't sure what a ScienceBase item is, head back to the [previous lesson on `sbitems`](/usgs-packages/sbtools-sbitem). | ||
|
||
Don't forget to load the library if you're in a new R session! | ||
|
||
```{r sbtools-library, message=FALSE, warning=FALSE} | ||
library(sbtools) | ||
``` | ||
|
||
```{r sbtools-auth, echo=FALSE} | ||
# run vizlab::storeSBcreds() once before this can work | ||
home <- path.expand('~') | ||
sbCreds <- file.path(home, ".vizlab/sbCreds") | ||
credList <- readRDS(sbCreds) | ||
un <- rawToChar(credList$username) | ||
pw <- rawToChar(credList$password) | ||
sbtools::authenticate_sb(un, pw) | ||
``` | ||
|
||
## Authentication | ||
|
||
This section is specific to authentication with ScienceBase. If you don't have a ScienceBase account, skip to the next section. Just know that you will only be able to download public data. | ||
|
||
The first step to authenticating (or logging in) to ScienceBase is to use the function `authenticate_sb`. The arguments are your username and password. Alternatively, you can use the function interactively by not supplying any arguments. It will prompt you for your username in the R console and then your password in a pop-up window. Be very cautious when using the username and password arguments - don't include these in any scripts! To be safe, you can leave out the arguments and use the interactive login. Try interactively logging in: | ||
|
||
```{r sbtools-login, eval=FALSE} | ||
authenticate_sb() | ||
``` | ||
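If you do need a non-interactive login (for example, in a scheduled job), one way to keep credentials out of the script itself is to read them from environment variables. The sketch below is only an illustration: the variable names `SB_USERNAME` and `SB_PASSWORD` and the helper `sb_env_creds` are hypothetical and not part of `sbtools`; only the call `authenticate_sb(username, password)` comes from the package.

```r
# Hypothetical helper: read credentials from environment variables that you
# set outside of any script (for example in your ~/.Renviron file).
sb_env_creds <- function() {
  vals <- Sys.getenv(c("SB_USERNAME", "SB_PASSWORD"))
  # Sys.getenv() returns "" for variables that are not set
  if (!all(nzchar(vals))) {
    return(NULL)
  }
  list(username = vals[["SB_USERNAME"]], password = vals[["SB_PASSWORD"]])
}

creds <- sb_env_creds()
if (is.null(creds)) {
  message("No stored credentials found; call authenticate_sb() interactively.")
} else {
  sbtools::authenticate_sb(creds$username, creds$password)
}
```

With this pattern the script can be shared freely, because the username and password live only in your local environment.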
|
||
To double check that your authentication was successful, use the function `is_logged_in`. It will return a logical to let you know if you are logged in or not. No arguments are needed. | ||
|
||
```{r sbtools-verifylogin} | ||
is_logged_in() | ||
``` | ||
|
||
Each user has a specific ScienceBase item associated with their account. You can inspect the items and files attached to your home item and even add new items and files (both discussed in the [next section](/usgs-packages/sbtools-modify)). To determine the ScienceBase ID of your home item, use the function `user_id` in an authenticated session. No arguments are necessary. | ||
|
||
```{r sbtools-userid} | ||
user_id() | ||
``` | ||
|
||
When you're done with your session, you can actively log out using the `session_logout` function. No arguments are required. If you do not do this, you will be automatically logged out after a certain amount of time or when you close R. | ||
|
||
## Inspect and download items | ||
|
||
The first inspection step for ScienceBase items is to determine if the item even exists. To do this, use the function `identifier_exists`. The only required argument is `sb_id` which can be either a character string of the item id or an `sbitem`. It will return a logical to indicate whether the item exists or not. | ||
|
||
```{r sbtools-identifierexists} | ||
identifier_exists("4f4e4acae4b07f02db67d22b") | ||
identifier_exists("thisisnotagoodid") | ||
``` | ||
|
||
You can use the function `item_exists` to check whether or not alternative identifiers (a scheme-type-key tuple) exist (visit the [sbitem lesson](/usgs-packages/sbtools-sbitem) if you don't know about alternative identifiers). The function has three required arguments - `scheme`, `type`, and `key`. Note that the table of alternative identifiers on ScienceBase is in a different order than this function accepts: `type, scheme, key` on ScienceBase but `scheme, type, key` for `item_exists`. | ||
|
||
```{r sbtools-itemexists} | ||
# test a made up tuple | ||
item_exists(scheme = "made", type = "this", key = "up") | ||
|
||
# test a tuple from the SB item "4f4e4acae4b07f02db67d22b" | ||
item_exists(scheme = "State Inventory", type = "UniqueKey", key = "P1281") | ||
|
||
# test the same scheme & type with a made up key | ||
item_exists(scheme = "State Inventory", type = "UniqueKey", key = "1234") | ||
``` | ||
|
||
Let's inspect various ScienceBase items. There are functions to look at the parent item, metadata fields, sub-items, and associated files. Each of these functions requires the id of the sbitem as the first argument. For all of these examples, we are going to use the same sbitem id, "4f4e479de4b07f02db491e34". | ||
|
||
First, let's inspect the parent item. The function to use is `item_get_parent`, and the item id is the only necessary argument. | ||
|
||
```{r sbtools-parent} | ||
ex_id <- "4f4e479de4b07f02db491e34" | ||
ex_id_parent <- item_get_parent(ex_id) | ||
ex_id_parent$title | ||
``` | ||
|
||
Now, let's see if this item has any children by using the `item_list_children` function. | ||
|
||
```{r sbtools-children} | ||
ex_id_children <- item_list_children(ex_id) | ||
length(ex_id_children) | ||
sapply(ex_id_children, function(item) item$title) | ||
``` | ||
|
||
Let's check to see if this item has any files attached to it using `item_list_files`. This will return a data frame with three columns: `fname` (filename), `size` (file size in bytes), and `url` (the URL to the file on ScienceBase). | ||
|
||
```{r sbtools-files} | ||
ex_id_files <- item_list_files(ex_id) | ||
nrow(ex_id_files) | ||
ex_id_files$fname | ||
``` | ||
|
||
To actually get the files into R as data, you need to use their URLs and the appropriate parsing function. Both of the files returned for this item are XML, so you can use the `read_xml` function from the `xml2` package. As practice, we will read in the first XML file. | ||
|
||
```{r sbtools-filedownload} | ||
xml2::read_xml(ex_id_files$url[1]) | ||
``` | ||
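If you would rather keep local copies of the files than parse them straight from the URL, the same `url` column can be passed to base R's `download.file`. The helper below is a minimal sketch, not an `sbtools` function: the name `download_sb_files` is hypothetical, and it assumes a data frame with the `fname` and `url` columns described above.

```r
# Hypothetical helper (not part of sbtools): save every file listed in a data
# frame shaped like the output of item_list_files() into `dest_dir`.
download_sb_files <- function(files, dest_dir = tempdir()) {
  dest <- file.path(dest_dir, basename(as.character(files$fname)))
  for (i in seq_along(dest)) {
    # mode = "wb" keeps binary files intact on Windows
    utils::download.file(as.character(files$url[i]), destfile = dest[i],
                         mode = "wb", quiet = TRUE)
  }
  # Return the local paths so they can be handed to a parsing function
  dest
}
```

For example, `download_sb_files(ex_id_files)` would save the files for this item to a temporary directory and return their local paths, which could then be passed to `xml2::read_xml`.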
|
||
You can also inspect specific metadata fields of ScienceBase items. To do this, use the `item_get_fields` function. If you wish to see all fields associated with an item, you could use `item_get` (discussed below), which returns the entire item. In addition to the item id, `item_get_fields` requires a second argument, `fields`, which is a character vector of the fields you want to retrieve. See the [developer documentation for a SB item model](https://my.usgs.gov/confluence/display/sciencebase/ScienceBase+Item+Core+Model) for a list of potential fields. When only one field is requested, the argument `drop` controls whether the returned object remains a list (`drop=FALSE`) or becomes a vector (`drop=TRUE`). The default is `drop=TRUE`. | ||
|
||
```{r sbtools-fields} | ||
# request multiple fields | ||
multi_fields <- item_get_fields(ex_id, c("summary", "tags")) | ||
length(multi_fields) | ||
names(multi_fields) | ||
|
||
# single field, drop=TRUE | ||
single_field_drop <- item_get_fields(ex_id, "summary") | ||
names(single_field_drop) | ||
class(single_field_drop) | ||
|
||
# single field, drop=FALSE | ||
single_field <- item_get_fields(ex_id, "summary", drop=FALSE) | ||
single_field | ||
class(single_field) | ||
``` | ||
|
||
If a field is empty, it will return `NULL`. | ||
|
||
```{r sbtools-fields-empty} | ||
# request nonexistent fields | ||
item_get_fields(ex_id, c("dates", "citation")) | ||
``` | ||
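Because an empty field comes back as `NULL`, code that uses the result may want a fallback value. The sketch below shows one way to do that; `field_or_default` is a hypothetical helper, and `example_fields` is a hand-made stand-in that assumes the result is a named list with `NULL` for empty fields, so the snippet runs without a ScienceBase connection.

```r
# Hypothetical helper: look up one field in a named list such as the one
# returned by item_get_fields(), substituting a default when it is NULL.
field_or_default <- function(fields, name, default = NA) {
  value <- fields[[name]]
  if (is.null(value)) default else value
}

# Stand-in for a result such as item_get_fields(ex_id, c("summary", "citation"))
example_fields <- list(summary = "An example summary", citation = NULL)

field_or_default(example_fields, "summary")
field_or_default(example_fields, "citation", default = "no citation recorded")
```

The same helper can be applied to the real output of `item_get_fields` when a list is returned (multiple fields, or a single field with `drop=FALSE`).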
|
||
Now that we've inspected the item, let's actually pull the item down. There are a number of extra fields to inspect now. | ||
|
||
```{r sbtools-get} | ||
ex_id_item <- item_get(ex_id) | ||
i think
hmmm not sure I really follow. So
hmm. well, maybe i was wrong. i just tried
Hmm was just peering through the history for the function and don't see anything too obvious, but I haven't ever touched the code so who knows.
So do you still suggest this getting moved up top and deleting the
Hmm...in light of all this re-education I've just had, what would you think about deleting both
Hmm yeah that's a good point. Although, I could see how using that rather than searching through a list could be more readable. E.g. if you've got a bunch of items and you're doing an lapply:
I suppose so. And you can avoid downloading some additional amount of text using
I'll add a note about item_get_fields giving you a subset of what item_get will, and delete as.sbitem. I struggled with a use-case for a non-power user, so I think it would be best to avoid it altogether.
||
names(ex_id_item) | ||
``` | ||
|
||
## Web feature services to visualize spatial data | ||
|
||
The function `item_get_wfs` allows you to pull down web feature services (WFS) data from ScienceBase. Note that this is not the most robust function. The developers thought this could be a cool feature, but didn't want to invest too much time if there wouldn't be demand. If you'd use it a lot, visit the [`sbtools` GitHub page](https://github.com/USGS-R/sbtools/issues) and let the developers know through a new issue or "thumbs-up" an existing, related issue. | ||
|
||
When this function does work, you can use the results to create a map of the data in R. Here's a simple example using the R package `maps`. The item we will use as an example contains low flow estimations for New Jersey. We can map the sites used in the study. | ||
|
||
```{r sbtools-wfs} | ||
nj_wfs <- item_get_wfs("58cbe556e4b0849ce97dcd31") | ||
names(nj_wfs) | ||
|
||
maps::map("county", "new jersey") | ||
points(nj_wfs$longitude, nj_wfs$latitude, col="red") | ||
``` |
this looks like a good solution already, but is `secrets` the way of the future? you mentioned that you and JW got it working for you...are you ready to switch over here?
I don't think it's ready thanks to the windows complications. I'll make an issue for later
See #277