sbtools get #272
Changes from 6 commits
5ecf0e5
bc12df5
1fccc36
50b1433
c610b1c
0a24ae9
50ee859
6dbc173
2b4aa6e
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,179 @@ | ||
--- | ||
title: "sbtools - Download Data" | ||
date: "9999-07-01" | ||
author: "Lindsay R. Carr" | ||
slug: "sbtools-get" | ||
image: "img/main/intro-icons-300px/r-logo.png" | ||
output: USGSmarkdowntemplates::hugoTraining | ||
parent: Introduction to USGS R Packages | ||
weight: 2 | ||
draft: true | ||
--- | ||
|
||
```{r setup, include=FALSE, warning=FALSE, message=FALSE} | ||
library(knitr) | ||
|
||
knit_hooks$set(plot=function(x, options) { | ||
sprintf("<img src='../%s%s-%d.%s'/ title='%s'/>", | ||
options$fig.path, options$label, options$fig.cur, options$fig.ext, options$fig.cap) | ||
|
||
}) | ||
|
||
opts_chunk$set( | ||
echo=TRUE, | ||
fig.path="static/sbtools-get/", | ||
fig.width = 6, | ||
fig.height = 6, | ||
fig.cap = "TODO" | ||
) | ||
|
||
set.seed(1) | ||
``` | ||
|
||
This lesson will describe the basic functions to manage ScienceBase authenticated sessions and view or download ScienceBase items. If you aren't sure what a ScienceBase item is, head back to the [previous lesson on `sbitems`](/sbtools-sbitem). | ||
|
||
Don't forget to load the library if you're in a new R session! | ||
|
||
```{r sbtools-library, message=FALSE, warning=FALSE} | ||
library(sbtools) | ||
``` | ||
|
||
```{r sbtools-auth, echo=FALSE} | ||
# run vizlab::storeSBcreds() once before this can work | ||
home <- path.expand('~') | ||
sbCreds <- file.path(home, ".vizlab/sbCreds") | ||
credList <- readRDS(sbCreds) | ||
un <- rawToChar(credList$username) | ||
pw <- rawToChar(credList$password) | ||
sbtools::authenticate_sb(un, pw) | ||
``` | ||
|
||
## Authentication | ||
|
||
This section is specific to authentication with ScienceBase. If you don't have a ScienceBase account, skip to the next section. Just know that you will only be able to download public data. | ||
|
||
The first step to authenticating (or logging in) to ScienceBase is to use the function `authenticate_sb`. The arguments are your username and password. Alternatively, you can use the function interactively by not supplying any arguments. It will prompt you for your username in the R console and then your password in a pop-up window. Be very cautious when using the username and password arguments - don't include these in any scripts! To be safe, you can leave out the arguments and use the interactive login. Try interactively logging in: | ||
|
||
```{r sbtools-login, eval=FALSE} | ||
authenticate_sb() | ||
``` | ||
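For non-interactive scripts, one way to follow the advice above about keeping credentials out of code is to read them from environment variables. This is a minimal sketch, not an sbtools feature: the variable names `SB_USER` and `SB_PASS` and both helper functions are hypothetical, and it assumes `authenticate_sb` accepts the username and password as its first two arguments, as described above.

```r
# Hypothetical helper: SB_USER and SB_PASS are illustrative variable names,
# not something sbtools defines or looks for.
read_sb_credentials <- function(user_var = "SB_USER", pass_var = "SB_PASS") {
  username <- Sys.getenv(user_var, unset = "")
  password <- Sys.getenv(pass_var, unset = "")
  # Fail early with a clear message rather than sending empty credentials.
  if (!nzchar(username) || !nzchar(password)) {
    stop("Set ", user_var, " and ", pass_var, " before calling this function.")
  }
  list(username = username, password = password)
}

# Hypothetical wrapper: logs in with whatever the environment provides.
login_from_env <- function() {
  creds <- read_sb_credentials()
  sbtools::authenticate_sb(creds$username, creds$password)
}

# In a session where both variables are set (for example in ~/.Renviron):
# login_from_env()
```

The script itself then contains no secrets, so it can be shared or committed safely.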
|
||
To double check that your authentication was successful, use the function `is_logged_in`. It will return a logical to let you know if you are logged in or not. No arguments are needed. | ||
|
||
```{r sbtools-verifylogin} | ||
is_logged_in() | ||
``` | ||
|
||
Each user has a specific ScienceBase id associated with their account. The user id can be used to inspect which top-level items are saved under your account (discussed in the next section). To determine your user id, use the function `user_id` in an authenticated session. No arguments are necessary. | |
i know the function is
or maybe
I think
||
|
||
```{r sbtools-userid} | ||
user_id() | ||
``` | ||
|
||
When you're done with your session, you can actively log out using the `session_logout` function. No arguments are required. If you do not do this, you will be automatically logged out after a certain amount of time or when you close R. | |
"the
||
|
||
## Inspect and download items | ||
|
||
The first inspection step for ScienceBase items is to determine if the item even exists. To do this, use the function `identifier_exists`. The only required argument is `sb_id`, which can be either a character string of the item id or an `sbitem`. It will return a logical to indicate whether the item exists or not. | |
based on http://www.quickanddirtytips.com/education/grammar/if-versus-whether, i'd recommend "It will return a logical to indicate whether the item exists or not."
📖 learning so much!
||
|
||
```{r sbtools-identifierexists} | ||
identifier_exists("4f4e4acae4b07f02db67d22b") | ||
identifier_exists("thisisnotagoodid") | ||
``` | ||
|
||
ScienceBase items can be described by alternative identifiers, e.g. digital object identifiers, IPDS codes, etc. They are defined on ScienceBase with a scheme, type, and key. For examples of identifiers, see the "Additional Information | Identifiers" section of [Differential Heating](https://www.sciencebase.gov/catalog/item/580587a2e4b0824b2d1c1f23). | ||
noticing that this section is partially redundant with line 203 of PR #271 (https://github.com/USGS-R/training-curriculum/pull/271/files#diff-41534c65bd2e578b204b437073f26e2dR203). could possibly move the formal, full definition of identifiers (with examples) into https://github.com/USGS-R/training-curriculum/blob/master/content/usgs-packages/sbtools_sbitem.Rmd. it'll probably still be good to have a reminder here and in sbtools_Modify.Rmd, but these could become shorter definitions that refer back to sbtools_sbitem.Rmd
||
|
||
You can use the function `item_exists` to check whether or not a scheme-type-key tuple already exists. The function has three required arguments - `scheme`, `type`, and `key`. Note that the table of alternative identifiers on ScienceBase is in a different order than this function accepts. On ScienceBase: type, scheme, key. For `item_exists`: scheme, type, key. | ||
good tip!
||
|
||
```{r sbtools-itemexists} | ||
# test a made up tuple | ||
item_exists(scheme = "made", type = "this", key = "up") | ||
|
||
# test a tuple from the SB item "4f4e4acae4b07f02db67d22b" | ||
item_exists(scheme = "State Inventory", type = "UniqueKey", key = "P1281") | ||
|
||
# test the same scheme & type with a made up key | ||
item_exists(scheme = "State Inventory", type = "UniqueKey", key = "1234") | ||
``` | ||
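Because the two orderings differ, passing the arguments by name avoids mix-ups. The sketch below copies one identifier row in the ScienceBase table order and reorders it by name into the order `item_exists` expects. The `sb_row` and `args_for_item_exists` objects are illustrative names only; the values come from the example above.

```r
# One row copied from the ScienceBase identifiers table, in the table's
# column order (type, scheme, key).
sb_row <- c(type = "UniqueKey", scheme = "State Inventory", key = "P1281")

# Reorder by name into the order item_exists() expects: scheme, type, key.
args_for_item_exists <- as.list(sb_row[c("scheme", "type", "key")])
names(args_for_item_exists)

# With sbtools loaded, the named list can be passed straight to the function:
# do.call(sbtools::item_exists, args_for_item_exists)
```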
|
||
You can create sbitems from just the ScienceBase id. To do this use `as.sbitem`. *why you would use it* | ||
i think you could safely leave out
||
|
||
```{r sbtools-as-sbitem} | ||
antarctica_sbitem <- as.sbitem("4f4e4b24e4b07f02db6aea14") | ||
class(antarctica_sbitem) | ||
antarctica_sbitem | ||
``` | ||
|
||
Let's inspect various ScienceBase items. There are functions to look at the parent item, metadata fields, sub-items, and associated files. Each of these functions requires the id of the sbitem as the first argument. For all of these examples, we are going to use the same sbitem id, "4f4e4b24e4b07f02db6aea14". | |
|
||
First, let's inspect the parent item. The function to use is `item_get_parent`, and the item id is the only necessary argument. | ||
|
||
```{r sbtools-parent} | ||
ex_id <- "4f4e479de4b07f02db491e34" | ||
ex_id_parent <- item_get_parent(ex_id) | ||
ex_id_parent$title | ||
``` | ||
|
||
Now, let's see if this item has any children by using the `item_list_children` function. Notice that this function says "list" and not "get" as the previous one did. Functions with "list" only return a few fields associated with each item. Functions with "get" are pulling down all available information, including files, associated with an item. | ||
i really like that you've attempted a high-level explanation here. that said, i don't think
Oh hmmm good point. I think I was going for the fact that there is more info. What's your take? Any higher-level difference here worth noting?
well...hmm...no, i'm not seeing any consistent patterns here...
you could almost claim that
Ok, I'll just leave this out completely.
||
|
||
```{r sbtools-children} | ||
ex_id_children <- item_list_children(ex_id) | ||
length(ex_id_children) | ||
sapply(ex_id_children, function(item) item$title) | ||
``` | ||
|
||
Let's check to see if this item has any files attached to it using `item_list_files`. This will return a data frame with three columns: `fname` (filename), `size` (file size in bytes), and `url` (the URL to the file on ScienceBase). | |
|
||
```{r sbtools-files} | ||
ex_id_files <- item_list_files(ex_id) | ||
nrow(ex_id_files) | ||
ex_id_files$fname | ||
``` | ||
|
||
To actually get the files into R as data, you need to use their URLs and the appropriate parsing function. Both of the files returned for this item are XML, so you can use the `xml2` function, `read_xml`. As practice, we will download the first XML file. | ||
|
||
```{r sbtools-filedownload} | ||
xml2::read_xml(ex_id_files$url[1]) | ||
``` | ||
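Choosing "the appropriate parsing function" usually comes down to the file extension. The sketch below shows one way to look up a reader by extension. The `reader_for` helper and its lookup table are illustrative and not part of sbtools, and the filenames are made up. Only `xml2::read_xml` appears in this lesson; the CSV and JSON readers are common choices added as assumptions.

```r
# Hypothetical helper: map a filename to the name of a function that can
# parse it. The lookup table is illustrative, not part of sbtools.
reader_for <- function(fname) {
  ext <- tolower(tools::file_ext(fname))
  readers <- c(
    xml  = "xml2::read_xml",
    csv  = "utils::read.csv",
    json = "jsonlite::fromJSON"
  )
  # Unknown or missing extensions come back as NA.
  unname(readers[ext])
}

# Made-up filenames standing in for the fname column of item_list_files().
reader_for(c("metadata.XML", "sites.csv", "notes.docx"))
```

An `NA` result signals that the file needs a reader chosen by hand.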
|
||
You can also inspect specific metadata fields of ScienceBase items. To do this, use the `item_get_fields` function. In addition to the item id, this function requires a second argument, `fields`, which is a character vector of the fields you want to retrieve. See the [developer documentation for a SB item model](https://my.usgs.gov/confluence/display/sciencebase/ScienceBase+Item+Core+Model) for a list of potential fields. You can also use the argument `drop` to control what is returned when only one field is requested: the object either remains a list (`drop=FALSE`) or becomes a vector (`drop=TRUE`). The default is `drop=TRUE`. | |
💯 for the link
||
|
||
```{r sbtools-fields} | ||
# request multiple fields | ||
multi_fields <- item_get_fields(ex_id, c("summary", "tags")) | ||
length(multi_fields) | ||
names(multi_fields) | ||
|
||
# single field, drop=TRUE | ||
single_field_drop <- item_get_fields(ex_id, "summary") | ||
names(single_field_drop) | ||
class(single_field_drop) | ||
|
||
# single field, drop=FALSE | ||
single_field <- item_get_fields(ex_id, "summary", drop=FALSE) | ||
single_field | ||
class(single_field) | ||
``` | ||
|
||
If a field is empty, it will return `NULL`. | ||
|
||
```{r sbtools-fields-empty} | ||
# request nonexistent fields | |
"request a nonexistent fields" -> "request nonexistent fields"
||
item_get_fields(ex_id, c("dates", "citation")) | ||
``` | ||
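When several fields are requested at once, it can help to list which ones came back empty. The sketch below uses a stand-in list with made-up values, not a live call. It assumes the result is a named list in which empty fields are `NULL`, as described above.

```r
# Stand-in for the list returned by item_get_fields(); the values are made up.
fields <- list(
  summary  = "Example summary text",
  dates    = NULL,
  citation = NULL
)

# Flag each field that is NULL, then keep the names of the empty ones.
is_empty <- vapply(fields, is.null, logical(1))
empty_fields <- names(fields)[is_empty]
empty_fields
```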
|
||
Now that we've inspected the item, let's actually pull the item down. There are a number of extra fields to inspect now. | ||
|
||
```{r sbtools-get} | ||
ex_id_item <- item_get(ex_id) | ||
i think
hmmm not sure I really follow. So
hmm. well, maybe i was wrong. i just tried
Hmm was just peering through the history for the function and don't see anything too obvious, but I haven't ever touched the code so who knows.
So do you still suggest this getting moved up top and deleting the
Hmm...in light of all this re-education I've just had, what would you think about deleting both
Hmm yeah that's a good point. Although, I could see how using that rather than searching through a list could be more readable. E.g. if you've got a bunch of items and you're doing an lapply:
I suppose so. And you can avoid downloading some additional amount of text using
I'll add a note about item_get_fields giving you a subset of what item_get will, and delete as.sbitem. I struggled with a use-case for a non-power user, so I think it would be best to avoid it altogether.
||
names(ex_id_item) | ||
``` | ||
|
||
## Web feature services to visualize spatial data??? | ||
yeah...i'll work on this next
ok, @lindsaycarr, my training-oriented notes are at #276 and i made a new sbtools issue at DOI-USGS/sbtools#244. it won't be super impressive, but I did identify three sb_ids whose WFSes can be retrieved.
per luke's comments on DOI-USGS/sbtools#244, you could introduce
||
|
||
*Need to pick a different item. This one errs since there is "no ScienceBase WFS Service available".* | ||
|
||
```{r sbtools-wfs} | ||
# ex_id_wfs <- item_get_wfs(ex_id) | ||
# names(ex_id_item) | ||
``` |
this looks like a good solution already, but is `secrets` the way of the future? you mentioned that you and JW got it working for you...are you ready to switch over here?
I don't think it's ready thanks to the windows complications. I'll make an issue for later
See #277