Commit

🛰️
cboettig committed Feb 10, 2024
1 parent 4a8d80a commit a7ff426
Showing 16 changed files with 199 additions and 82 deletions.
4 changes: 2 additions & 2 deletions _freeze/tutorials/R/2-earthdata/execute-results/html.json
@@ -1,7 +1,7 @@
{
"hash": "936e34da6da3eac5bbee72becc66c013",
"hash": "4819be6ba03f2a0e18b3442281c34a39",
"result": {
"markdown": "---\ntitle: \"NASA EarthData\"\nformat: html\n---\n\n\nThe NASA EarthData program provides access to an extensive collection of spatial data products from each of its 12 Distributed Active Archive Centers ('DAACs') on the high-performance S3 storage system of Amazon Web Services (AWS). We can take advantage of range requests with NASA EarthData URLs, but unlike the previous examples,\nNASA requires an authentication step. NASA offers several different mechanisms, including `netrc` authentication, token-based authentication, and S3 credentials, but only the first of these works equally well from locations both inside and outside of AWS-based compute, so there really is very little reason to learn the other two.\n\nThe [`earthdatalogin` package in R](https://boettiger-lab.github.io/earthdatalogin/) or the `earthaccess` package in Python handle the authentication. The R package sets up authentication behind the scenes using environmental variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nearthdatalogin::edl_netrc()\n```\n:::\n\n\n(A default login is supplied though users are encouraged to [register](https://urs.earthdata.nasa.gov/home) for their own individual accounts.) Once this is in place, EarthData's protected URLs can be used like any other: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nterra::rast(\"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T56JKT.2023246T235950.v2.0/HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif\",\n vsi=TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nclass : SpatRaster \ndimensions : 3660, 3660, 1 (nrow, ncol, nlyr)\nresolution : 30, 30 (x, y)\nextent : 199980, 309780, 7190200, 7300000 (xmin, xmax, ymin, ymax)\ncoord. ref. : WGS 84 / UTM zone 56N (EPSG:32656) \nsource : HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif \nname : HLS.L30.T56JKT.2023246T235950.v2.0.SAA \n```\n:::\n:::\n",
"markdown": "---\ntitle: \"NASA EarthData\"\n---\n\n\nThis tutorial demonstrates how to begin generalizing the pattern introduced in the introductory example to other data sources, in particular NASA EarthData. \n\nNASA recently announced completion of the transfer of some [59 petabytes of data](https://www.earthdata.nasa.gov/eosdis/cloud-evolution) to the Amazon cloud, a core component of NASA's Transform to Open Science (TOPS) mission. Researchers are frequently told that to take advantage of such \"cloud data\" they will need to pay (or find a grant or other program to pay) for cloud computing resources. This approach is sometimes described as \"send the compute to the data\". While this narrative no doubt benefits Amazon, it exacerbates inequity and is often misleading. The purpose of putting data on a cloud storage platform is not merely to make that data faster to access from rented cloud computing platforms. The high bandwidth and fast disks of these systems can be just as powerful when you provide your own compute. Consistent with NASA's vision, this means that high-performance access is free:\n\n> NASA Earth science data have been freely openly and available to all users since EOSDIS became operational in 1994. Under NASA's full and open data policy, all NASA mission data (along with the algorithms, metadata, and documentation associated with these data) must be freely available to the public. This means that anyone, anywhere in the world, can access the more than 59 PB of NASA Earth science data without restriction\n\nAll we need is software that can treat cloud storage as if it were local storage: _a virtual filesystem_. The standard that makes this possible, the HTTP range request, has been around for over two decades and is widely implemented in open source software. Unfortunately, many users and workflows are stuck in an old model that assumes individual files must always be downloaded first. 
\n\n\n\n\n\n\nTo make this work with NASA EarthData, however, we face one additional challenge: _authentication_. NASA offers several different mechanisms, including (1) `netrc` authentication, (2) token-based authentication, and (3) S3 credentials, but only the first of these works equally well from locations both inside and outside of AWS-based compute, so there is very little reason to learn the other two. \n\nThe [`earthdatalogin` package in R](https://boettiger-lab.github.io/earthdatalogin/) or the `earthaccess` package in Python handles the authentication. The R package sets up authentication behind the scenes using environment variables.\n\n\n::: {.cell hash='2-earthdata_cache/html/unnamed-chunk-2_44795769054d73bef405fd262dac49bd'}\n\n```{.r .cell-code}\nearthdatalogin::edl_netrc()\n```\n:::\n\n\n(A default login is supplied, though users are encouraged to [register](https://urs.earthdata.nasa.gov/home) for their own individual accounts.) Once this is in place, EarthData's protected URLs can be used like any other. For instance, after authenticating, we can read this NASA Harmonized Landsat Sentinel-2 (HLS) `tif` whether we are running locally or in the cloud:\n\n\n::: {.cell hash='2-earthdata_cache/html/unnamed-chunk-3_fd847ddb31b3bf7630ca7a7821656a37'}\n\n```{.r .cell-code}\nterra::rast(\"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T56JKT.2023246T235950.v2.0/HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif\",\n vsi=TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nclass : SpatRaster \ndimensions : 3660, 3660, 1 (nrow, ncol, nlyr)\nresolution : 30, 30 (x, y)\nextent : 199980, 309780, 7190200, 7300000 (xmin, xmax, ymin, ymax)\ncoord. ref. : WGS 84 / UTM zone 56N (EPSG:32656) \nsource : HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif \nname : HLS.L30.T56JKT.2023246T235950.v2.0.SAA \n```\n:::\n:::\n\n\nAn important aspect of this approach is that it does not require any custom wrappers or specialized functions to access data. Most R packages that work with spatial data, including `terra`, `sf`, `stars` and others, rely on GDAL to parse spatial data formats. This means that they all support the [GDAL Virtual Filesystem](https://gdal.org/user/virtual_file_systems.html) for cloud-native reads out of the box. `earthdatalogin` takes advantage of this by setting authentication credentials as GDAL configuration environment variables, allowing these existing packages to read NASA data seamlessly. There is no need to learn any additional access functions.\n\n\n## Working with STAC & gdalcubes\n\n\n\n::: {.cell hash='2-earthdata_cache/html/setup_3baf4c1b5077bcfc0fab4dd82164e88f'}\n\n```{.r .cell-code}\nlibrary(earthdatalogin)\nlibrary(rstac)\nlibrary(gdalcubes)\n```\n:::\n\n\n`earthdatalogin` also includes optional GDAL configuration settings that can improve the performance of cloud-based data access. Set these GDAL environment variables using `gdal_cloud_config()`. An additional helper, `with_gdalcubes()`, passes the same GDAL settings on to the `gdalcubes` R package. \n\n\n::: {.cell hash='2-earthdata_cache/html/unnamed-chunk-4_17b81e9d4ad30b73c77d4ed861c8ed0f'}\n\n```{.r .cell-code}\nedl_netrc() # Authenticate\ngdal_cloud_config() # Optimize GDAL for cloud\nwith_gdalcubes() # Export settings to gdalcubes package\n\ngdalcubes_options(parallel = TRUE)\n```\n:::\n\n\n\nNASA provides its own search system. It also provides a STAC-based search (see below), which allows us to use the standard syntax we have already seen. However, NASA's STAC API is significantly slower and more prone to server errors than the STAC APIs provided by Element 84, Microsoft Planetary Computer, and others. 
\n\nHere, we use NASA's own search protocol, which is less general but gives us a chance to illustrate how `gdalcubes` can work from an arbitrary list of URLs when no STAC catalog is available. We search a handful of dates for illustrative purposes, but this approach can easily scale to larger lists without needing additional RAM or disk space. \n\n\n::: {.cell hash='2-earthdata_cache/html/unnamed-chunk-5_e81ae49aa355f949da68dc25eb5b178f'}\n\n```{.r .cell-code}\nstart <- \"2020-01-01\"\nend <- \"2020-01-03\" \nurls <- edl_search(short_name = \"MUR-JPL-L4-GLOB-v4.1\",\n temporal = c(start, end))\n```\n:::\n\n\nThese NetCDF files lack metadata that GDAL expects (projection and extent). We can supply it manually using GDAL's `vrt://` connection syntax:\n\n\n::: {.cell hash='2-earthdata_cache/html/unnamed-chunk-6_eccb2df8eb45a7dfde434c506295b91e'}\n\n```{.r .cell-code}\nvrt <- function(url) {\n prefix <- \"vrt://NETCDF:/vsicurl/\"\n suffix <- \":analysed_sst?a_srs=OGC:CRS84&a_ullr=-180,90,180,-90\"\n paste0(prefix, url, suffix)\n}\n\n# date associated with each file\nurl_dates <- as.Date(gsub(\".*(\\\\d{8})\\\\d{6}.*\", \"\\\\1\", urls), format=\"%Y%m%d\")\n```\n:::\n\n\nBecause every file in this list has the same spatial extent, resolution, and projection, we can now manually construct our space-time data cube from these NetCDF slices:\n\n\n::: {.cell hash='2-earthdata_cache/html/unnamed-chunk-7_bd828ad3710695c92bb3dd2d2cb09e61'}\n\n```{.r .cell-code}\ndata_gd <- gdalcubes::stack_cube(vrt(urls), datetime_values = url_dates)\n```\n:::\n\n\nWe use `gdalcubes` to crop each slice to our region of interest, average over the three-day window, and plot the result, timing the whole pipeline:\n\n\n::: {.cell hash='2-earthdata_cache/html/unnamed-chunk-8_dd8e17f912864a356a164b576e4a0513'}\n\n```{.r .cell-code}\nextent <- list(left=-93, right=-76, bottom=41, top=49,\n t0=start, t1=end)\n\nbench::bench_time({\n data_gd |> \n gdalcubes::crop(extent) |> \n aggregate_time(dt=\"P3D\", method=\"mean\") |> \n plot(col = viridisLite::viridis(10))\n})\n```\n\n::: {.cell-output-display}\n![](2-earthdata_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nprocess real \n 2.29s 27.34s \n```\n:::\n:::\n\n\n\n\n## Search via STAC\n\n\nThe STAC catalog search system we illustrated in the introductory example is a widely used standard. This means that we can use the same packages and code we have already learned to access entirely different geospatial data products prepared by an entirely different provider. Here we search NASA's EarthData catalog through its STAC interface. We can also browse the [NASA Earthdata STAC Catalogs](https://radiantearth.github.io/stac-browser/#/external/cmr.earthdata.nasa.gov/stac/) in a web browser. (***NOTE***: Unfortunately, at this time, NASA's implementation of the STAC standard is incomplete. The web browser only shows the first 10 entries under any heading, even though the NASA STAC API has the complete records. NASA's STAC service is also considerably slower than those provided by Element 84 or Microsoft Planetary Computer.)\n\n\n::: {.cell hash='2-earthdata_cache/html/unnamed-chunk-9_c10ec0cfdcda2a6e6a18ad1f8a87ade7'}\n\n```{.r .cell-code}\nitems <- stac(\"https://cmr.earthdata.nasa.gov/stac/POCLOUD\") |> \n stac_search(collections = \"MUR-JPL-L4-GLOB-v4.1\",\n datetime = paste(start, end, sep = \"/\")) |>\n post_request() |>\n items_fetch()\n\nitems\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n###STACItemCollection\n- matched feature(s): 3\n- features (3 item(s) / 0 not fetched):\n - 20200101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1\n - 20200102090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1\n - 20200103090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1\n- assets: data, metadata, opendap\n- item's fields: \nassets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type\n```\n:::\n:::\n\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
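In the tutorial above, the "virtual filesystem" idea rests on HTTP range requests. The client sends a `Range` header naming the bytes it wants, and a server that supports ranges replies `206 Partial Content` with only those bytes. GDAL's `/vsicurl/` handler, which `vsi=TRUE` switches on in `terra::rast()`, issues these requests automatically. The sketch below makes one by hand with the `curl` R package, which it assumes is installed. `range_header()` and `read_bytes()` are hypothetical helpers written for illustration. They are not part of `terra`, GDAL, or `earthdatalogin`, and the URL in the usage comment is a placeholder.

```r
# Hypothetical helpers illustrating an HTTP range request (not part of
# earthdatalogin, terra, or GDAL). read_bytes() needs the 'curl' package.

# Build the value of a Range header for `length` bytes starting at `offset`.
# HTTP byte ranges are zero-based and inclusive at both ends.
range_header <- function(offset, length) {
  sprintf("bytes=%.0f-%.0f", offset, offset + length - 1)
}

# Fetch only the requested bytes of a remote file.
read_bytes <- function(url, offset = 0, length = 1024) {
  h <- curl::new_handle()
  curl::handle_setheaders(h, Range = range_header(offset, length))
  res <- curl::curl_fetch_memory(url, handle = h)
  if (res$status_code != 206) {
    warning("server did not return 206 Partial Content (status ",
            res$status_code, ")")
  }
  res$content  # raw vector; at most `length` bytes if the range was honoured
}

# Usage (placeholder URL, requires network access):
# first_kb <- read_bytes("https://example.com/some-file.tif", 0, 1024)
```

Fetching a small header range first, then only the blocks needed, is roughly how GDAL opens a cloud-optimized GeoTIFF without downloading the whole file.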
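The tutorial's `edl_netrc()` sets up netrc authentication behind the scenes. The following is a minimal base-R sketch that illustrates what such a setup involves. It rests on several assumptions:

- GDAL 3.7 or later, which reads the `GDAL_HTTP_NETRC_FILE` option.
- `write_edl_netrc()` is a hypothetical helper, not the `earthdatalogin` implementation.
- The credentials in the check are placeholders.
- The cookie-file settings are included on the assumption that Earthdata Login's redirect-based login works best when session cookies are kept, as NASA's `curl`/`wget` instructions suggest.

```r
# Hypothetical helper, NOT the earthdatalogin implementation: a sketch of
# netrc-style authentication for GDAL, assuming GDAL >= 3.7
# (which understands GDAL_HTTP_NETRC_FILE).
write_edl_netrc <- function(user, password,
                            path = file.path(tempdir(), "netrc")) {
  # One netrc entry for the NASA Earthdata Login host
  entry <- sprintf("machine urs.earthdata.nasa.gov login %s password %s",
                   user, password)
  writeLines(entry, path)
  Sys.chmod(path, mode = "0600", use_umask = FALSE)  # owner-only access
  # Point GDAL at the netrc file, plus a cookie file/jar so the
  # Earthdata Login redirect does not have to be repeated on every request
  cookies <- file.path(tempdir(), "edl_cookies")
  Sys.setenv(GDAL_HTTP_NETRC = "YES",
             GDAL_HTTP_NETRC_FILE = path,
             GDAL_HTTP_COOKIEFILE = cookies,
             GDAL_HTTP_COOKIEJAR = cookies)
  invisible(path)
}
```

Because GDAL reads its configuration from ordinary environment variables, GDAL-based packages reading in the same R process (`terra`, `sf`, `stars`) would pick these settings up. The tutorial relies on this property.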
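The tutorial's `vrt()` helper hard-codes the `analysed_sst` variable and a global bounding box. A slightly more general version might expose those as arguments, as in the sketch below. `vrt_url()` and `url_date()` are hypothetical names. With default arguments, `vrt_url()` returns the same string as the tutorial's `vrt()`.

The connection string stacks three GDAL conventions:

- `/vsicurl/` for range-request reads.
- The `NETCDF:<file>:<variable>` subdataset syntax.
- The `vrt://` options `a_srs`, which assigns a CRS, and `a_ullr`, which assigns the upper-left and lower-right corner coordinates.

The URL in the check is made up. Only its file name follows the MUR granule IDs listed in the STAC output.

```r
# Hypothetical generalization of the tutorial's vrt() helper: build a GDAL
# "vrt://" connection string for one variable of a remote NetCDF file.
vrt_url <- function(url, variable = "analysed_sst", srs = "OGC:CRS84",
                    ullr = c(-180, 90, 180, -90)) {
  paste0("vrt://NETCDF:/vsicurl/", url,  # /vsicurl/: HTTP range-request reads
         ":", variable,                  # NetCDF subdataset (variable) to open
         "?a_srs=", srs,                 # assign the missing CRS
         "&a_ullr=", paste(ullr, collapse = ","))  # ulx,uly,lrx,lry
}

# Date of each file, parsed from its YYYYMMDDhhmmss timestamp.
url_date <- function(url) {
  # The greedy ".*" makes the pattern match the last 14 consecutive digits;
  # only the leading YYYYMMDD part is captured and kept.
  as.Date(gsub(".*(\\d{8})\\d{6}.*", "\\1", url), format = "%Y%m%d")
}
```

Both functions are vectorized over `url`, so `stack_cube(vrt_url(urls), datetime_values = url_date(urls))` would mirror the tutorial's call.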
14 changes: 0 additions & 14 deletions _freeze/tutorials/R/earthdata/execute-results/html.json

This file was deleted.
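In the tutorial's STAC search, `post_request()` sends the query as a JSON body to the catalog's `/search` endpoint. The sketch below builds a body of that general shape for the collection and date range used in the tutorial.

- `stac_search_body()` is a hypothetical helper.
- It assumes collection IDs contain no characters that need JSON escaping. Real code would use a JSON library such as `jsonlite` rather than `sprintf()`.
- The `datetime` field uses the STAC `start/end` interval syntax, matching the tutorial's `paste(start, end, sep = "/")`.
- The exact body `rstac` sends may include additional fields.

```r
# Hypothetical helper: build a minimal STAC API /search request body.
# Assumes collection IDs need no JSON escaping (no quotes or backslashes).
stac_search_body <- function(collections, start, end) {
  ids <- paste0('"', collections, '"', collapse = ",")
  sprintf('{"collections":[%s],"datetime":"%s/%s"}', ids, start, end)
}

body <- stac_search_body("MUR-JPL-L4-GLOB-v4.1", "2020-01-01", "2020-01-03")

# To send it by hand (requires network access and the 'curl' package):
# h <- curl::new_handle(postfields = body)
# curl::handle_setheaders(h, "Content-Type" = "application/json")
# res <- curl::curl_fetch_memory(
#   "https://cmr.earthdata.nasa.gov/stac/POCLOUD/search", handle = h)
```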
