This repository has been archived by the owner on Jun 2, 2023. It is now read-only.

Metadata cleanup #12

Merged
merged 9 commits on Nov 3, 2020
2_observations.yml (6 changes: 0 additions & 6 deletions)
@@ -12,7 +12,6 @@ targets:
2_observations:
depends:
- out_data/temperature_observations.csv
- out_data/flow_observations.csv

# daily flow and temperature data
out_data/temperature_observations.csv:
@@ -21,8 +20,3 @@ targets:
var = I('wtemp(C)'),
in_file = 'in_data/Data - ERL paper/Forcing_attrFiles/no_dam_forcing_60__days118sites.csv')

out_data/flow_observations.csv:
command: extract_obs(
out_file = target_name,
var = I('discharge(cfs)'),
in_file = 'in_data/Data - ERL paper/Forcing_attrFiles/no_dam_forcing_60__days118sites.csv')
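
The flow target deleted above used the same extract_obs() helper as the temperature target; the helper itself is defined elsewhere in this repo and is not shown in the diff. A minimal R sketch of what such a function could look like, assuming the forcing file carries site_no and datetime columns alongside the named variable column (these column names are taken from the attribute lists later in this PR, not from the forcing file itself):

    # Hypothetical sketch only; the real extract_obs() lives in this repo's R scripts.
    extract_obs <- function(out_file, var, in_file) {
      dat <- read.csv(in_file, check.names = FALSE)   # keep names like 'discharge(cfs)' intact
      out <- dat[, c("site_no", "datetime", var)]     # assumed columns: site_no, datetime, requested variable
      write.csv(out, out_file, row.names = FALSE)
    }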
3_inputs.yml (9 changes: 8 additions & 1 deletion)
@@ -17,6 +17,7 @@ targets:
- out_data/AT_basin_attributes.csv
- out_data/weather_drivers.zip
- out_data/pred_discharge.csv
- out_data/obs_discharge.csv

out_data/AT_basin_attributes.csv:
command: extract_AT_attributes(
@@ -29,8 +30,14 @@
out_data/weather_drivers.zip:
command: zip_this(out_file = target_name, weather_drivers)


out_data/pred_discharge.csv:
command: subset_pred_discharge(
out_file = target_name,
in_file = 'in_data/Data - ERL paper/Forcing_attrFiles/no_dam_forcing_60__days118sites.csv')

out_data/obs_discharge.csv:
command: extract_obs(
out_file = target_name,
var = I('discharge(cfs)'),
in_file = 'in_data/Data - ERL paper/Forcing_attrFiles/no_dam_forcing_60__days118sites.csv')
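
These yml files follow the remake/scipiper target convention (depends plus command), so the new target added here can be rebuilt on its own. A hedged example, assuming the target is built through remake and that 3_inputs.yml is usable directly as a remake file (in many pipelines these numbered files are instead included from a top-level remake.yml):

    # Rebuild only the new observed-discharge target; remake resolves extract_obs() and its inputs.
    remake::make("out_data/obs_discharge.csv", remake_file = "3_inputs.yml")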

in_text/text_01_spatial.yml (16 changes: 10 additions & 6 deletions)
@@ -38,23 +38,27 @@ entities:
attr-def: >-
Latitude of the site location.
attr-defs: NA
data-min: NA
data-max: NA
data-units: NA
data-min: 30.14549
data-max: 48.90596
data-units: decimal degrees
-
attr-label: long
attr-def: >-
Longitude of the site location.
attr-defs: NA
data-min: NA
data-max: NA
data-units: NA
data-min: -123.3299
data-max: -70.97964
data-units: decimal degrees


data-name: GIS points of sites used in this study.
data-description: Location of USGS river gages used in this study.

file-format: Shapefile Data Set
process-date: 20201028
indirect-spatial: U.S.A.
latitude-res: 0.00001
longitude-res: 0.00001

build-environment: >-
This dataset was generated using XX.
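
The latitude and longitude ranges filled in above could be reproduced from the sites shapefile that this metadata file describes. A short sketch with the sf package, using a hypothetical path for the shapefile (the actual file name is not part of this diff):

    library(sf)
    sites <- st_read("out_data/study_sites.shp")   # hypothetical path; substitute the released shapefile
    xy <- st_coordinates(sites)                    # matrix with X (longitude) and Y (latitude) columns
    range(xy[, "Y"])                               # data-min / data-max for lat
    range(xy[, "X"])                               # data-min / data-max for long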
in_text/text_02_observations.yml (39 changes: 10 additions & 29 deletions)
@@ -3,7 +3,7 @@ title: >-


abstract: >-
Mean daily temperature and discharge observations retrieved from NWIS. The temperature observations were used to train and validate all temperature models, while flow observations were used as a model input during training for a subset of the temperature models. The model training period was from 2010-10-01 to 2014-09-30, and the test period was from 2014-10-01 to 2016-09-30.
Mean daily temperature and discharge observations retrieved from NWIS. The temperature observations were used to train and validate all temperature models. The model training period was from 2010-10-01 to 2014-09-30, and the test period was from 2014-10-01 to 2016-09-30.

cross-cites:
-
@@ -41,32 +41,13 @@ entities:
data-min: NA
data-max: NA
data-units: degrees Celsius
-
data-name: flow_observations.csv
data-description: Observed mean daily discharge observation retrieved from NWIS for the 118 gages used in this study. Flow observations were used as a driver in the water temperature model. The data were retrieved from NWIS and are limited to the test and training period, from 2010-10-01 through 2016-09-30.
attributes:
-
attr-label: site_no
attr-def: >-
USGS unique site identifier.
attr-defs: NA
data-min: NA
data-max: NA
data-units: NA
-
attr-label: datetime
attr-def: >-
Date of temperature observation.
attr-defs: NA
data-min: NA
data-max: NA
data-units: NA
-
attr-label: discharge(cfs)
attr-def: Observed mean daily discharge
attr-defs: NA
data-min: NA
data-max: NA
data-units: degrees Celsius

file-format: comma-delimited files

process-date: XX
indirect-spatial: U.S.A.
latitude-res: 0.00001
longitude-res: 0.00001
data-name: Water temperature observations
data-description: >-
Water temperature observations used to train and validate models described in Rahmani et al. 2020.
file-format: comma-separated file format (csv)
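
The wtemp(C) entity above still carries NA for data-min and data-max; if those get filled in a follow-up, they can be computed from the pipeline output. A small sketch, assuming out_data/temperature_observations.csv is the file built by 2_observations.yml and keeps the wtemp(C) column name:

    obs <- read.csv("out_data/temperature_observations.csv", check.names = FALSE)
    range(obs[["wtemp(C)"]], na.rm = TRUE)   # candidate data-min and data-max for wtemp(C)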
in_text/text_03_inputs.yml (37 changes: 32 additions & 5 deletions)
@@ -2,7 +2,7 @@ title: >-
Exploring the exceptional performance of a deep learning stream temperature model and the value of streamflow data: 3 model inputs

abstract: >-
Inputs to the deep learning models included daily weather forcing data, as well as river catchment attributes.
Inputs to the deep learning models included daily weather forcing data, river catchment attributes, and simulated or observed flow.

cross-cites:
-
@@ -305,14 +305,41 @@ entities:
data-min: NA
data-max: NA
data-units: cubic feet per second


-
data-name: flow_observations.csv
data-description: Observed mean daily discharge observation retrieved from NWIS for the 118 gages used in this study. Flow observations were used as a driver in the water temperature model. The data were retrieved from NWIS and are limited to the test and training period, from 2010-10-01 through 2016-09-30.
attributes:
-
attr-label: site_no
attr-def: >-
USGS unique site identifier.
attr-defs: NA
data-min: NA
data-max: NA
data-units: NA
-
attr-label: datetime
attr-def: >-
Date of discharge observation.
attr-defs: NA
data-min: NA
data-max: NA
data-units: NA
-
attr-label: discharge(cfs)
attr-def: Observed mean daily discharge
attr-defs: NA
data-min: NA
data-max: NA
data-units: cubic feet per second

build-environment: Multiple computer systems were used to generate these data, including XX. The open source languages R and Python were used on all systems, as well as XX.

process-date: !expr format(Sys.time(),'%Y%m%d')
indirect-spatial: U.S.A.
latitude-res: 0.1
longitude-res: 0.1
data-name: weather data, river catchment metadata

data-name: Model driver data
data-description: >-
Inputs (drivers) for the temperature models described in Rahmani et al. 2020, including weather drivers, river basin attributes, and simulated and observed river discharge.
file-format: comma-separated file format (csv)
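
Unlike the hard-coded process-date in text_01_spatial.yml, this file stamps the date at build time with a yaml !expr tag, which is evaluated as R code when the metadata are rendered. For example:

    # The !expr tag is evaluated in R (e.g. yaml::read_yaml(file, eval.expr = TRUE)),
    # so process-date resolves to the date the metadata were built:
    format(Sys.time(), '%Y%m%d')   # e.g. "20201103"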
in_text/text_05_predictions.yml (19 changes: 14 additions & 5 deletions)
@@ -2,15 +2,15 @@ title: >-
Exploring the exceptional performance of a deep learning stream temperature model and the value of streamflow data: 5 model prediction data

abstract: >-
A deep learning model framework was used to make water temperature predictions in 118 river catchments across the U.S. All four model (LR, noQ, obsQ, and simQ) predictions are included. Additionally, a deep learning model was used to simulate discharge, which was used as inputs to the water temperature model.
A deep learning model framework was used to make water temperature predictions in 118 river catchments across the U.S. All four model (LR, noQ, obsQ, and simQ) predictions are included.

cross-cites:
-
authors: ['XX']
authors: ['DP Feng', 'K Fang', "CP Shen"]
title: >-
Cross cite code base?
pubdate: XX
link: XX
Enhancing streamflow forecast and extracting insights using continental-scale long-short term memory networks with data integration
pubdate: 2020
link: https://doi.org/10.1029/2019WR026793

build-environment: >-
We used XX open source XX; Any supercomputing resources used? XX
Review comment (Member):
For this one, let's plan to (1) get more info from Farshid about the compute environment (I added a bullet to "Text chunks we hope Farshid can fill in" in the "ERL data release plan" doc) and (2) add a reference to the environment.yml. I will make an issue for this so I can plan to do it; I have most of the info in hand already but will need a few minutes to put it together.

Review comment (Member):
(We could also just write one cover-everything text chunk that we use for all metadata files)

Review comment (Member):
Added notes to #7

@@ -46,3 +46,12 @@ entities:
data-min: NA
data-max: NA
data-units: degrees Celsius

process-date: !expr format(Sys.time(),'%Y%m%d')
indirect-spatial: U.S.A.
latitude-res: 0.00001
longitude-res: 0.00001
data-name: Model predictions
data-description: >-
Stream water temperature predictions from each model described in Rahmani et al. 2020.
file-format: comma-separated file format (csv)
in_text/text_06_evaluation.yml (9 changes: 9 additions & 0 deletions)
@@ -77,3 +77,12 @@ entities:

build-environment: >-
We used XX open source XX.

process-date: !expr format(Sys.time(),'%Y%m%d')
indirect-spatial: U.S.A.
latitude-res: 0.00001
longitude-res: 0.00001
data-name: Model evaluation metrics
data-description: >-
Evaluation metrics used to compare stream temperature models in Rahmani et al. 2020.
file-format: comma-separated file format (csv)
in_text/text_SHARED.yml (6 changes: 6 additions & 0 deletions)
@@ -53,7 +53,13 @@ funding-credits: >-
process-description: >-
At the core of the modeling framework is a deep learning model that uses inputs of XX.

process-date: 20201028
latitude-res: 0.1
longitude-res: 0.1
distro-person: Samantha K. Oliver

build-environment: Multiple computer systems were used to generate these data, including Linux and OSX. The open source languages R and Python were used on all systems. XX

liability-statement: >-
Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected.
Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS),