Tutorial
This tutorial will guide you through preparing a workflow to take in downloaded CMIP5 sea surface temperature data and output plots of the Niño 3.4 index. The tutorial workflow uses processing scripts from the collection that can be accessed at the CWSLab climate tools repository.
This tutorial assumes that you have connected to the CWSLab virtual machine and performed the first three steps in the Getting Started section.
You can use this system to create a file location and structure of your choosing. However, for this tutorial we will use the default data structure, which is based on the CMIP5 data reference syntax and the path `/short/$PROJECT/$USER/`, where `$USER` is your username and `$PROJECT` is your NCI project code.
The first step is to set up the input dataset. In this case, we start with CMIP5 data from the NCI downloaded archive. In the module select area, select the `CMIP5` module (found under `DataSets\GCM` in the `Climate and Weather Science Laboratory` package) and drag it onto the workspace. You can find a module quickly by typing its name into the module select box. This module represents the entire collection of downloaded CMIP5 data at NCI.
As the workflow stands, this dataset includes the entire CMIP5 archive! We need to add constraints to reduce the number of files to a more reasonable level. Do this using the `Constraint Builder` module, which adds restrictions to a dataset so that it only includes the data you require.
Select the `Constraint Builder` from the Modules panel and drag it to the workspace. Connect its output to the input of the `CMIP5` module. To add constraints, type into the `constraint_string` box with the `Constraint Builder` selected. For a first example, we will restrict our dataset to monthly 'tos' (temperature at ocean surface) data from the 'inmcm4' model, experiments 'rcp45' and 'rcp85'. These constraints can be added by typing the following string into the `constraint_string` box:
experiment = rcp85, rcp45 ; variable = tos ; model = inmcm4 ; frequency = mon
The name of the attribute you want to constrain is followed by an equals sign and then a comma-separated list of the values that the constraint can take. Different constraints are separated by semicolons. If you do not want to restrict the values that an attribute can take, leave it out of the string completely.
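To make the string format concrete, here is a minimal Python sketch that parses a constraint string into a dictionary of allowed values. The function `parse_constraints` is a hypothetical helper written for illustration only; it is not the code the `Constraint Builder` uses, and the real module may treat whitespace or errors differently.

```python
def parse_constraints(constraint_string):
    """Parse 'name = v1, v2 ; other = v3' into {name: [values]}.

    Hypothetical helper for illustration; not the Constraint Builder's code.
    """
    constraints = {}
    for clause in constraint_string.split(";"):
        clause = clause.strip()
        if not clause:
            continue  # tolerate a trailing semicolon
        name, sep, values = clause.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in constraint: {clause!r}")
        # Values are comma-separated; surrounding whitespace is ignored.
        constraints[name.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return constraints


parsed = parse_constraints(
    "experiment = rcp85, rcp45 ; variable = tos ; model = inmcm4 ; frequency = mon"
)
print(parsed)
```

An attribute that is left out of the string simply has no entry in the result, which corresponds to leaving that attribute unrestricted.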
### Step 2: Add an operation to the workflow: Create Python CDAT Catalogue Files
We have now created our input dataset. The next step is to begin running processing tasks on it.
Downloaded CMIP5 data is usually split into separate files for different time slices. In our example workflow, we now need to join these individual downloaded netCDF files into a single catalogue for each 'model run'. Once we perform this operation we will have two catalogue files: one for the `rcp85` experiment and one for the `rcp45` experiment.
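The grouping of time-slice files into model runs can be sketched in Python as follows. This is only an illustration of the idea, not the catalogue-building code used by the workflow. It relies on the standard CMIP5 file naming pattern `<variable>_<table>_<model>_<experiment>_<ensemble>_<period>.nc`; the function `group_by_run` is hypothetical, and the file names and time ranges in the example are made up.

```python
from collections import defaultdict


def group_by_run(filenames):
    """Group CMIP5 file names by (model, experiment, ensemble member).

    Hypothetical helper; assumes every name has the six underscore-separated
    fields of a time-sliced CMIP5 file.
    """
    runs = defaultdict(list)
    for name in filenames:
        stem = name[:-3] if name.endswith(".nc") else name
        variable, table, model, experiment, ensemble, period = stem.split("_")
        runs[(model, experiment, ensemble)].append(name)
    # Within a run the names differ only in the period field (YYYYMM-YYYYMM),
    # so sorting by name puts the time slices in chronological order.
    return {run: sorted(files) for run, files in runs.items()}


# Made-up file names for illustration only.
files = [
    "tos_Omon_inmcm4_rcp85_r1i1p1_201601-202512.nc",
    "tos_Omon_inmcm4_rcp45_r1i1p1_200601-201512.nc",
    "tos_Omon_inmcm4_rcp85_r1i1p1_200601-201512.nc",
    "tos_Omon_inmcm4_rcp45_r1i1p1_201601-202512.nc",
]
groups = group_by_run(files)
for run, members in groups.items():
    print(run, members)
```

With the constraints used in this tutorial, the result has one group per experiment, which matches the two catalogue files described above.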
In this workflow we use a module called `Merge Timeseries`, which uses the Python CDAT library to create a single-file catalogue of these files. This module can be found under the `Aggregation` group in the `Climate and Weather Science Laboratory` package, or again by typing its name into the module search box.
Add the `Merge Timeseries` module to your workflow and connect it to the output of the `CMIP5` module. Your workflow should look similar to this:
The workflow can now be run.
The next step is to add the remaining modules to the workflow. To the output of `Merge Timeseries`, add a `Crop` module. This module selects data between two time points and/or within lat/lon limits.
When you drag the `Crop` module into the workflow you will notice that it has multiple input ports, unlike the modules you have used so far. This is because the module requires extra input from the user: a `timestart` string, a `timeend` string, and the lat/lon limits `latnorth`, `latsouth`, `loneast` and `lonwest`. These parameters must be set; the time strings set the time bounds of the aggregation. In the attached screenshot I have set the aggregation to begin at 2060 and end at 2080.
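As a rough illustration of what cropping by time and by lat/lon limits means, the following Python sketch filters a list of records. It is not the `Crop` module's implementation. The parameter names are taken from the module's ports, but several things are assumptions made for this sketch: the time strings are treated as plain years, longitudes run from 0 to 360, the box does not cross the 0° meridian, and all bounds are inclusive. The record values are made up.

```python
def crop(records, timestart, timeend,
         latsouth=-90.0, latnorth=90.0, lonwest=0.0, loneast=360.0):
    """Keep records inside [timestart, timeend] and inside the lat/lon box.

    Hypothetical sketch: time strings are assumed to be years, bounds are
    inclusive, and the box is assumed not to wrap around 0 degrees longitude.
    """
    start, end = int(timestart), int(timeend)
    return [
        r for r in records
        if start <= r["year"] <= end
        and latsouth <= r["lat"] <= latnorth
        and lonwest <= r["lon"] <= loneast
    ]


# Made-up records for illustration only.
records = [
    {"year": 2059, "lat": 0.0, "lon": 200.0, "tos": 300.1},
    {"year": 2060, "lat": 0.0, "lon": 200.0, "tos": 300.4},
    {"year": 2070, "lat": 40.0, "lon": 200.0, "tos": 288.0},
    {"year": 2080, "lat": -2.5, "lon": 230.0, "tos": 301.2},
    {"year": 2081, "lat": 0.0, "lon": 200.0, "tos": 301.5},
]

cropped = crop(records, "2060", "2080",
               latsouth=-5.0, latnorth=5.0, lonwest=190.0, loneast=240.0)
print([r["year"] for r in cropped])
```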
To complete the workflow, add a `Nino3.4` module to the output of the `netCDF from CDML`. Then add a `Plot Timeseries` module. This module requires you to enter the name of the variable to be plotted; for this workflow the variable needs to be set to `tos`. Finally, add an `Image Viewer` module. These modules calculate the Niño 3.4 index from the input, plot the result and then display it in the VisTrails Spreadsheet module. Your completed workflow should look much like this:
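For readers unfamiliar with the index, the Niño 3.4 index is conventionally based on sea surface temperature anomalies averaged over the box 5°S to 5°N, 170°W to 120°W. The Python sketch below shows the core of that calculation. It is a simplified illustration, not the script that the `Nino3.4` module runs. It assumes a regular lat/lon grid, weights grid points by the cosine of latitude, and takes anomalies relative to the mean of the whole series rather than a monthly climatology. The function names and the tiny example grid are hypothetical.

```python
import math

# Nino 3.4 box: 5S-5N, 170W-120W, written as longitudes in the 0-360 range.
NINO34 = {"latsouth": -5.0, "latnorth": 5.0, "lonwest": 190.0, "loneast": 240.0}


def box_mean(field, lats, lons, box=NINO34):
    """cos(latitude)-weighted mean of field[i][j] over grid points in the box."""
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not box["latsouth"] <= lat <= box["latnorth"]:
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            # lon % 360 maps longitudes given as -180..180 onto 0..360.
            if box["lonwest"] <= lon % 360.0 <= box["loneast"]:
                total += weight * field[i][j]
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no grid points fall inside the box")
    return total / weight_sum


def nino34_anomalies(fields, lats, lons):
    """Box-mean temperature for each time step minus the mean over all steps."""
    means = [box_mean(field, lats, lons) for field in fields]
    baseline = sum(means) / len(means)
    return [m - baseline for m in means]


# Tiny made-up grid: only lat 0 with lon 200 and 220 lies inside the box.
lats = [-10.0, 0.0, 10.0]
lons = [180.0, 200.0, 220.0, 260.0]
cool = [[299.0] * 4, [299.0, 300.0, 300.0, 299.0], [299.0] * 4]
warm = [[299.0] * 4, [299.0, 302.0, 302.0, 299.0], [299.0] * 4]

anomalies = nino34_anomalies([cool, warm], lats, lons)
print(anomalies)
```

Operational definitions usually add further steps, such as removing the seasonal cycle and smoothing with a running mean, which are left out here to keep the sketch short.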
Click on the 'Execute' button and the workflow will execute, one module at a time. You can simulate the workflow before running it by changing the `simulate_execution` setting in your user configuration. If you started VisTrails from a command line, any error or info messages will appear in the terminal. If a step in the workflow fails, the module will turn red.
Now you can experiment by changing some parameters in the workflow. Try including other models such as `ACCESS1-0` or `MIROC5` by adding them to the `Constraint Builder`, or altering the `year_start` and `year_end` strings in the `netCDF from CDML`.
You can also check the metadata in the output netCDF files. If you run `ncdump -h` on one of the Niño 3.4 output files (they will be found at a path like `/short/$PROJECT/$USER/CMIP5/GCM/native/INM/inmcm4/rcp85/mon/ocean/tos/r1i1p1/tos_Omon_inmcm4_rcp85_r1i1p1_2060-2080_nino34_native.nc`), you should see a `vistrails_history` metadata attribute with a record of the workflow and scripts run on that file, with their git versions if available.