Damien Irving edited this page May 19, 2015 · 1 revision

A big advantage of using the CWSLab workflow tool is the provenance information it keeps about the code and data associated with each workflow. VisTrails already lets you produce flow diagrams of a workflow (see here), and the CWSLab plugin adds to this by recording the details of every command line program that was run during the workflow.

Code provenance

Details of the code used in a given workflow are stored in the global history attribute of the netCDF output files. In particular, three pieces of information are archived:

  • The name of the associated VisTrails file and its git hash (i.e. the unique 40-character identifier that can be used to find the exact version of the file that was executed in your git repository)
  • The name of the script that was executed to produce the file and its git hash
  • The entire history of command line programs that were run all the way back to the initial download of the data
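
The history attribute is plain text, so it can be inspected with `ncdump -h` or read in a script. As a rough sketch only, the Python below splits a history string into its entries and pulls out anything that looks like a 40-character git hash. The exact layout of the entries written by the CWSLab plugin is not specified here, so the one-entry-per-line layout, the sample text and the script name `hypothetical_script.py` are all assumptions made for illustration.

```python
import re

# A git commit hash is 40 hexadecimal characters.
GIT_HASH = re.compile(r"\b[0-9a-f]{40}\b")


def summarise_history(history):
    """Split a history attribute into entries and collect any git hashes.

    Assumes one entry per line, which is a common convention for the
    netCDF history attribute but is not guaranteed for every file.
    """
    entries = [line.strip() for line in history.splitlines() if line.strip()]
    hashes = []
    for entry in entries:
        for git_hash in GIT_HASH.findall(entry):
            if git_hash not in hashes:  # keep first-seen order, no repeats
                hashes.append(git_hash)
    return entries, hashes


# Made-up history text for illustration only; real entries will differ.
example_history = (
    "Tue May 19 10:00:00 2015: python hypothetical_script.py in.nc out.nc "
    "(Git hash: 0123456789abcdef0123456789abcdef01234567)\n"
    "Mon May 18 09:00:00 2015: cdo seasmean raw.nc in.nc\n"
)

entries, hashes = summarise_history(example_history)
print(len(entries), "entries")
for git_hash in hashes:
    print("git hash:", git_hash)
```

Any hash recovered this way can be passed to `git show` or `git checkout` in the relevant repository to see the exact version of the file that was executed.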

Data provenance

All CMIP5 files have a version global attribute which indicates the precise version of the data used. This attribute is retained throughout any workflow, which means you'll always know which version of CMIP5 data you're using.
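
If you want to confirm this for a particular output file, one option is to compare its global attributes with those of the original input. The sketch below works on plain dictionaries of attribute names and values (however you choose to read them from the files); the function name `version_retained` and the attribute values shown are hypothetical and used only for illustration.

```python
def version_retained(input_attrs, output_attrs):
    """Return True if both attribute sets carry the same 'version' value."""
    version = input_attrs.get("version")
    return version is not None and output_attrs.get("version") == version


# Hypothetical attribute values for illustration only.
raw = {"version": "v20120101", "history": "initial download"}
processed = {"version": "v20120101", "history": "initial download plus later commands"}

print(version_retained(raw, processed))
```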
