Merge pull request #456 from hdolinh/2023-10-delta
2023 10 delta
hdolinh authored Oct 24, 2023
2 parents 861f6c1 + 2c55d76 commit b96db97
Showing 4 changed files with 35 additions and 11 deletions.
43 changes: 32 additions & 11 deletions materials/sections/reproducible-workflows-targets.qmd
@@ -13,37 +13,57 @@ bibliography: book.bib
::: {.callout-note icon=false}
## Acknowledgements

This lesson is adapted from the journal article [Improving ecological data science with workflow management software](https://doi.org/10.1111/2041-210X.14113) by Brousil et al, and the journal's accompanying example, [A worked targets example for ecologists](https://targets-ecology.netlify.app/).
This lesson is adapted from the following resources:

- Journal article [Improving ecological data science with workflow management software](https://doi.org/10.1111/2041-210X.14113) by Brousil et al
- Brousil et al's accompanying example, [A worked targets example for ecologists](https://targets-ecology.netlify.app/)
- RLadies Santa Barbara Chapter Workshop: [An introduction to `targets` for R](https://youtu.be/qxhLH6sIZqQ?feature=shared)

:::

## Challenges of Workflows

All research projects have a workflow of some kind, which typically includes steps like data preparation and harmonization, running analyses or models, creating visualizations, and more.

However, many environmental research projects are becoming increasingly more complex as researchers are utilizing larger datasets that require complicated analytical methods. More complexity means more steps, and more room for error or poor organizational methods that make projects difficult to reproduce.
![Example of ideal straightforward data workflow](images/reproducible-workflows-targets-1.png)

However, many environmental research projects are becoming **increasingly complex** as researchers use **larger datasets** that require **complicated analytical methods**. More complexity means more steps, and more room for error or poor organizational methods that make projects difficult to reproduce. More complex analyses may also mean **longer run times**, which can make updating functions and analyses time-consuming.

This is where reproducible workflow tools and packages, like the R package `targets`, can play a huge role in streamlining complex workflows and ease the organization and sharing of projects.
![Example of a more realistic data workflow...](images/reproducible-workflows-targets-2.png)

Other interchangeable terms we hear for workflows are:
This is where **reproducible workflow tools and packages**, like the R package `targets`, can play a huge role in **streamlining complex workflows** and easing the organization and sharing of projects.

Other **interchangeable terms for workflows** are:

- Workflow Management Systems (WMS)
- Data pipelines
- Data workflow

## Benefits of Reproducible Workflows

A major benefit of WMS is the capacity to track the status of all required files and functions to prevent steps in a larger pipeline from being skipped and by ensuring that data are kept up to date as models or harmonization routines change. Despite WMS's benefits, adopting a WMS requires moving away from performing serial analytical operations within single or multiple scripts to instead breaking an analysis into smaller functions that are modular (Figure 1), thereby providing more computational flexibility. [@brousil2023]
Using a **reproducible workflow** allows us to:

## Challenges of Reproducible Workflows
- **track the status** of all required files and functions which makes it easier to keep all steps in the overall workflow up-to-date [@brousil2023]
- break our analysis and data processing steps into smaller functions that are **modular** which results in more **computational flexibility** [@brousil2023] and makes it easier to debug when errors occur
- **reduce the computational tasks** so that steps run only when necessary, as opposed to any time there is an update to one of the steps in the workflow [@brousil2023]
- **utilize continuous integration** (automating tasks) so that we spend less time on manual work and are less prone to simple errors (e.g. misspellings) [@brousil2023]

WMS requires moving away from performing serial analytical operations within single or multiple scripts to instead breaking an analysis into smaller functions that are modular, thereby providing more computational flexibility [@brousil2023]
Overall, a reproducible workflow enhances our research projects because it **improves our understanding** of our work for ourselves and for collaborators, makes our work **more efficient and automated**, and **increases reproducibility**.

Indeed, researchers will face a familiar trade-off—to invest the personnel time in learning and deploying a tool like WMS to save personnel and compute time later or to accept that less efficient, but more familiar analytical frameworks will come with costs in personnel time, compute time and potentially additional associated payments for computing resources. The balance will depend on the complexity of the analysis, the expectation of reusing the code over time and the resources available to the researcher (e.g. funds for computing or personnel) [@brousil2023].
::: {.callout-warning icon=false}
### Challenges of Reproducible Workflows

## Leveraging Reproducible Workflows & Tools
While the benefits of reproducible workflows are immense, adopting workflows and workflow management tools can be **intimidating** at the start due to:

- **steep learning curve** for implementing reproducible workflow tools [@brousil2023]
- **limited training resources** and opportunities for applying WMS for environmental researchers and professionals [@brousil2023]
- **infrequent use** of WMS and reproducible workflows in the environmental field means there are fewer examples to learn from and a **lack of standardized methods** for using WMS [@brousil2023]
:::

WMS may not be needed by most beginners, but learning about these tools early may inspire researchers with the understanding that their analyses can scale with the ambition and complexity of their most pressing research questions. Implementing WMS is an investment that does take time but can save a great deal of time and frustration later [@brousil2023].

### R Package: `targets` for Reproducible Workflows {.unnumbered}
## Leveraging `targets` for Reproducible Workflows

WMS may not be needed by most beginners, but learning about these tools early may inspire researchers with the understanding that their analyses can scale with the ambition and complexity of their most pressing research questions. Implementing WMS is an investment that does take time but can save a great deal of time and frustration later [@brousil2023].

::: {.callout-caution icon=false}
#### What is the `targets` package?
@@ -52,4 +72,5 @@ WMS may not be needed by most beginners, but learning about these tools early ma
`targets` can also help users build, visualize, and manage workflows from raw files to outputs.
:::
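
To make the idea of building a workflow "from raw files to outputs" more concrete, here is a minimal sketch of what a `targets` pipeline definition might look like. It is not taken from the lesson or from Brousil et al.: the file path `data/raw.csv`, the column `value`, the target names, and the helper `summarize_values()` are all hypothetical and only serve as illustration.

```r
# _targets.R: a hypothetical pipeline definition (all names and paths are assumptions)
library(targets)

# Hypothetical helper: summarise one numeric column called `value`
summarize_values <- function(data) {
  data.frame(mean_value = mean(data$value), n = nrow(data))
}

# The pipeline is a list of targets; each target names a step and the code that builds it
list(
  # Track the raw file itself, so changes to it mark downstream targets as outdated
  tar_target(raw_file, "data/raw.csv", format = "file"),
  # Read the tracked file into a data frame
  tar_target(raw_data, read.csv(raw_file)),
  # Summarise the data with the modular helper function defined above
  tar_target(summary_table, summarize_values(raw_data))
)
```

With a file like this at the project root, `targets::tar_make()` builds the targets in dependency order, `targets::tar_visnetwork()` draws the dependency graph (it needs the `visNetwork` package), and `targets::tar_read(summary_table)` returns the stored result. Running `tar_make()` again without any changes should skip the targets that are already up to date, which is the "only run as necessary" behaviour described in the benefits above.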


## Exercise: Creating a Pipeline using `targets`
3 changes: 3 additions & 0 deletions materials/session_08.qmd
@@ -5,3 +5,6 @@ title-block-banner: true

{{< include /sections/reproducible-workflows-targets.qmd >}}



