Add 'Related R Packages' section to JOSS draft
jpdunc23 committed Jan 29, 2024
1 parent ae987e6 commit eda0e92
Showing 3 changed files with 146 additions and 5 deletions.
91 changes: 91 additions & 0 deletions vignettes/paper.bib
@@ -77,3 +77,94 @@ @Manual{chang-r6-2022
year = {2022},
note = {https://r6.r-lib.org, https://github.com/r-lib/R6/},
}

@article{chalmers-simdesign-2020,
author = {Chalmers, R. Philip and Adkins, Mark C.},
journal = {The Quantitative Methods for Psychology},
publisher = {TQMP},
title = {Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package},
year = {2020},
volume = {16},
number = {4},
url = {http://www.tqmp.org/RegularArticles/vol16-4/p248/p248.pdf},
pages = {248--280},
doi = {10.20982/tqmp.16.4.p248}
}

@article{gasparini-rsimsum-2018,
doi = {10.21105/joss.00739},
url = {https://doi.org/10.21105/joss.00739},
year = {2018},
publisher = {The Open Journal},
volume = {3},
number = {26},
pages = {739},
author = {Gasparini, Alessandro},
title = {rsimsum: Summarise results from Monte Carlo simulation studies},
journal = {Journal of Open Source Software}
}

@article{blair-declaredesign-2019,
author = {Blair, Graeme and Cooper, Jasper and Coppock, Alexander and Humphreys, Macartan},
title = {{Declaring and Diagnosing Research Designs}},
journal = {American Political Science Review},
year = {2019},
volume = {113},
number = {3},
pages = {838--859},
}

@Manual{joshi-simhelpers-2024,
title = {simhelpers: Helper Functions for Simulation Studies},
author = {Joshi, Megha and Pustejovsky, James},
year = {2024},
note = {R package version 0.2.0},
url = {https://meghapsimatrix.github.io/simhelpers/index.html},
}

@Manual{scheer-simTool-2020,
title = {simTool: Conduct Simulation Studies with a Minimal Amount of Source Code},
author = {Scheer, Marsel},
year = {2020},
note = {R package version 1.1.7},
url = {https://cran.r-project.org/web/packages/simTool/index.html},
}

@Manual{epskamp-parSim-2024,
title = {parSim: Parallel Simulation Studies},
author = {Sacha Epskamp},
year = {2023},
note = {R package version 0.1.5},
url = {https://cran.r-project.org/web/packages/parSim/parSim.pdf},
}

@misc{bien-simulator-2016,
title={The Simulator: An Engine to Streamline Simulations},
author={Jacob Bien},
year={2016},
eprint={1607.00021},
archivePrefix={arXiv},
primaryClass={stat.CO}
}

@Article{couch-infer-2021,
title = {{infer}: An {R} package for tidyverse-friendly statistical inference},
author = {Simon P. Couch and Andrew P. Bray and Chester Ismay and Evgeni Chasnovski and Benjamin S. Baumer and Mine Çetinkaya-Rundel},
journal = {Journal of Open Source Software},
year = {2021},
volume = {6},
number = {65},
pages = {3661},
doi = {10.21105/joss.03661},
}

@article{hofert-simsalapar-2016,
title={Parallel and Other Simulations in R Made Easy: An End-to-End Study},
volume={69},
url={https://www.jstatsoft.org/index.php/jss/article/view/v069i04},
doi={10.18637/jss.v069.i04},
number={4},
journal={Journal of Statistical Software},
author={Hofert, Marius and Mächler, Martin},
year={2016},
pages={1--44}
}
60 changes: 55 additions & 5 deletions vignettes/paper.md
@@ -41,7 +41,7 @@ bibliography: paper.bib

# Summary

`simChef` is an R package that empowers data science practitioners to rapidly
`simChef` is an `R` package that empowers data science practitioners to rapidly
plan, carry out, and summarize statistical simulation studies in a flexible,
efficient, and low-code manner. Drawing substantially from the Predictability,
Computability, and Stability (PCS) framework [@yu-veridical-2020], `simChef`
@@ -92,7 +92,7 @@ At its core, `simChef` breaks down a simulation experiment into four modular com
- `DGP`: the data-generating processes from which to *generate* data
- `Method`: the methods (or models) to *fit* in the experiment
- `Evaluator`: the evaluation metrics used to *evaluate* the methods' performance
- `Visualizer`: the visualization functions used to *visualize* outputs from the method fits or evaluation results (can be tables, plots, or even R Markdown snippets to display)
- `Visualizer`: the visualization functions used to *visualize* outputs from the method fits or evaluation results (can be tables, plots, or even `R` Markdown snippets to display)
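
To make the four roles concrete, here is a minimal base-`R` sketch of the generate, fit, evaluate, and visualize steps. It deliberately does not use `simChef`'s API: the function names (`dgp_fun`, `method_fun`, `eval_fun`, `viz_fun`) and the linear-model setting are assumptions chosen only for illustration.

```r
set.seed(1)

# DGP: generate data from a simple linear model (hypothetical example).
dgp_fun <- function(n = 100, beta = 2) {
  x <- rnorm(n)
  list(x = x, y = beta * x + rnorm(n))
}

# Method: fit ordinary least squares and return the slope estimate.
method_fun <- function(x, y) {
  c(beta_hat = unname(coef(lm(y ~ x))[2]))
}

# Evaluator: summarize the estimates across replicates.
eval_fun <- function(estimates, beta = 2) {
  c(bias = mean(estimates) - beta, sd = sd(estimates))
}

# Visualizer: format the evaluation results as a small table.
viz_fun <- function(eval_results) {
  data.frame(metric = names(eval_results), value = unname(eval_results))
}

# Run 50 replicates of generate -> fit, then evaluate and visualize.
estimates <- replicate(50, {
  dat <- dgp_fun()
  method_fun(dat$x, dat$y)
})
eval_results <- eval_fun(estimates)
viz_fun(eval_results)
```

Keeping each role in its own function is what allows components to be swapped or added independently, which is the modularity the list above describes.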

![Overview of the four core components in a `simChef` `Experiment`. `simChef`
provides four classes that implement distinct simulation objects in
@@ -200,12 +200,62 @@ Once saved, the user can add new `DGP` and `Method` objects to the experiment an
Considering the example above, when we add `new_method` and call `run_experiment` with `use_cached = TRUE`, `simChef` finds that the cached results are missing combinations of `new_method`, existing DGPs, and their associated parameters, giving nine new configurations.
Replicates for the new combinations are then appended to the cached results.
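
The idea of detecting missing configurations can be sketched in base `R` as a set difference between the requested and cached parameter grids. This is an illustration of the concept only, not `simChef`'s internal implementation; the breakdown into three DGPs crossed with three values of a parameter `n`, and all object names, are assumptions made so that the count matches the nine configurations mentioned above.

```r
# Hypothetical grid: 3 DGPs x 3 parameter values = 9 DGP configurations.
dgp_configs <- expand.grid(
  dgp = c("dgp1", "dgp2", "dgp3"),
  n = c(100, 200, 400),
  stringsAsFactors = FALSE
)

# merge() with no shared columns returns the Cartesian product.
cached <- merge(
  dgp_configs,
  data.frame(method = c("method1", "method2"), stringsAsFactors = FALSE)
)
requested <- merge(
  dgp_configs,
  data.frame(method = c("method1", "method2", "new_method"),
             stringsAsFactors = FALSE)
)

# Identify requested configurations that are absent from the cache.
config_key <- function(df) paste(df$dgp, df$n, df$method, sep = "|")
missing_configs <- requested[!config_key(requested) %in% config_key(cached), ]

nrow(missing_configs)
```

Only the rows in `missing_configs` would need new replicates; everything else can be read back from the cache.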

`simChef` also provides users with a convenient API to automatically generate an R Markdown document.
`simChef` also provides users with a convenient API to automatically generate an `R` Markdown document.
This documentation gathers the scientific details, summary tables, and visualizations side-by-side with the user's custom source code and parameters for data-generating processes, statistical methods, evaluation metrics, and plots.
A call to `init_docs` generates empty markdown files for the user to populate with their overarching simulation objectives and with descriptions of each of the `DGP`, `Method`, `Evaluator`, and `Visualizer` objects included in the `Experiment`.
Finally, a call to `render_docs` prepares the R Markdown document, either for iterative design and analysis of the simulation or to provide a high-quality overview that can be shared easily.
Finally, a call to `render_docs` prepares the `R` Markdown document, either for iterative design and analysis of the simulation or to provide a high-quality overview that can be shared easily.
We provide an example of the simulation documentation [here](https://philboileau.github.io/simChef-case-study/results/empirical-fdr-comparison/empirical-fdr-comparison.html).
Corresponding R source code is available on [GitHub](https://github.com/PhilBoileau/simChef-case-study).
Corresponding `R` source code is available on [GitHub](https://github.com/PhilBoileau/simChef-case-study).

# Related `R` packages

A number of existing `R` packages and projects address needs related to `simChef`'s
functionality. The `batchtools` package [@lang-batchtools-2017] provides
abstractions for "problems", "algorithms", and "experiments", similar to
`simChef`'s `DGP`, `Method`, and `Experiment` objects, respectively.
Additionally, `batchtools` provides a number of utilities for shared-memory and
distributed-memory computations, including utilities for interacting with high-performance
computing cluster schedulers such as Slurm and Torque. `simChef` is able to
leverage these utilities for distributed computations via the backends provided
by the `future.batchtools` package, which is part of the `future` ecosystem of
`R` packages [@bengtsson-unifying-2021]. Whereas `batchtools` is a general tool
for distributed mapping operations, `simChef` specializes in data science
simulations and provides additional functionality tailored to that setting,
including its `tidy` grammar of simulation experiments, the `Evaluator` and
`Visualizer` concepts, and the automated documentation capabilities discussed above.

Many existing packages aim to simplify the process of creating simulation
experiments by reducing coding burden through distributed computing helpers and
preset methods for generating, computing, and summarizing simulation replicates.
`SimDesign` [@chalmers-simdesign-2020] focuses on Monte Carlo simulation
experiments and provides a function `runSimulation` that accepts user-defined
`generate`, `analyse`, and `summarise` functions, with support for distributed
computation via the `parallel` base `R` package and `future`. `simulator`
[@bien-simulator-2016] provides a `tidy` grammar of simulation experiments and
highly modular helpers for evaluating and managing simulation outputs, relying
on the `parallel` package for distributed computation. Other packages provide a
small number of well-tailored helper functions for specific simulation settings
or distributed computation, including `simhelpers` [@joshi-simhelpers-2024],
`simTool` [@scheer-simTool-2020], `parSim` [@epskamp-parSim-2024], `rsimsum`
[@gasparini-rsimsum-2018], and `simsalapar` [@hofert-simsalapar-2016]. To our
knowledge, no single existing package includes `simChef`'s combination of
conceptual modularity, `tidy` grammar, computational flexibility, simulation
workflow management, and automated documentation.
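
For comparison, the `generate`, `analyse`, and `summarise` pattern used by `SimDesign` looks roughly like the following. This is a minimal sketch assuming the `SimDesign` package is installed; the design (two sample sizes), the estimators, and the replication count are arbitrary choices for illustration.

```r
library(SimDesign)

# One row per simulation condition (here: two sample sizes).
Design <- createDesign(n = c(30, 100))

# Generate one dataset for the given condition.
Generate <- function(condition, fixed_objects = NULL) {
  rnorm(condition$n, mean = 0, sd = 1)
}

# Analyse one dataset and return named statistics.
Analyse <- function(condition, dat, fixed_objects = NULL) {
  c(mean = mean(dat), median = median(dat))
}

# Summarise the replicates for one condition.
Summarise <- function(condition, results, fixed_objects = NULL) {
  c(bias = bias(results, parameter = 0))
}

res <- runSimulation(
  design = Design, replications = 200,
  generate = Generate, analyse = Analyse, summarise = Summarise,
  verbose = FALSE
)
res
```

Here the three user-defined functions play roles broadly similar to a `DGP`, a `Method`, and an `Evaluator`.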

Another category of related packages comprises those that share conceptual
similarities with `simChef` by providing helpful abstractions for the
design and analysis of simulation experiments, but at a finer level of detail
than `simChef` intends. For example, the package `DeclareDesign`
[@blair-declaredesign-2019] provides various `declare_*` functions for defining
and evaluating statistical research questions, with an emphasis on the social
sciences. The package `infer` [@couch-infer-2021] provides a `tidy` API for
statistical inference that lets users specify random variables and
their relationships, define a null hypothesis, generate data under that
hypothesis, and calculate distributions of statistics based on that hypothesis.
Both of these packages, and many of the packages discussed above, could be
used within a user's `DGP`, `Method`, `Evaluator`, or `Visualizer` and run
via an `Experiment`, working in harmony with `simChef` to carry out a
large-scale simulation with automated documentation.
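
As an illustration of the `infer` workflow described above, the following sketch builds a bootstrap null distribution for a mean. It assumes the `infer` package is installed and uses its bundled `gss` dataset; the choice of the `hours` variable and the null value of 40 are for illustration only.

```r
library(infer)

set.seed(1)

# specify -> hypothesize -> generate -> calculate
null_dist <- gss |>
  specify(response = hours) |>
  hypothesize(null = "point", mu = 40) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")

head(null_dist)
```

A pipeline like this could be wrapped in a function and supplied as the body of a `Method`, which is the kind of composition the paragraph above has in mind.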

# Discussion

Binary file modified vignettes/paper.pdf
Binary file not shown.
