diff --git a/cover-letter.pdf b/cover-letter.pdf
index 12b5431..9986842 100644
Binary files a/cover-letter.pdf and b/cover-letter.pdf differ
diff --git a/images/fig-date-1.png b/images/fig-date-1.png
deleted file mode 100644
index f5619ac..0000000
Binary files a/images/fig-date-1.png and /dev/null differ
diff --git a/images/fig-date-1.svg b/images/fig-date-1.svg
new file mode 100644
index 0000000..2a05fe5
--- /dev/null
+++ b/images/fig-date-1.svg
diff --git a/images/fig-map-1.png b/images/fig-map-1.png
deleted file mode 100644
index 3e37f90..0000000
Binary files a/images/fig-map-1.png and /dev/null differ
diff --git a/images/fig-map-1.svg b/images/fig-map-1.svg
new file mode 100644
index 0000000..2dc4941
--- /dev/null
+++ b/images/fig-map-1.svg
diff --git a/images/fig-outlier-1.png b/images/fig-outlier-1.png
deleted file mode 100644
index 4162ff8..0000000
Binary files a/images/fig-outlier-1.png and /dev/null differ
diff --git a/images/fig-outlier-1.svg b/images/fig-outlier-1.svg
new file mode 100644
index 0000000..1f0928b
--- /dev/null
+++ b/images/fig-outlier-1.svg
diff --git a/images/fig-season-1.png b/images/fig-season-1.png
deleted file mode 100644
index ad874b2..0000000
Binary files a/images/fig-season-1.png and /dev/null differ
diff --git a/images/fig-season-1.svg b/images/fig-season-1.svg
new file mode 100644
index 0000000..b3169ac
--- /dev/null
+++ b/images/fig-season-1.svg
diff --git a/images/fig-site-1.png b/images/fig-site-1.png
deleted file mode 100644
index 0404bfa..0000000
Binary files a/images/fig-site-1.png and /dev/null differ
diff --git a/images/fig-site-1.svg b/images/fig-site-1.svg
new file mode 100644
index 0000000..9d32386
--- /dev/null
+++ b/images/fig-site-1.svg
diff --git a/images/importflow.png b/images/importflow.png
deleted file mode 100644
index 066206b..0000000
Binary files a/images/importflow.png and /dev/null differ
diff --git a/images/importflow.tif b/images/importflow.tif
new file mode 100644
index 0000000..e5b77d4
Binary files /dev/null and b/images/importflow.tif differ
diff --git a/images/qcex.png b/images/qcex.png
deleted file mode 100644
index e4c5661..0000000
Binary files a/images/qcex.png and /dev/null differ
diff --git a/images/qcex.tif b/images/qcex.tif
new file mode 100644
index 0000000..8d5c31d
Binary files /dev/null and b/images/qcex.tif differ
diff --git a/images/workflow.png b/images/workflow.png
deleted file mode 100644
index 1a1945f..0000000
Binary files a/images/workflow.png and /dev/null differ
diff --git a/images/workflow.tif b/images/workflow.tif
new file mode 100644
index 0000000..1f06679
Binary files /dev/null and b/images/workflow.tif differ
diff --git a/manu-draft.docx b/manu-draft.docx
index 5367d92..ff6c103 100644
Binary files a/manu-draft.docx and b/manu-draft.docx differ
diff --git a/manu-draft.qmd b/manu-draft.qmd
index 9c37aab..62ffd99 100644
--- a/manu-draft.qmd
+++ b/manu-draft.qmd
@@ -3,9 +3,12 @@ format:
   docx:
     reference-doc: my_styles.docx
     number-sections: true
-title: "MassWateR: Improving Quality Control, Analysis, and Sharing of Water Quality Data"
+title: "MassWateR: Improving quality control, analysis, and sharing of water quality data"
 bibliography: refs.bib
 csl: plos-one.csl
+crossref:
+  fig-title: Fig
+  fig-prefix: Fig
 author:
   - Marcus W Beck:
       email: mbeck@tbep.org
@@ -21,9 +24,9 @@ author:
       correspondence: false
 
 institute:
-  - TBEP: Tampa Bay Estuary Program, St. Petersburg, Florida 33701 USA
+  - TBEP: Tampa Bay Estuary Program, St. Petersburg, Florida, USA
   - ACA: ACASAK Consulting, Boston, Massachusetts USA
-  - MASS: Massachusetts Bays National Estuary Partnership, Boston, Massachusetts 02125 USA
+  - MASS: Massachusetts Bays National Estuary Partnership, Boston, Massachusetts, USA
 
 filters:
   - templates/scholarly-metadata.lua
@@ -39,6 +42,7 @@ execute:
   message: false
   ft.keepnext: false
   ft.align: left
+  fig-format: svg
   fig-dpi: 300
 ---
 
@@ -57,6 +61,8 @@ load(file = here('tabs/filerequirements.RData'))
 
 Short title: *MassWateR* R package for water quality data
 
+{{< pagebreak >}}
+
 ## Abstract {.unnumbered}
 
 The long-term protection and restoration of aquatic resources depends on robust monitoring data; data that require systematic quality control and analysis tools. The *MassWateR* R package facilitates quality control, analysis, and data sharing for discrete surface water quality data collected by monitoring programs of various size and technical capacity. The tools were developed to address regional needs for programs in Massachusetts, USA, but the principles and outputs can be applicable to monitoring data collected anywhere. Users can create quality control reports, perform outlier analyses, and assess trends by season, date, and site for more than 40 parameters. Users can also prepare data for submission to the United States Environmental Protection Agency Water Quality Exchange, thus sharing data to the largest water quality database in the United States. The automated and reproducible workflow offered by *MassWateR* is expected to increase the quantity and quality of publicly available data to support the management of aquatic resources.
@@ -77,7 +83,7 @@ To our knowledge, there are no existing R packages on CRAN that can be used to f
 
 Users can engage with *MassWateR* to achieve different goals. This design was intentional based on likely differences in needs among the user community. Although increasing data submission and facilitating QC reporting was the primary goal, we also assumed that users may not want to do both. That is, state institutions require QC reporting for regulatory assessments, whereas data submission to WQX may be a separate process. Users may also simply have a need to explore trends or to summarize their data, while also wanting to extend these analyses beyond *MassWateR* using additional R packages. @fig-workflow demonstrates how a user may apply the functions in *MassWateR* once the required data are imported. The functions allow a user to engage with their data several ways. The first step, QC screening, is often iterative as a user can modify parts of the raw data based on input checks or outliers. The second step can be used to create a QC report for submission to a regulatory agency. The third step is data analysis and visualization, using MassWateR functions and downstream analysis with additional R packages and functions. The fourth and final step can create a formatted table for WQX submission.
 
-![Workflow demonstrating how a user could engage with the *MassWateR* package. WQX: Water Quality Exchange; QC: Quality Control.](images/workflow.png){#fig-workflow width=100% fig-alt="Workflow diagram showing four ways to engage with *MassWateR*"}
+![Workflow demonstrating how a user could engage with the *MassWateR* package. WQX: Water Quality Exchange; QC: Quality Control.](images/workflow.tif){#fig-workflow width=100% fig-alt="Workflow diagram showing four ways to engage with *MassWateR*"}
 
 No matter the user need, all data inputs to *MassWateR* must follow a strict format. Developing a workflow to accommodate data inputs from the dozens of potential users from several organizations that use different data formats would have been impractical. As such, the primary limitation to using the package is to adhere to the formatting requirements for all input files. Several [resources](https://massbays-tech.github.io/MassWateR/RESOURCES.html) are provided on the package web page to assist users in formatting their data. These resources included several training activities that were conducted during package development and templates demonstrating the appropriate format and rationale. The trainings, *pkgdown* [website](https://massbays-tech.github.io/MassWateR/) [@Wickham22], and [Community of Practice forum](https://massbays.discourse.group/login) were also implemented so that learning R was not a significant limitation for using the package.
 
@@ -113,7 +119,7 @@ Additionally, functions may often include a suffix that describes the relevant f
 
 The primary task of the `read` functions is to ensure all imported files follow the required format for the package. Excel files are the expected format for all inputs and the `read` functions use the `read_excel` function from the *readxl* package [@Wickham23]. The `read` functions do very little other than import the file - once the file is imported it is immediately passed to one of the relevant `check` functions inside the `read` function. There are several checks for each type of input file, with the number of checks increasing based on the complexity of the input file. Each check is printed to the R console on completion, whereas an error is returned at the first instance of a failed check, at which point the function exits. The error will typically indicate which parts of the input file need to be changed to rectify the issue, often indicating a specific cell in the Excel file that requires attention. As such, the workflow is intended to be iterative, where a user imports a file, receives an error, manually changes the input file in Excel, then imports the data again until all checks pass (@fig-importflow). Again, this design was intentional as many monitoring agencies and groups organize data differently and a standard input format for the package was the best option to accommodate all potential users. This may also encourage future standardization among monitoring groups for how data are maintained to ease formatting challenges to using *MassWateR*. A user only needs to format their data once to use the package.
 
-![Pseudocode demonstrating the iterative process of importing a required data file for *MassWateR*. All read functions import an Excel file from the user-specified path (e.g., `respth`) and the imported file is then passed to a check function. The function exits if an error is encountered, allowing the user to manually fix the identified error and then import again. After all checks are passed, a formatting function is applied to correct minor issues (e.g., standardize date format as YYYY-MM-DD) and the final data object is returned (e.g., `resdat`).](images/importflow.png){#fig-importflow fig-alt="Diagram showing pseudocode for importing and fixing a data file with *MassWateR*" width=100%}
+![Pseudocode demonstrating the iterative process of importing a required data file for *MassWateR*. All read functions import an Excel file from the user-specified path (e.g., `respth`) and the imported file is then passed to a check function. The function exits if an error is encountered, allowing the user to manually fix the identified error and then import again. After all checks are passed, a formatting function is applied to correct minor issues (e.g., standardize date format as YYYY-MM-DD) and the final data object is returned (e.g., `resdat`).](images/importflow.tif){#fig-importflow fig-alt="Diagram showing pseudocode for importing and fixing a data file with *MassWateR*" width=100%}
 
 A correctly formatted input file would be imported as follows, with the messages in the console indicating the checks that were performed and that all checks were successful. Below demonstrates what would be shown for the results file using an example dataset included with the package. A total of fifteen checks are applied to the results file (described in detail in the [help documentation](https://massbays-tech.github.io/MassWateR/reference/checkMWRresults.html) and [vignettes](https://massbays-tech.github.io/MassWateR/articles/inputs.html#surface-water-quality-results)).
 
@@ -275,7 +281,7 @@ qcMWRreview(fset = fsetls, output_dir = getwd())
 #> Report created successfully! File located at /tmp/RtmpUzzrbC/qcreview.docx
 ```
 ![The first two of sixteen pages of the quality control report created by `qcMWRreview()` that evaluates the results data relative to data quality objectives. The first page shows the data quality objectives for accuracy, frequency, and completeness. The second page shows QC results for frequency and completeness. Parameters shown in red or marked as 'MISS' failed the data quality objectives. Users can edit the Word file as needed, e.g., entering the organization name or adding notes. 
-](images/qcex.png){#fig-qcex}
+](images/qcex.tif){#fig-qcex}
 
 The QC report is built using several functions that can be used individually as needed. In particular, the `tabMWRacc()`, `tabMWRfre()`, and `tabMWRcom()` create *flextable* [@Gohel23] objects that can be viewed in RStudio and are compatible with Word output. These functions are useful for understanding how the QC checks are created for the separate components of the QC report. For example, the `tabMWRacc()` function evaluates accuracy checks for field duplicates, lab duplicates, field blanks, lab blanks, and lab spikes/instrument checks for QC records in the results file based on DQOs in the accuracy file. The function can return a summary of all checks as follows (only ammonia and total phosphorus are shown for brevity):
 
diff --git a/motivation-letter.pdf b/motivation-letter.pdf
deleted file mode 100644
index 1153415..0000000
Binary files a/motivation-letter.pdf and /dev/null differ
diff --git a/notes b/notes
index 956aa72..067f3e9 100644
--- a/notes
+++ b/notes
@@ -1,3 +1,4 @@
-* Remove figures from manuscript, but keep captions where they are, upload figures separately.
+* Remove figures from manuscript, but keep captions where they are, upload figures separately. Rename Figure files as Fig1.png, etc.
 * Add table names, captions, and referneces to flextable function output
-* Funding info does not go in acknowledgments: This work was supported by an Exchange Network grant from the US Environmental Protection Agency awarded to the Massachusetts Bays National Estuary Partnership, Grant No\. OS\-84029801\-0.
\ No newline at end of file
+* Funding info does not go in acknowledgments: This work was supported by an Exchange Network grant from the US Environmental Protection Agency awarded to the Massachusetts Bays National Estuary Partnership, Grant No\. OS\-84029801\-0.
+* Replace mail symbol for corresponding author to asterisk
\ No newline at end of file