vignette updates

SPATIAL-Lab · Jan 11, 2024 · e72a832 · e72a832
1 parent af4942a
commit e72a832
Showing 1 changed file with 25 additions and 19 deletions.
diff --git a/vignettes/foram_intro.Rmd b/vignettes/foram_intro.Rmd
@@ -30,17 +30,21 @@ library(BPER)
 *****
 # Loading Proxy Data
 
-Now that the **BPER** package has been installed and loaded, load proxy data. Proxy data can be loaded into the model in a variety of formats. For example, you can directly load a data frame R object, or provide a path from the working directory to a .csv or .xlsx file. The function `load_foram` should be used to load and check the proxy data. 
+Now that the **BPER** package has been installed and loaded, load proxy data. Proxy data can be loaded into the model in a variety of formats. For example, you can directly load a data frame R object, or provide a path from the working directory to a .csv or .xlsx file. The function `load_foram` should be the first function used and will load and check the proxy data. 
 
 The input data should be formatted with 8 columns and n row(s), with columns ordered as: age, d11B, d11B2s, d11Bspec, Mg/Ca, Mg/Ca2s, d18O, d18O2s. The column order matters. Where '...2s' indicates 2 sigma uncertainties. 
 
 You must indicate which modern species' d11B vital effects are to be applied for each row of d11B data - this goes in the d11Bspec column. Use abbreviations including: `Grub` (*G. ruber*), `Tsac` (*T. sacculifer*), `Ouni` (*O. universa*), `custom` (specify values below using `priors_foram_adj` function), or `borate`.
 
-Now let's run the `load_foram` function on the input data and save the output as an object called `lf_out` (this will be a list) in the global environment. Here, we'll use low resolution PETM data as an example, and load it directly as a data frame. But, you can also load from a .csv or .xlsx. Simply provide the file path from the directory in quotes for the `foram_data` argument in the `load_foram` function instead of a data frame. For example, when you use this function on other data stored as .csv, you would use this code: 
+Here, we'll use low resolution ODP Site 1209 PETM data as an example, and load it directly as a data frame. To load from a .csv or .xlsx, simply provide the file path from the directory in quotes for the `foram_data` argument in the `load_foram` function instead of a data frame. 
+
+For example, when you use this function on other data stored as .csv, you would use this code snippet: 
 
 `lf_out <- load_foram(foram_data = "path/to/your/file.csv")`
 
-Blanks or NAs are okay in your data - you do not need to have every measurement for each sample. You can have repeat sample ages for replicates etc., and you may provide d11B data from multiple species (just indicate which modern species', or `custom`, calibration you'd like to use for each row in d11Bspec). 
+Blanks or NAs are okay in your data - you do not need to have every measurement for each sample. You can have repeat sample ages for replicates etc., and you may provide d11B data from multiple species. Just indicate which modern species', or `custom`, calibration you'd like to use for each row in d11Bspec. 
+
+Now let's run the `load_foram` function on the input data and save the output as an object called `lf_out` (this will be a list) in the global environment. 
 ```{r, loadData}
 x <- data.frame(c(55956, 55956, 55954, 55932, 55926, 55924, 55901, 55899), 
                 c(14.61, 15.46, NA, 14.49, 14.11, NA, 14.46, 13.94), 
@@ -58,20 +62,20 @@ lf_out <- load_foram(foram_data = x)
 *****
 # Age Indexing Proxy Data 
 
-Next, take the returned list from `load_foram`, which we called `lf_out` and use it for the `load_proxy` argument in the `age_index` function. Specify: the `age_units` used in the input data set (`kyr` or `Myr`), the time step type (`step_type`): `regular`, `every`, `both` or `custom`. 
+Next, take the returned list from `load_foram`, which we called `lf_out` and use it for the `load_proxy` argument in the `age_index` function. Specify: the `age_units` used in the input data set (`kyr` or `Myr` are the only two options), the time step type (`step_type`): `regular`, `every`, `both` or `custom`. 
 
 ## Time step options
 
-* For `regular`: regularly spaced intervals for time steps. If 'regular' is used, you must specify a value for the `step_int` argument (i.e., step interval; in the same units indicated for `age_units`). 
+* `regular`: regularly spaced intervals for time steps. If 'regular' is used, you must specify a value for the `step_int` argument (i.e., step interval; in the same units indicated for `age_units`). 
 
-* For `every`: there will be a time step at each age for which there is data. No further arguments are needed. 
+* `every`: there will be a time step at each age for which there is data. No further arguments are needed. 
 for every unique age in the data set. 
 
-* For `both`: there will be a time step at each age for which there is data plus regular spaced time steps. `step_int` must be specified for if `both` is used. 
+* `both`: there will be a time step at each age for which there is data plus regular spaced time steps. `step_int` must be specified for if `both` is used. 
 
-* For `custom`: The user provides a vector of ages for which they want time steps using the `cust_steps` argument in the function. 
+* `custom`: The user provides a vector of ages for which they want time steps using the `cust_steps` argument in the function. 
 
-Note that this information, and all the documentation on the functions and arguments within the functions can be easily accessed by typing `?<function name>` in the command line (e.g., type `?age_index`). 
+Note that this information, and all the documentation on the functions and arguments within the functions can be easily accessed by typing `?<function name>` in the command line. For example, type `?age_index` into the command line. If using R studio, you should see the function documentation pop up in the help window.
 
 For our example, let's use `regular` for `step_type` with a 10 kyr step interval. This function returns a list. Let's call that list `ai_out`. 
 ```{r, ageIndex}
@@ -84,9 +88,10 @@ ai_out <- age_index(load_proxy = lf_out, age_units = 'kyr', step_type = 'regular
 
 ## Adjusting parameter values and priors used in the MCMC inversion
 
-Many of the parameter values, including those that define priors, can be revised using `parms_foram_adjust` function. There are two arguments in this function, `parms2change` and `change_values`. For `parms2change` provide a vector of character strings for the parameters you would like to adjust from default values. The user-changeable parameters are listed in the help documentation of this function (type `?parms_foram_adj`). For `change_values` argument, provide a vector of numeric values which correspond to the parameter names in `parms2change`. 
+Many of the parameter values, including those that define priors, can be revised using `parms_foram_adjust` function. There are two arguments in this function, `parms2change` and `change_values`. For `parms2change` provide a vector of character strings for the parameters you would like to adjust from default values. For `change_values` argument, provide a vector of numeric values which correspond to the parameter names in `parms2change`. 
+
+The user-changeable parameters are listed in the help documentation of this function (type `?parms_foram_adj`). You can also take a look at all of the default values for all of the user-accessible parameters by opening up one of the package system files. To do this, type in the console: 
 
-You can take a look at all of the default values for all of the user-accessible parameters by opening up one of the package system files. Just type in the console: 
 `file.edit(system.file("extdata", "parms_in_foram.R", package = "BPER"))`
 
 To demonstrate how to adjust these parameters, we will change the parameters which define the prior distribution of the % of secondary calcite for the diagenetic correction on d18O.
@@ -100,21 +105,22 @@ parms_foram_adj <- parms_foram_adj(parms2change = parm_names, change_values = pa
 
 ## Loading parameter values and priors used in the MCMC inversion
 
-Next, we will load in any adjustments to default parameter and prior values. You may load in the adjusted parameters using the `parms_forams_adj` argument in the `priors_foram` function OR do not include this argument to maintain default parameter values. 
+Next, load in any adjustments to default parameter and prior values. You may load in the adjusted parameters using the `parms_foram_adj` argument in the `priors_foram` function. This is what we will do for this example, however, you can leave this argument empty to maintain the default model parameter values. 
 
 Include the `age_index` return object (here we called it `ai_out`) in arguments. 
 
-Specify which 2nd carbonate chem variable to use in CO2 calculations. Type `?priors_forams` into the console for list of options. 
+Specify which 2nd carbonate chemistry variable to use in CO2 calculations. Type `?priors_foram` into the console for list of options. 
 
 We will also need to specify the type of prior to use for the 2nd carbonate chemistry variable with the `cc2ndparm.pt` argument:
 
-You can choose `t1`, where the user specifies the prior distribution which will be sampled at time step 1, and this distribution is allowed to shift for subsequent time steps following an autocorrelated time series model.
+* You can choose `t1`, where the user specifies the prior distribution which will be sampled at time step 1, and this distribution is allowed to shift for subsequent time steps following an autocorrelated time series model.
 
-Or you can choose `ts`, where the user supplies a time series record of their choosing which will be used to define the prior distribution at each time step. If 'ts' is selected, the user must include time series data object for 'cc2ndparmTS' argument consisting of 3 columns: age, mean value for 2nd carbonate chemistry variable, and sd for 2nd carbonate chemistry variable. This record should span the data interval, but values do not need to be included for every time step as the record will be linearly interpolated to create a time-continuous record. 
+* Or you can choose `ts`, where the user supplies a time series record of their choosing which will be used to define the prior distribution at each time step. If `ts` is selected, the user must include time series data object for `cc2ndparmTS` argument consisting of 3 columns: age, mean value for 2nd carbonate chemistry variable, and 1sd for 2nd carbonate chemistry variable. This record should span the data interval, but values do not need to be included for every time step as the record will be linearly interpolated to create a time-continuous record. 
 
-Here, let's demonstrate using a DIC record for the second carbonate chemistry variable. Let's indicate `ts` for `cc2ndparm.pt`, and supply an input a data.frame containing a DIC record which will be used to define the DIC prior at each time step. Let's run the 'priors_foram' function and save the returned object as pf_out. Note that `priors_foram` can take several minutes to run.
-```{r, loadPriors}
+Here, let's demonstrate using DIC for the second carbonate chemistry variable. Let's indicate `ts` for `cc2ndparm.pt`, and supply an input data.frame containing a DIC record which will be used to define the DIC prior at each time step. Here, we use a carbon cycle model result, and call it `dic_pri`. 
 
+Let's run the `priors_foram` function and save the returned object as `pf_out`. Note that `priors_foram` can take several minutes to run as it includes MCMC inversions to generate modern foraminiferal d11Bf vs. d11B of borate calibrations.
+```{r, loadPriors}
 dic_pri <- data.frame(c(56200, 55934, 55928, 55919, 55900, 55883), 
                       c(2320, 2328, 2405, 2440, 2481, 2500), 
                       c(150, 150, 150, 150, 150, 150)) 
@@ -132,7 +138,7 @@ pf_out <- priors_foram(parms_foram_adj = parms_foram_adj,
 
 Now we will run the MCMC inversion by supplying the `priors_foram` return object (`pf_out` from above). Specify a vector of character strings indicating the parameters you would like saved in the MCMC output (`save.parms` argument). Also indicate the number of iterations (`n.iter`), the number of chains (`n.chains`; recommendation is to use from 3 to 9 chains depending on the application), the number of burn-in iterations (`n.burnin`; should be less than `n.iter`) and the amount of thinning (`n.thin`; i.e., saves every nth iteration). 
 
-Here, we will use a much lower number for iterations and burn-in than is recommended so that we can more quickly run the function for demonstration purposes. Even with lower iterations, the model can take several minutes to initialize and run. For this particular model, it is recommended to use on the order of hundreds of thousands of iterations, which will take several hours. 
+Here, we will use a much lower number for iterations and burn-in than is recommended so that we can more quickly run the function for demonstration purposes. Even with lower iterations, the model can take several minutes to initialize and run. For this particular model, it is recommended to use on the order of hundreds of thousands of iterations, which may take several hours to days depending on the amount of proxy data included and computational capability. 
 ```{r, runInversion}
 inv_out <- run_inversion(priors = pf_out, 
                          save.parms = c("sal", "tempC", "xca", "xmg", "xso4", "d11Bsw", "d18Osw", "d18Osw.local","pco2", "dic", "pH", "press"), 
@@ -169,7 +175,7 @@ Note that the chains (each chain is a different color) in these plots show poor
 *****
 # Plotting Results
 
-You can use the `post_plot` function to generate a time series plot of the posterior distributions for a time-dependent parameter. For `type` argument, the user can select either `CI` to plot the 95% credible interval, or `draws` for random draws of time series posteriors. 500 random draws is deafault, but this can be adjusted using the `n.draws` argument. 
+You can use the `post_plot` function to generate a time series plot of the posterior distributions for a time-dependent parameter. For `type` argument, the user can select either `CI` to plot the 95% credible interval, or `draws` for random draws of time series posteriors. 500 random draws is default, but this can be adjusted using the `n.draws` argument. 
 
 Let's try out both types of time series plots using our pCO2 posteriors:
 ```{r, tsPlots}