-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
264 lines (170 loc) · 14.5 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# slimr <img src='man/figures/logo.png' style="float:right; height:278px;"/>
[https://rdinnager.github.io/slimr/](https://rdinnager.github.io/slimr/)
<!-- badges: start -->
[![R build status](https://github.com/rdinnager/slimr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/rdinnager/slimr/actions)
[![slimr status badge](https://rdinnager.r-universe.dev/badges/slimr)](https://rdinnager.r-universe.dev/slimr)
<!-- badges: end -->
The goal of `slimr` is to run SLiM population genetics forward simulations from R. It also has utilities for monitoring the simulations, and bringing the resulting data into R for post-processing and visualization.
The package is described in detail in the following manuscript:
Dinnage, R., Sarre, S. D., Duncan, R. P., Dickman, C. R., Edwards, S. V., Greenville, A. C., Wardle, G. M., & Gruber, B. (2023). slimr: An R package for tailor-made integrations of data in population genomic simulations over space and time. Molecular Ecology Resources, 00, e13916. [https://doi.org/10.1111/1755-0998.13916](https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.13916)
## Setup
To be able to use `slimr` productively it is important to have a working version of [SLiM](https://messerlab.org/slim/) and `slimr`.
### Installing slimr and SLiM
As a first step you need to install the package `slimr` from github.
```{r, eval=FALSE}
if (!require("devtools")) install.packages(devtools)
devtools::install_github("rdinnager/slimr") #downloads the latest version
```
`slimr` is also on R-Universe, and can be installed alternatively using the following code:
```{r eval=FALSE}
install.packages('slimr', repos = c('https://rdinnager.r-universe.dev', 'https://cloud.r-project.org'))
```
We are working to make `slimr` available on CRAN soon, but for now it is only available on github and R-Universe.
To facilitate the installation of SLiM, we provide a function ```slim_setup()```. This functions aims to install SLiM and automatically link `slimr` to it. Note that the default and recommended method of installation requires the package [`reticulate`](https://rstudio.github.io/reticulate/), so make sure you install that first of you want the smoothest possible installation. `slimr` takes advantage of `reticulate`'s interface to 'conda' to install SLiM, which should work on all platforms. Alternatively, if you are a Windows user, `slim_setup(method = 'binary')` will attempt to download a precompiled binary that we have provided via github. If this does not work, or you'd prefer to install SLiM yourself (perhaps to optimize performance by compiling with your own system's efficient libraries), please follow the excellent documentation on how to install SLiM in the SLiM manual, available on the SLiM website.
* [Windows](http://benhaller.com/slim/SLiM_Manual.pdf#page=64)
* [Linux](http://benhaller.com/slim/SLiM_Manual.pdf#page=59)
* [MacOS](http://benhaller.com/slim/SLiM_Manual.pdf#page=54)
## A first test run
The next step is to make sure `slimr` is "linked" to slim (by knowing the location of the folder where the slim program is installed.). To check you can try:
```{r}
slimr::slim_is_avail()
```
And if true is returned you are ready to go and run the test example below. In the case the test was not successful we need to find the path where the slim executable has been installed and set this path in the R environment accordingly.
For example if SLiM is installed at `C:/Program Files/R/R-4.3.1/library/slimr` then you need to specify this via:
```{r, eval=FALSE}
Sys.setenv(SLIM_PATH='C:/Program Files/R/R-4.3.1/library/slimr/slim.exe')
```
If you want to add this path permanently to your RStudio installation (so no need to run `Sys.setenv()` every time) you can use the command ``` usethis::edit_r_environ()``` and add a line like ```SLIM_PATH='C:/Program Files/R/R-4.3.1/library/slimr'``` to this file.
## Run a first simple test
To be able to run SLiM within slimr we need to create a simple slim script. Below is the `slimr` version of the first slim script as presented in the excellent [SLiM](https://messerlab.org/slim/) manual, called receipes. There are hundreds of more advanced script and we provide also several worked out examples in the manuscript (Russell et al. submitted, [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.08.05.455258v2) ). The example below is simply to test if slim is installed correctly and works together with `slimr`.
Running this code should give you the output as below (a return value of zero means success).
```{r recipe_1, warning=FALSE}
library(slimr)
slim_script(
slim_block(initialize(),
{
## set the overall mutation rate
initializeMutationRate(1e-7);
## m1 mutation type: neutral
initializeMutationType("m1", 0.5, "f", 0.0);
## g1 genomic element type: uses m1 for all mutations
initializeGenomicElementType("g1", m1, 1.0);
## uniform chromosome of length 100 kb
initializeGenomicElement(g1, 0, 99999);
## uniform recombination along the chromosome
initializeRecombinationRate(1e-8);
}),
slim_block(1,
{
sim.addSubpop("p1", 500);
}),
slim_block(10000,
{
sim.simulationFinished();
})
) -> script_1
script_1
#run slim from within slimr
slim_run(script_1)
```
`slimr` shines because it can create so called templates to run scripts with different parameter settings. It also allows to "collect" the output from SLiM and load it back into R, so you can run a full simulation with lots of parameters settings using parallel cores if available and analyse the output directly in R.
See the vignettes for details of these features and how to use them.
## A complete workflow
Below is a simple "complete" workflow.
- Creating a script in R
- Run SLiM
- Load the results into R
- Analyse and visualise the results.
As an example we simply will run a single population of 33 individuals for 100 generations, with a single chromosome of (length=100000 bases), and high mutation rate (1e-5) [to shorten run-time]. Every 10 generation we will save the genetic status of each individual for 100 generations. Once completed we read back in the genotypes (as a genlight object) and count the number of loci every of the 10 generations (the number of loci increases due to the mutations, that occurs during the simulation.
```{r}
library(slimr)
# package adegenet as we are using genlight objects
if (!require(adegenet)) install.packages("adegenet")
library(adegenet)
slim_script(
slim_block (initialize(),
{
initializeMutationRate(1e-5);
## m1 mutation type: neutral
initializeMutationType("m1", 0.5, "f", 0.0);
## g1 genomic element type: uses m1 for all mutations
initializeGenomicElementType("g1", m1, 1.0);
## uniform chromosome of length 100 kb
initializeGenomicElement(g1, 0, 99999 );
## uniform recombination along the chromosome
initializeRecombinationRate(1e-8);
}),
slim_block(1, early(),
{
## create a population of 33 individuals
sim.addSubpop("p1", 33);
}),
slim_block(1, 100, late(),
{
#create an output
r_output(p1.genomes.output(), "p1", do_every=10);
}),
slim_block(100,late(),
{
sim.simulationFinished();
})
)->script_2
```
As before we setup a script, but this time we would like to run SLiM in parallel (does not make much sense as it is a single run only, but this is for demonstration purposes)
```{r}
library(future)
# we want to run SLiM using 5 local cores
# please note on Linux and Windows you would like to use multicore as this is faster as it is a 'real' fork,
# but multisession works on all systems
plan(multisession, workers = 5)
sr <- slim_run(script_2 , parallel = TRUE)
```
And finally we want to get the results back into R. The ```sr``` object has information on all the runs and output files and data and can be used to get the results directly back into R. Please note that SLiM will still produce all the output files it as if it would have been run from the command line, hence the user can also explore those files.
In a typical slimr workflow though the user wants to have the data loaded back into R and depending on the output (in this case we used the function `genomes.output()` to save) we can easily create ```genlight``` objects that hold the genomics data in a very memory efficient way. To do so we use the ```slim_extract_genlight``` function.
Before we use this function we may want to explore the ```sr``` object which is a tibble that holds all the information that was used during runs.
```{r}
str(sr)
sr$output_data
```
Now we can use the ```slim_extract_genlight``` function to convert the SLiM output files back into R, which creates a list of 10 genlight objects, one for each generation.
```{r}
gls<- slim_extract_genlight(sr, by=c("generation"))
gls
#the genlight object for generation 10
gls$genlight[[1]]
#number of loci at generation 10
nLoc(gls$genlight[[1]])
```
And finally we can use a simply ```lapply``` to find the number of loci in the genlight for each generation (e.g. the number of loci that are polymorphic).
```{r}
nloci <- unlist(lapply(gls$genlight, nLoc))
barplot(nloci, col=rainbow(length(gls$genlight)), names.arg = 1:10, xlab="generation",
ylab="# loci")
```
As we expect, this value gradually increases over time.
Within a simulation, it is also possible to output any file format or intermediate result in a variety of common and self-defined formats, (e.g VCF files, Fst values or a pedigree trees) and then extract the information using the workflow outlined above.
A very important feature, not shown here, is that you can create so called template scripts that have placeholders for parameters, that can be filled via a R data structure and all run all simuations via a single call to SLiM using ```slim_run()```. Please see the vignettes for such an example.
We believe that a simplified workflow will allow researchers to explore their parameter space in more detail and therefore, ultimately, will lead to a better understanding of the system under study.
Have fun using `slimr`!!!
Russell et al.
(Thanks to Bernd Gruber for making the above example script!)
## Code of Conduct
Please note that the `slimr` project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
## A Note on Autocomplete
Autocomplete is supported for SLiM code used within `slimr`, though with some unavoidable limitations. More specifically, autocomplete is normally limited to the suggestions of the names of existing functions and arguments. This creates a complication for functions that are methods within SLiM classes (the majority of functions since SLiM is an object-oriented language). This is because the operator to access elements inside an object in SLiM is `.` (similar to Python), but in R `.` is not an operator, R assumes the `.` is part of the name of an object. This means if you type `sim.addSubpop`, R will not recognize `addSubpop` as a function to look up for autocompletion.
In `slimr`, there are two ways you can get around this limitation for accessing autocomplete for methods from within R. The simplest way that results in the most readable code is to use the `slim_load_globals()` function. When run this function will load object stubs for commonly used SLiM class instances into the R global environment. This includes the `sim` object, which is the main SLiMSim class instance used to track the simulation from within SLiM (or an instance of class Species in SLiM 4.0 or greater, where it represents the main species simulation in a single species simulation). It can also load as many numbered global variables used in SLiM as desired, named as in SLiM like `p1`, `p2`, `…`, `pn` for the first n Subpopulations, `g1`, `g2`, `…`, `gn` for GenomicElementTypes, etc (see documentation of `slim_load_globals()` for details). Functions (or properties) can then be accessed within these instances using the standard R `$` operator for accessing elements of a list. For convenience, slimr converts any `$` in SLiM code into `.`, allowing you to leave the code as is after autocompletion has been used.
The second method is less readable but avoids having to load otherwise unnecessary objects into your global environment. To use it, you type the object name followed by the R operator %.%, which is included in slimr, they then type the class name of the object (such as Genome, or SLiMSiM) and then using the `$` operator, methods and properties of that class can be accessed and autocompleted. An example would be `sim%.%Species$addSubpop()`. This is a little verbose so `slimr` also includes abbreviated versions of all SLiM classes. For Species the abbreviation is `Sp`, so you could type `sim%.%Sp$addSubpop()`. Just as for the other solution, `slimr` knows how to properly replace the above code with the correct SLiM code, which would be `sim.addSubpop()`. See the below figure for a screenshot of both the methods in action in the RStudio IDE.
<img src="man/figures/README-Figure_2.png" width="400">
*Caption: Screenshots of working with slimr autocomplete in RStudio, which requires some special consideration due to the object oriented programming style of SLiM. a) An example of autocomplete for SLiM code using the `slim_load_globals()` function which loads a set of objects that store what SLiM classes contain, in this case the sim object, which is a SLiM 4.0 Species class. This lets the user press tab to bring up the arguments of `addSubpop()` function, which is a method of Species. The lower panel shows that `slimr` automatically replaces the expression with the correct SLiM code. b) An example of using the alternative method for autocomplete by typing the name of an object (`sim`), followed by `%.%` and then the class name or an abbreviation of the class name (`Sp`), and then using `$` to access methods or properties of the class. Again, the lower panel shows how `slimr` correctly replaces the construction with valid SLiM code. By prefixing any of the methods or properties inside the `slimr` class objects with `?`, you can also bring up the full documentation of the method or property from the SLiM manual.*