---
output:
pdf_document: default
html_document: default
---
<!--HOW TO COMPLETE THIS FORM:-->
<!--
1. Checkboxes in this document appear as follows:
- [ ] This is a checkbox
To check a checkbox, replace [ ] by [x], as follows:
- [x] This is a checked checkbox
Note that current versions of RStudio for Mac (this will change with RStudio versions 1.3 and higher) will not create a formatted checkbox but will leave the original characters, i.e., literally "[ ]" or "[x]". It's fine to submit a PDF in this form.
2. For text answers, simply type the relevant text in the areas indicated. A blank line starts a new paragraph.
3. Comments (like these instructions) provide additional instructions throughout the form. There is no need to remove them; they will not appear in the compiled document.
4. If you are comfortable with Markdown syntax, you may choose to include any Markdown-compliant formatting in the form. For example, you may wish to include R code chunks and compile this document in R Markdown.
-->
---
title: "ACC Form"
author: "Juan C. Laria"
---
This form documents the artifacts associated with the article (i.e., the data and code supporting the computational findings) and describes how to reproduce the findings.
# Part 1: Data
- [ ] This paper does not involve analysis of external data (i.e., no data are used or the only data are generated by the authors via simulation in their code).
<!--
If box above is checked and if no simulated/synthetic data files are provided by the authors, please skip directly to the Code section. Otherwise, continue.
-->
- [x] I certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.
<!-- If data are simulated using random number generation, please be sure to set the random number seed in the code you provide -->
## Abstract
<!--
Provide a short (< 100 words), high-level description of the data
-->
The original dataset contains an expression set on diffuse large B-cell lymphoma. It accompanies the BioNet package as example data.
## Availability
- [x] Data **are** publicly available.
- [ ] Data **cannot be made** publicly available.
If the data are publicly available, see the *Publicly available data* section. Otherwise, see the *Non-publicly available data* section, below.
### Publicly available data
- [ ] Data are available online at:
- [x] Data are available as part of the paper’s supplementary material.
- [ ] Data are publicly available by request, following the process described here:
- [ ] Data are or will be made available through some other mechanism, described here:
<!-- If data are available by request to the authors or some other data owner, please make sure to explain the process of requesting access to the data. -->
### Non-publicly available data
<!--
The Journal of the American Statistical Association requires authors to make data accompanying their papers available to the scientific community except in cases where: 1) public sharing of data would be impossible, 2) suitable synthetic data are provided which allow the main analyses to be replicated (recognizing that results may differ from the "real" data analyses), and 3) the scientific value of the results and methods outweigh the lack of reproducibility.
Please discuss the lack of publicly available data. For example:
- why data sharing is not possible,
- what synthetic data are provided, and
- why the value of the paper's scientific contribution outweighs the lack of reproducibility.
-->
## Description
To pre-process the data, we selected the genes for which individual Cox scores, obtained after fitting univariate Cox regression models, were more significant than a certain threshold. After removing missing values, the data were composed of 190 observations, 78 genetic features, and one clinical variable, which is a factor variable with several levels.
The original data can be loaded with
```{r, eval = FALSE}
library(DLBCL)
data(exprLym)
```
Package `DLBCL` can be installed with
```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)){
install.packages("BiocManager")
}
BiocManager::install("DLBCL")
```
Additional preprocessing to convert the data into standard `data.frame` format is optional.
```{r, eval = FALSE}
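library(Biobase)  # provides exprs() and pData()

# Samples in rows, genes in columns
data <- t(exprs(exprLym))
pdata <- pData(exprLym)
# Join expression values and clinical annotations by sample name
dlbcl <- merge(data, pdata, by = "row.names")
row.names(dlbcl) <- dlbcl$Row.names
dlbcl$Row.names <- NULL
# Drop StatusAtFollowUp and store Status as a factor
dlbcl$StatusAtFollowUp <- NULL
dlbcl$Status <- factor(dlbcl$Status + 0)
# Rename the follow-up time column to `time`
dlbcl$time <- dlbcl$FollowUpYears
dlbcl$FollowUpYears <- NULL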
```
The post-processed data frame is available in the Supplementary Materials.
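
The gene-screening step described at the start of this section can be sketched as follows. This is only an illustration on simulated data, not the exact script used for the paper: the helper `screen_cox`, the use of the Wald p-value as the "Cox score", and the threshold `alpha = 0.05` are all assumptions made for the example.

```{r, eval = FALSE}
library(survival)

# Hypothetical helper: fit one univariate Cox model per column of `x`
# and keep the columns whose Wald p-value is below `alpha` (assumed threshold).
screen_cox <- function(x, time, status, alpha = 0.05) {
  pvals <- apply(x, 2, function(g) {
    fit <- coxph(Surv(time, status) ~ g)
    summary(fit)$coefficients[1, "Pr(>|z|)"]
  })
  x[, pvals < alpha, drop = FALSE]
}

# Simulated toy data: only the first feature affects the hazard
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("g", 1:5)))
time <- rexp(n, rate = exp(x[, 1]))
status <- rbinom(n, 1, 0.8)

selected <- screen_cox(x, time, status)
colnames(selected)
```

With real data, `x` would hold the expression values (samples in rows) and `time` and `status` the follow-up information.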
### File format(s)
<!--
Check all that apply
-->
- [ ] CSV or other plain text.
- [x] Software-specific binary format (.Rda, Python pickle, etc.): .RData
- [ ] Standardized binary format (e.g., netCDF, HDF5, etc.):
- [ ] Other (please specify):
### Data dictionary
<!--
A data dictionary provides information that allows users to understand the meaning, format, and use of the data.
-->
- [x] Provided by authors in the following file(s): `Section7_real/dlbcl_processed.RData`
- [ ] Data file(s) is(are) self-describing (e.g., netCDF files)
- [ ] Available at the following URL:
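
One way to inspect the contents of an `.RData` file without touching the global environment is sketched below. The helper `describe_rdata` is hypothetical (it is not part of `glasp`), and the example runs on a temporary toy file so that it is self-contained.

```{r, eval = FALSE}
# Hypothetical helper: load an .RData file into its own environment
# and summarise the objects it contains.
describe_rdata <- function(path) {
  env <- new.env()
  objs <- load(path, envir = env)  # load() returns the names of the loaded objects
  data.frame(
    object = objs,
    class = vapply(objs, function(o) class(env[[o]])[1], character(1)),
    rows = vapply(objs, function(o) NROW(env[[o]]), integer(1)),
    cols = vapply(objs, function(o) NCOL(env[[o]]), integer(1)),
    row.names = NULL
  )
}

# Demonstration on a temporary file with a toy data frame
tmp <- tempfile(fileext = ".RData")
toy <- data.frame(time = c(1.2, 3.4), status = factor(c(0, 1)))
save(toy, file = tmp)
info <- describe_rdata(tmp)
info
```

The same call could be pointed at `Section7_real/dlbcl_processed.RData` from the top-level directory of `glasp-code`.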
### Additional Information (optional)
<!--
OPTIONAL: Provide any additional details that would be helpful in understanding the data. If relevant, please provide unique identifier/DOI/version information and/or license/terms of use.
-->
# Part 2: Code
## Abstract
<!--
Provide a short (< 100 words), high-level description of the code. If necessary, more details can be provided in files that accompany the code.
-->
We provide both the R package `glasp` and the R scripts to replicate the Figures and Tables in the paper. They are also available at the GitHub repositories `jlaria/glasp` and `jlaria/glasp-code`, respectively.
## Description
The R package `glasp` can be installed directly from GitHub with
```{r, eval = FALSE}
devtools::install_github("jlaria/glasp", dependencies = TRUE)
```
Installing `glasp` along with a few additional R packages should be enough to replicate the Figures and Section 7 of the paper (Application to right-censored survival data).
However, the simulations in Sections 4 and 5 require many dependencies, some of which were recently removed from CRAN. To avoid unresolvable dependencies and configuration issues, we wrapped everything in a Docker image, which can be pulled and run with
```
docker run -it jlaria/glasp:0.0.1
```
Additionally, if you have `vscode`, `docker` and the `ms-vscode-remote.remote-containers` extension for `vscode`, you can open the cloned repository `jlaria/glasp-code` in a remote container and `vscode` will automatically install the required dependencies. Additional documentation can be found [here](https://code.visualstudio.com/docs/remote/containers).
### Code format(s)
<!--
Check all that apply
-->
- [x] Script files
- [x] R
- [ ] Python
- [ ] Matlab
- [ ] Other:
- [x] Package
- [x] R
- [ ] Python
- [ ] MATLAB toolbox
- [ ] Other:
- [ ] Reproducible report
- [ ] R Markdown
- [ ] Jupyter notebook
- [ ] Other:
- [ ] Shell script
- [x] Other (please specify):
- [x] Dockerfile
### Supporting software requirements
#### Version of primary software used
<!--
(e.g., R version 3.6.0)
-->
`R version 3.6.3`
#### Libraries and dependencies used by the code
<!--
Include version numbers (e.g., version numbers for any R or Python packages used)
-->
Package `glasp_0.0.1` loads some extra libraries. Calling `sessionInfo()` after loading it lists the dependencies.
```{r}
library(glasp)
sessionInfo()
```
### Supporting system/hardware requirements (optional)
<!--
OPTIONAL: System/hardware requirements including operating system with version number, access to cluster, GPUs, etc.
-->
The simulations were run on a standalone Spark cluster, with several workstations as the parallel backend. However, they can also run locally as long as `sparklyr` is installed.
### Parallelization used
- [ ] No parallel code used
- [ ] Multi-core parallelization on a single machine/node
- Number of cores used:
- [x] Multi-machine/multi-node parallelization
- Number of nodes and cores used: 3 nodes, 14 cores
### License
- [ ] MIT License (default)
- [ ] BSD
- [x] GPL v3.0
- [ ] Creative Commons
- [ ] Other: (please specify below)
### Additional information (optional)
<!--
OPTIONAL: By default, submitted code will be published on the JASA GitHub repository (http://github.com/JASA-ACS) as well as in the supplementary material. Authors are encouraged to also make their code available in a public repository. If relevant, please provide unique identifier/DOI/version information.
-->
# Part 3: Reproducibility workflow
<!--
The materials provided should provide a straightforward way for reviewers and readers to reproduce analyses with as few steps as possible.
-->
## Scope
The provided workflow reproduces:
- [ ] Any numbers provided in text in the paper
- [x] All tables and figures in the paper
- [ ] Selected tables and figures in the paper, as explained and justified below:
## Workflow
### Format(s)
<!--
Check all that apply
-->
- [ ] Single master code file
- [ ] Wrapper (shell) script(s)
- [ ] Self-contained R Markdown file, Jupyter notebook, or other literate programming approach
- [ ] Text file (e.g., a readme-style file) that documents workflow
- [ ] Makefile
- [x] Other (more detail in *Instructions* below)
### Instructions
<!--
Describe how to use the materials provided to reproduce analyses in the manuscript. Additional details can be provided in file(s) accompanying the reproducibility materials.
-->
To replicate the Figures, run the following commands from the top-level directory `glasp-code`.
> Because of differences in hardware, the results may differ slightly from those reported in the paper.
```
Rscript Figures/fig1.R
Rscript Figures/fig2.R
Rscript Section7_real/real_surv.R
```
To replicate the simulation Tables, use the following.
> The following simulations may take some time to finish, depending on the hardware. They use all the cores in the system, so please run them with caution. We recommend running them inside the Docker container provided.
```
Rscript Section4_linear/main.R
Rscript Section5_cox/surv.R
```
### Expected run-time
Approximate time needed to reproduce the analyses on a standard desktop machine:
- [ ] < 1 minute
- [ ] 1-10 minutes
- [ ] 10-60 minutes
- [x] 1-8 hours
- [ ] > 8 hours
- [ ] Not feasible to run on a desktop machine, as described here:
### Additional information (optional)
<!--
OPTIONAL: Additional documentation provided (e.g., R package vignettes, demos or other examples) that show how to use the provided code/software in other settings.
-->
# Notes (optional)
<!--
OPTIONAL: Any other relevant information not covered on this form. If reproducibility materials are not publicly available at the time of submission, please provide information here on how the reviewers can view the materials.
-->