-
-
Notifications
You must be signed in to change notification settings - Fork 26
/
Copy pathREADME.Rmd
256 lines (188 loc) · 7.13 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
---
title: "PARALLEL: Stata module for parallel computing"
author: ""
date: ""
output:
md_document:
variant: "markdown_github"
---
<!-- This file was originally used to create README.md, but is currently stale (README.md has been updated directlyed). -->
# PARALLEL: Stata module for parallel computing
Parallel lets you **run Stata faster**, sometimes faster than MP itself. By organizing your job in several Stata instances, parallel allows you to work with out-of-the-box parallel computing. Using the the `parallel` prefix, you can get **faster simulations, bootstrapping, reshaping big data**, etc. without having to know a thing about parallel computing. With **no need of having Stata/MP** installed on your computer, parallel has showed to dramatically speedup computations up to two, four, or more times depending on how many processors your computer has.
See also the HTML version of the program [help file](https://rawgit.com/gvegayon/parallel/master/ado/parallel.html).
Stata 2017 conference presentation: <https://github.com/gvegayon/parallel/blob/master/talks/20170727_stata_conference/20170727_stata_conference_handout.pdf>
SSC at Boston College: <http://ideas.repec.org/c/boc/bocode/s457527.html> (though the SSC version is a bit out-of-date, see below)
1. [Installation](#installation)
2. [Minimal examples](#minimal-examples)
2. [Authors](#authors)
Citation {#citation}
=======
When using `parallel`, please include the following:
Vega Yon GG, Quistorff B. parallel: A command for parallel computing. The Stata Journal. 2019;19(3):667-684. doi:10.1177/1536867X19874242
Or use the following bibtex entry:
```bib
@article{
VegaYon2019,
author = {George G. {Vega Yon} and Brian Quistorff},
title ={parallel: A command for parallel computing},
journal = {The Stata Journal},
volume = {19},
number = {3},
pages = {667-684},
year = {2019},
doi = {10.1177/1536867X19874242},
URL = {https://doi.org/10.1177/1536867X19874242},
eprint = {https://doi.org/10.1177/1536867X19874242}
}
```
Installation {#installation}
=======
If you have a previous installation of `parallel` installed from a different source (SSC, specific folder, specific URL) you should uninstall that first. Once installed it is suggested to restart Stata.
SSC
---
For accessing SSC version of parallel
``` stata
ssc install parallel, replace
mata mata mlib index
```
Development Version (Latest/Master)
--------------------------
For accessing the latest development version of parallel (from here) using Stata version \>=13
``` stata
net install parallel, from(https://raw.github.com/gvegayon/parallel/master/) replace
mata mata mlib index
```
For Stata version \<13, download as zip, unzip, and then replace the above `net install` with
``` stata
net install parallel, from(full_local_path_to_files) replace
```
Development Version (Other Releases)
------------------------------------
Access other development releases via the [Releases Page](https://github.com/gvegayon/parallel/releases). You can use the release tag to install over the internet. For example,
``` stata
net install parallel, from(https://raw.github.com/gvegayon/parallel/v1.15.8.19/) replace
mata mata mlib index
```
Or you can download the release and install locally (for Stata \<13).
Minimal examples {#minimal-examples}
===============
The following minimal examples have been written to introduce how to use the module. Please notice that the only examples actually designed to show potential speed gains are [parfor](#parfor) and [bootstrap](#bootstraping).
The examples have been executed on a Dell Vostro 3300 notebook running Ubuntu 14.04 with an Intel Core i5 CPU M 560 (2 physical cores) with 8Gb of RAM, using Stata/IC 12.1 for Unix (Linux 64-bit x86-64).
For more examples and details please refer to the module's help file or the wiki [Gallery page](https://github.com/gvegayon/parallel/wiki/Gallery).
```{r setup, echo=FALSE}
knitr::opts_chunk$set(autodep = TRUE, echo=FALSE, comment='')
```
## Simple parallelization of egen
When conducted over groups, parallelizing `egen` can be useful. In the following example we show how to use `parallel` with `by: egen`.
```{stata}
parallel setclusters 2, f
sysuse auto
parallel, by(foreign): egen maxp = max(price)
tab maxp
```
Which is the ``parallel'' way to do:
```{stata}
sysuse auto
bysort foreign: egen maxp = max(price)
tab maxp
```
## Bootstrapping {#examples-bootstrap}
In this example we'll evaluate a regression model using bootstrapping which, together with simulations, is one of the best ways to use parallel
```{stata}
sysuse auto, clear
parallel setclusters 4, f
timer on 1
parallel bs, reps(5000): reg price c.weig##c.weigh foreign rep
timer off 1
timer list
```
Which is the ``parallel way'' to do:
```{stata}
sysuse auto, clear
timer on 2
bs, reps(5000) nodots: reg price c.weig##c.weigh foreign rep
timer off 2
timer list
```
## Simulation
From the `simulate` stata command:
```{stata}
parallel setclusters 2, f
program define lnsim, rclass
version 12.1
syntax [, obs(integer 1) mu(real 0) sigma(real 1) ]
drop _all
set obs `obs'
tempvar z
gen `z' = exp(rnormal(`mu',`sigma'))
summarize `z'
return scalar mean = r(mean)
return scalar Var = r(Var)
end
parallel sim, expr(mean=r(mean) var=r(Var)) reps(10000): lnsim, obs(100)
summ
```
which is the parallel way to do
```{stata}
program define lnsim, rclass
version 12.1
syntax [, obs(integer 1) mu(real 0) sigma(real 1) ]
drop _all
set obs `obs'
tempvar z
gen `z' = exp(rnormal(`mu',`sigma'))
summarize `z'
return scalar mean = r(mean)
return scalar Var = r(Var)
end
simulate mean=r(mean) var=r(Var), reps(10000) nodots: lnsim, obs(100)
summ
```
## parfor {#examples-parfor}
In this example we create a short program (`parfor`) which is intended to work as a `parfor` program, this is, looping through 1/N in a parallel fashion
```{stata}
// Cleaning working space
clear all
timer clear
// Set up
set seed 123
local n = 5e6
set obs `n'
gen x = runiform()
gen y_pll = .
clonevar y_ser = y_pll
// Loop replacement function
prog def parfor
args var
forval i=1/`=_N' {
qui replace `var' = sqrt(x) in `i'
}
end
// Running the algorithm in parallel fashion
timer on 1
parallel setclusters 4, f
parallel, prog(parfor): parfor y_pll
timer off 1
// Running the algorithm in a serial way
timer on 2
parfor y_ser
timer off 2
// Is there any difference?
list in 1/10
gen diff = y_pll != y_ser
tab diff
// Comparing time
timer list
di "Parallel is `=round(r(t2)/r(t1),.1)' times faster"
```
Building {#building}
================
If you need to use `parallel` on an older version of Stata than what we build here, you can build and install the package locally.
You will need to install [Stata devtools](https://github.com/gvegayon/devtools) to build the package and `log2html` to build the html version of the help.
Then you can go to `ado/` and either `do compile.do` or `do compile_and_install.do` depending on whether you want to just build the package (`.mlib`) or also install. There are also several build build checks in the `makefile` that can easily be run from Linux.
Authors {#authors}
======
George G. Vega [aut,cre]
g.vegayon %at% gmail
Brian Quistorff [aut]
brian-work %at% quistorff . com