INTRODUCTION:
This document describes how to create and maintain the modENCODE GBrowse
public EC2 image. Most of this is vanilla; the only real complexity arises
in the management of the disk images. The document is divided into several
parts:
1. Initialization of the Virtual Machine
2. Installation of GBrowse
3. Data marshalling and synchronization
4. Data loading
5. Increasing volume sizes
6. Removing unneeded EBS volumes
1) Initialization of the Virtual Machine
These instructions are pretty generic. The only subtlety is the use of
logical volumes and RAID to overcome Amazon's 1 TB EBS volume
limitation and to improve performance. We first use RAID0 to combine
two EBS volumes into a single disk array, thereby increasing disk
throughput and decreasing latency. We then build logical volumes on
top of one or more RAIDs using the logical volume manager. This gives
us the flexibility to increase volume size as the modENCODE dataset
grows. Finally, we build XFS filesystems on top of the logical volumes
because of XFS's ability to resize while mounted, as well as its good
performance characteristics.
------------------------------------------------------------------------
1. Create the virtual machine
Find a recent version of Ubuntu's 64-bit AMI. I used the Maverick
10.10 amd64 server image (ami-cef405a7). Launch it as an "m1.large"
machine (you can get better performance with m1.xlarge, but m1.large
is pretty good). Make sure to assign a security group that has both
the SSH and HTTP ports open!
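If you prefer the command line over the AWS console, the instance can
also be launched with euca2ools from any machine that already has your
credentials configured (a sketch; the keypair name "gbrowse-key" and
security group "gbrowse-web" are placeholders for ones you have created
yourself):
euca-run-instances ami-cef405a7 -t m1.large -k gbrowse-key -g gbrowse-web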
------------------------------------------------------------------------
2. Install the requisite disk software
You need the mdadm, lvm2, and xfsprogs packages installed:
apt-get install mdadm
apt-get install lvm2
apt-get install xfsprogs
It also helps to have euca2ools installed, to give command-line access
to EC2:
apt-get install euca2ools
For convenience, create a .eucarc file containing the environment
variables EC2_ACCESS_KEY, EC2_SECRET_KEY and EC2_URL (very important!).
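A minimal .eucarc might look like the following (a sketch; the key
values and the endpoint are placeholders, so substitute your own
credentials and the endpoint for your region). Remember to source it
before running any euca2ools commands:
export EC2_ACCESS_KEY=AKIAXXXXXXXXXXXXXXXX
export EC2_SECRET_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export EC2_URL=https://ec2.us-east-1.amazonaws.com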
------------------------------------------------------------------------
3. Create the first set of volumes.
For testing purposes, I initially created two RAIDs and then combined
them into a single logical volume group. You do not need to do it
this way.
First determine the availability zone in which the current instance
is running (be sure to create the new volumes in the same zone!):
zone=`curl http://169.254.169.254/latest/meta-data/placement/availability-zone`
Then create the two volumes:
# euca-create-volume --size 500 --zone $zone
VOLUME vol-47325b2a 500 creating 2011-12-21T19:55:41.000Z
# euca-create-volume --size 500 --zone $zone
VOLUME vol-31325b5c 500 creating 2011-12-21T19:55:59.000Z
After the volumes have settled, attach them to the current instance:
instance=`curl http://169.254.169.254/latest/meta-data/instance-id`
euca-attach-volume -i $instance -d /dev/sdg1 vol-47325b2a
euca-attach-volume -i $instance -d /dev/sdg2 vol-31325b5c
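To check on the attachments, you can poll with euca-describe-volumes
(a sketch; substitute your own volume IDs). Each volume should
eventually report a status of "attached":
euca-describe-volumes vol-47325b2a vol-31325b5c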
Wait for the attachments to complete, then create the first RAID:
mdadm --create --verbose /dev/md0 --level=0 -c256 --raid-devices=2 /dev/sdg1 /dev/sdg2
mdadm --detail --scan | sed s/=00/=0/ >> /etc/mdadm/mdadm.conf
If all goes well, there will be a new block device called
/dev/md0. The last step, which adds information about the
newly-created RAID to mdadm.conf, is not strictly needed, but it
helps document the configuration in case things get messed up at
some time in the future.
Create a new volume group containing it:
pvcreate /dev/md0
vgcreate vg0 /dev/md0
This will create a volume group named "vg0".
To add more space to the volume group, you can repeat the process
with another RAID:
# euca-create-volume --size 500 --zone $zone
VOLUME vol-98329f22 500 creating 2011-12-21T19:55:41.000Z
# euca-create-volume --size 500 --zone $zone
VOLUME vol-22c92898 500 creating 2011-12-21T19:55:59.000Z
euca-attach-volume -i $instance -d /dev/sdh1 vol-98329f22
euca-attach-volume -i $instance -d /dev/sdh2 vol-22c92898
mdadm --create --verbose /dev/md1 --level=0 -c256 --raid-devices=2 /dev/sdh1 /dev/sdh2
mdadm --detail --scan | sed s/=00/=0/ >> /etc/mdadm/mdadm.conf
pvcreate /dev/md1
vgextend vg0 /dev/md1
After this step, volume group vg0 has 2 TB of capacity, contributed
in equal parts by the RAID arrays /dev/md0 and /dev/md1.
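A quick way to confirm the new capacity is vgdisplay (the same command
used for capacity planning later in this document):
# vgdisplay vg0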
Now we can create as many logical volumes as needed. I created two,
one for the browser flat files, and one for the mysql databases.
lvcreate -L 1T -n lv0 vg0
blockdev --setra 65536 /dev/vg0/lv0
mkfs.xfs /dev/vg0/lv0
mkdir -p /modencode/browser_data
mount -o noatime /dev/vg0/lv0 /modencode/browser_data/
chown ubuntu /modencode/browser_data/
lvcreate -L 65G -n lv1 vg0
blockdev --setra 65536 /dev/vg0/lv1
mkfs.xfs /dev/vg0/lv1
mkdir /modencode/mysql
mount -o noatime /dev/vg0/lv1 /modencode/mysql
chown mysql.mysql /modencode/mysql
Note that we've got a lot of unused disk capacity in vg0 (we can
display it using the vgdisplay command). We can grow the logical
volumes and their XFS filesystems at any point in the future.
We're going to relocate the mysql databases from the image root onto
/modencode/mysql using a bind-mount trick:
sudo /etc/init.d/mysql stop
sudo rm -rf /var/lib/mysql/* # yes, you read that right!
mount /modencode/mysql /var/lib/mysql -o bind,rw
mysql_install_db
sudo /etc/init.d/mysql start
mysqladmin -u root password 'modencode'
mysql -u root -p -e 'grant select on *.* to nobody@localhost'
Last but not least, record the filesystems in /etc/fstab:
/dev/vg0/lv0 /modencode/browser_data xfs noatime 0 2
/dev/vg0/lv1 /modencode/mysql xfs noatime 0 2
/modencode/mysql /var/lib/mysql none rw,bind 0 0
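A quick sanity check that everything is mounted where fstab expects it
to be (a sketch):
df -h /modencode/browser_data /modencode/mysql /var/lib/mysql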
Optionally, set the readahead buffer at boot time for the two
logical volumes; this modestly increases database performance. Add
the following to /etc/rc.local:
# tune the logical volumes for better read performance
blockdev --setra 65536 /dev/vg0/lv0
blockdev --setra 65536 /dev/vg0/lv1
------------------------------------------------------------------------
2) Install GBrowse
This has gotten much easier recently:
apt-get install gbrowse
The version installed in Ubuntu 10.10 is 2.39. If you wish to get the
bleeding edge version (which has performance and feature
improvements), follow the directions at
http://gmod.org/wiki/GBrowse_2.0_HOWTO.
You may wish to make sure that the gbrowse user_accounts database is
initialized to allow for logins. This is probably not needed, but won't hurt:
sudo mkdir /var/www/conf/user_accounts
sudo chown www-data /var/www/conf/user_accounts/
gbrowse_metadb_config.pl
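A quick smoke test from the instance itself confirms that Apache is up
and serving (a sketch; the exact GBrowse URL to test depends on where
the package installed its CGI scripts, so here we just check the web
server root):
curl -I http://localhost/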
------------------------------------------------------------------------
3) Data marshalling and synchronization
These steps need to be performed on modencode.oicr.on.ca. The trick is
to mirror the browser datasets onto the virtual machine in an
efficient manner.
On modencode.oicr.on.ca create a staging directory for what will be
copied to AWS. Using the scripts at
https://github.com/lstein/modENCODE-GBrowse-Cloud, run the following
command:
dump_databases.pl
This writes SQL database dumps into the directory
/browser_data/mysql_dumps_new. Note that it does not update
/browser_data/mysql_dumps; that directory is created automatically by
a cron job and does not capture all the databases needed for the mirror.
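Once the dumps have finished, it is worth checking their sizes before
planning the transfer (a sketch):
du -sh /browser_data/mysql_dumps_new/*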
The next step figures out what data files are needed for the mirror
and creates a directory of links in preparation for an rsync:
extract_gbrowse_binary_filenames.pl | clean_and_tally.pl |\
create_link_dir.pl 2>&1 | tee file_sizes.txt
After this step, standard error (and file_sizes.txt) will contain a
list of the volume sizes needed, and a directory named "browser_data"
in the current directory contains a series of symbolic links to the
files that need to be transferred to the AWS instance. Confirm that
there is sufficient disk capacity on the AWS instance, and if
necessary, grow the filesystems using the recipe in "Increasing
volume sizes".
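On the AWS instance, free space can be compared against the totals in
file_sizes.txt with an ordinary df (a sketch):
df -h /modencode/browser_data /modencode/mysql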
The next part is pretty annoying because the modencode machine doesn't
have outgoing ssh access, so we have to tunnel around it. First find an OICR
machine that has outgoing SSH access. I used xfer.res.oicr.on.ca for
this purpose.
Now create a new ssh keypair on this machine:
ssh-keygen -f MyPrivateKey
This will generate the private key "MyPrivateKey" and the public key
"MyPrivateKey.pub".
Now append the contents of MyPrivateKey.pub to the AWS instance's
.ssh/authorized_keys file. I think I did a cut-and-paste between
terminals!
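If you would rather not cut-and-paste, the same thing can be done in
one line from the xfer machine (a sketch; "ec2key.pem" is a placeholder
for whatever key you normally use to log into the instance, and
xx-xx-xx-xx stands for the instance's DNS name as described below):
cat MyPrivateKey.pub | ssh -i ec2key.pem ubuntu@xx-xx-xx-xx.compute-ec2.amazon.com 'cat >> ~/.ssh/authorized_keys'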
Copy the private key to modencode.oicr.on.ca, since this machine does
not share its home directory.
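For example, assuming the xfer machine can reach your account on
modencode.oicr.on.ca over ssh (a sketch):
scp MyPrivateKey modencode.oicr.on.ca: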
Now, on the xfer machine, set up the tunnel:
ssh -f -R12345:xx-xx-xx-xx.compute-ec2.amazon.com:22 modencode.oicr.on.ca sleep 1000
You will need to replace the xx-xx-xx-xx part with the correct DNS name
for the AWS instance.
Log into modencode.oicr.on.ca, change into the directory that contains
the "browser_data" link directory, and run the following bizarro command:
rsync -Ravz --copy-links -e'ssh -o "StrictHostKeyChecking no" \
-iMyPrivateKey -p12345 -lubuntu' ./browser_data localhost:/modencode/
If you do not wish to type it out, this command is found in the shell
script transfer.sh in the git distribution.
It is a good idea to run the rsync in a "screen" session to avoid
accidental hangups. Depending on how much incremental data there is to
transfer, this may run for several days. We see about 10 GB/hour
(roughly 20 Mbit/s).
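For instance, assuming transfer.sh is executable in the current
directory (a sketch; detach with Ctrl-a d and re-attach later with
"screen -r transfer"):
screen -S transfer ./transfer.sh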
------------------------------------------------------------------------
4) Data loading
Once the data has been transferred to the AWS instance, you will move
the configuration files into place, load the mysql databases, and
restart the web server. All of these steps occur on the AWS instance.
First, move the configuration files into place:
cd /modencode/browser_data/conf
tar cf - * | (cd /etc/gbrowse2; sudo tar xvf -)
cd /etc/gbrowse2
find . -name '*gz' -exec sudo gunzip -f {} \;
Second, load the MySQL databases:
load_mysql.pl # found in the git repository
Third, restart the web server:
sudo /etc/init.d/apache2 restart
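To spot-check that the databases loaded and that the read-only account
works, list them with the nobody account (a sketch):
mysql -u nobody -e 'show databases'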
------------------------------------------------------------------------
5) Increasing volume sizes
If you need to increase the size of one of the data volumes, it is
relatively easy to do.
First, you may wish to snapshot the current instance. This will make
it easier to restore the system if you make a mistake.
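Snapshots can be taken from the command line as well as from the AWS
console (a sketch; the volume IDs are placeholders for the EBS volumes
underlying the RAID):
# euca-create-snapshot vol-47325b2a
# euca-create-snapshot vol-31325b5c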
Unmount the volume that you will be resizing, and stop services
that depend on it. I prefer to stop everything:
# /etc/init.d/mysql stop
# /etc/init.d/apache2 stop
# umount /dev/vg0/lv1
# umount /dev/vg0/lv0
Determine whether you already have sufficient unused capacity in the
volume group:
# vgdisplay vg0
--- Volume group ---
VG Name vg0
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 28
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 2
Act PV 2
VG Size 1.37 TiB
PE Size 4.00 MiB
Total PE 358398
Alloc PE / Size 341453 / 1.30 TiB
Free PE / Size 16945 / 66.19 GiB
VG UUID jIxAi6-0tfc-drJX-6AXY-lpnE-0amE-ohHLkT
The relevant line of output is "Free PE / Size". In this example, we
have 66 GB free. If this is sufficient, we can simply allocate some of
it to the appropriate logical volume. This example adds 20G extra to
logical volume lv0. You can also specify an absolute size to grow the
volume to.
# lvextend -L +20G /dev/vg0/lv0
Then tell XFS to grow the filesystem to fit the capacity of the
volume:
# mount /dev/vg0/lv0 /modencode/browser_data
# xfs_growfs /modencode/browser_data
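The new size can be confirmed with df (a sketch):
# df -h /modencode/browser_data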
If you do not have sufficient capacity in the volume group, then you
will need to create and add a new EBS volume to it. Although not
necessary, I recommend you do the RAID striping trick again in order
to get better I/O performance:
# euca-create-volume --size 200 --zone us-east-1c
VOLUME vol-47325b2a 200 creating 2011-12-21T19:55:41.000Z
# euca-create-volume --size 200 --zone us-east-1c
VOLUME vol-31325b5c 200 creating 2011-12-21T19:55:59.000Z
# euca-attach-volume --instance i-7a41761a --device /dev/sdj1 vol-47325b2a
# euca-attach-volume --instance i-7a41761a --device /dev/sdj2 vol-31325b5c
# mdadm --create --verbose /dev/md2 --level=0 -c256 --raid-devices=2 /dev/sdj1 /dev/sdj2
mdadm: array /dev/md2 started.
# mdadm --detail --scan | sed s/=00/=0/ >> /etc/mdadm/mdadm.conf
# pvcreate /dev/md2
Physical volume "/dev/md2" successfully created
# vgextend vg0 /dev/md2
Volume group "vg0" successfully extended
# lvextend -L +400G /dev/vg0/lv0
# mount /dev/vg0/lv0 /modencode/browser_data
# xfs_growfs /modencode/browser_data
Remount the other volume if you need to, and restart services.
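For example, using the fstab entries created earlier (a sketch; remount
the bind mount for /var/lib/mysql as well if you unmounted it):
# mount /dev/vg0/lv0 /modencode/browser_data
# mount /dev/vg0/lv1 /modencode/mysql
# /etc/init.d/mysql start
# /etc/init.d/apache2 start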
------------------------------------------------------------------------
6) Removing unneeded EBS volumes
If you add capacity to the volume group in small increments as shown
in the previous section, you may end up with multiple smallish EBS
volumes RAIDed together and may wish to consolidate them into a
smaller number of large volumes. You can do this by first adding a
large volume as described in the previous section, and then removing
and deactivating the smaller ones as shown in the following steps.
Turn off Apache and Mysql:
# /etc/init.d/apache2 stop; /etc/init.d/mysql stop
Unmount the volumes (important!):
# umount /modencode/browser_data
# umount /modencode/mysql
Move all data off the RAID you are planning to decommission:
# pvmove /dev/md1
Remove this RAID from the volume group:
# vgreduce vg0 /dev/md1
Turn off the RAID:
# mdadm --stop /dev/md1
Now edit /etc/mdadm/mdadm.conf to remove references to /dev/md1.
After this, you can use the Amazon console (or euca2ools) to detach
and destroy the underlying EBS volumes. Make sure you know which ones
to remove!
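With euca2ools, the final detach-and-delete looks something like this
(a sketch; the volume IDs are placeholders for the two EBS volumes
that made up /dev/md1):
# euca-detach-volume vol-98329f22
# euca-detach-volume vol-22c92898
# euca-delete-volume vol-98329f22
# euca-delete-volume vol-22c92898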