VM hadoop setup instructions
Request VM from openstack.cern.ch
Install the admin, frontend, das and mongodb packages in the usual way we install software on a cmsweb VM [1]. Please note we need to use the newest architecture, slc6_amd64_gcc493, which contains Python 2.7. Here is what should be done in step 7 of the instructions [1]:
(VER=HG1509a REPO="comp" A=/data/cfg/admin; ARCH=slc6_amd64_gcc493; cd /data; $A/InstallDev -A $ARCH -R comp@$VER -s image -v $VER -r comp=$REPO -p "admin frontend das mongodb backend")
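A quick sanity check after the deployment: the installed services should show up in the deploy area (the exact list depends on the packages given above):
ls /data/srv/enabled
ls /data/srv/current/apps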
I created the /data/wma area and copied the contents of my lxplus.cern.ch:~valya/workspace/wma/ area over there.
Create install area
mkdir -p /data/wma/usr/lib/python2.7/site-packages
Create setup.sh file
#!/bin/bash
source /data/srv/current/apps/das/etc/profile.d/init.sh
export JAVA_HOME=/usr/lib/jvm/java
#export PATH=$PATH:$PWD/mongodb/bin
export PYTHONPATH=$PYTHONPATH:/data/wma/usr/lib/python2.7/site-packages
Set up your environment:
source setup.sh
This will set up MongoDB, Python 2.7 and pymongo.
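A few quick checks to confirm the environment is in place (assuming init.sh put Python 2.7 and the MongoDB binaries on your PATH):
python -V
python -c "import pymongo; print pymongo.version"
mongod --version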
Install pip [2] (optional)
curl https://bootstrap.pypa.io/get-pip.py > get-pip.py
python get-pip.py
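Verify that pip landed on your PATH:
pip --version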
Install Java on the VM
sudo yum install java-1.8.0-openjdk-devel.x86_64
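Verify the installation; /usr/lib/jvm/java is the path referenced by JAVA_HOME in setup.sh above (on SLC it is an alternatives-managed symlink):
java -version
ls -l /usr/lib/jvm/java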
Create /etc/yum.repos.d/cloudera.repo with the following content:
[cloudera]
gpgcheck=0
name=Cloudera
enabled=1
priority=15
baseurl=https://cern.ch/it-service-hadoop/yum/cloudera-cdh542
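Confirm that yum picks up the new repository:
yum repolist enabled | grep -i cloudera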
Install Hadoop and YARN
sudo yum install hadoop-hdfs.x86_64 hadoop.x86_64 hive.noarch hadoop-libhdfs.x86_64
sudo yum install hadoop-hdfs-namenode.x86_64 hadoop-hdfs-datanode.x86_64
sudo yum install hadoop-yarn-nodemanager.x86_64
Configure Hadoop [3, 4]
sudo cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
sudo alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
ls /etc/hadoop/conf.my_cluster/
sudo vim /etc/hadoop/conf.my_cluster/core-site.xml
Here is the relevant part you should have in core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Adjust hdfs-site.xml
sudo vim /etc/hadoop/conf.my_cluster/hdfs-site.xml
Add the relevant part to hdfs-site.xml:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
</property>
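If the name directory does not exist yet, create it and hand it over to the hdfs user before formatting (this follows the standard CDH layout; adjust the path to match the dfs.namenode.name.dir value above):
sudo mkdir -p /var/lib/hadoop-hdfs/cache/hdfs/dfs/name
sudo chown -R hdfs:hdfs /var/lib/hadoop-hdfs/cache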
Format local HDFS
sudo -u hdfs hdfs namenode -format
Start HDFS
cd /etc/init.d/
sudo service hadoop-hdfs-datanode start
sudo service hadoop-hdfs-namenode start
Start the YARN node manager (for MapReduce jobs):
sudo service hadoop-yarn-nodemanager start
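To check that all three daemons came up, list the running Java processes and ask the namenode for a report:
sudo jps
sudo -u hdfs hdfs dfsadmin -report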
Create some areas on HDFS
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir /test
sudo -u hdfs hadoop fs -chmod -R 1777 /test
hadoop fs -ls /tmp
# now we are ready to put anything into HDFS, e.g.
hadoop fs -put local_file /tmp
hadoop fs -ls /tmp
Install pydoop
cd /data/wma/soft/pydoop
python setup.py install --prefix=/data/wma/usr
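A quick check that pydoop talks to the local HDFS (assumes the daemons from the previous steps are running):
python -c "import pydoop.hdfs as hdfs; print hdfs.ls('/tmp')"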
Install avro
cd /data/wma/soft/avro-1.7.7
python setup.py install --prefix=/data/wma/usr
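Verify that Python picks up avro from the install prefix and can parse a trivial schema:
python -c "import avro.schema; print avro.schema.parse('{\"type\": \"string\"}')"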
Install bz2file
cd /data/wma/soft
git clone git@github.com:nvawda/bz2file.git
cd bz2file
python setup.py install --prefix=/data/wma/usr
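And the same kind of import check for bz2file:
python -c "import bz2file; print bz2file.__file__"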
Fetch WMCore framework
cd /data/wma
git clone git@github.com:dmwm/WMCore.git
Get WMArchive framework
cd /data/wma
git clone git@github.com:dmwm/WMArchive.git
Remove DAS from the deploy area, otherwise it will be started:
rm /data/srv/enabled/das
Adjust the wmarch_config.py file if necessary. For the time being we create a static area and copy the necessary web files over there:
mkdir /data/wma/WMArchive/data
cp -r WMArchive/src/css /data/wma/WMArchive/data
cp -r WMArchive/src/js /data/wma/WMArchive/data
cp -r WMArchive/src/images /data/wma/WMArchive/data
cp -r WMArchive/src/templates /data/wma/WMArchive/data
Check if the app_wmarchive_*.conf files exist in /data/srv/current/config/frontend/; if not, copy them over there:
sudo cp app_wmarchive_* /data/srv/current/config/frontend/
sudo chown _sw /data/srv/current/config/frontend/app_wmarchive_*
sudo chgrp _config /data/srv/current/config/frontend/app_wmarchive_*
Start WMArchive service
cd /data/wma
./run_wma.sh
Check if web server is running (WMArchive runs on port 8247):
curl http://localhost:8247/wmarchive/web/
On vocms013 we should be able to access the service via an HTTPS request (once the frontend is configured):
curl -k --key ~/.globus/userkey.pem --cert ~/.globus/usercert.pem https://vocms013.cern.ch/wmarchive/web/
At this point we're ready to insert data into WMArchive. Below are two different approaches. One is to use the testClient script and give it a JSON file:
WMArchive/test/python/testClient.py --json=WMArchive/test/data/fwjr_processing.json
STATUS 200 REASON OK
data {u'result': [u'{"ids": ["9d8bb0d3ddd54b6bc9158b5beb7eeb14", "cfe2a1feec3f0a5d708e5203a3870874", "1b6377ed38074e06da2a76f6efec7c35", "e436a319f6e9068bd8ceb285c0e3a0a3", "51027c9a760c8ac7617da956aaf74d20", "003f1046bbe24c17354848aef7ae611a", "4dac4a63d4612c7ba670437eb4629a46", "c3d7bf06561f59b6988bcc4c5a9b6697", "db7ac9841321ec550a8a68927f682c4e", "cd9f2c9c009ba465df97fa85872c7222"], "stype": "mongodb"}']} <type 'dict'>
Posted {u'result': [u'{"ids": ["9d8bb0d3ddd54b6bc9158b5beb7eeb14", "cfe2a1feec3f0a5d708e5203a3870874", "1b6377ed38074e06da2a76f6efec7c35", "e436a319f6e9068bd8ceb285c0e3a0a3", "51027c9a760c8ac7617da956aaf74d20", "003f1046bbe24c17354848aef7ae611a", "4dac4a63d4612c7ba670437eb4629a46", "c3d7bf06561f59b6988bcc4c5a9b6697", "db7ac9841321ec550a8a68927f682c4e", "cd9f2c9c009ba465df97fa85872c7222"], "stype": "mongodb"}']}
Or, we may run a simple test with the curl client (adjust ids if necessary):
# single document injection
curl -D /dev/stdout -X POST -H "Content-type: application/json" -d "{\"data\":{\"name\":1}}" http://localhost:8247/wmarchive/data/
# single document retrieval
curl -D /dev/stdout -H "Content-type: application/json" http://localhost:8247/wmarchive/data/eed35faf3b73d58157aa53d097899e8d
# multiple documents injection
curl -D /dev/stdout -X POST -H "Content-type: application/json" -d "{\"data\":[{\"name\":1}, {\"name\":2}]}" http://localhost:8247/wmarchive/data/
# multiple documents retrieval
curl -D /dev/stdout -X POST -H "Content-type: application/json" -d "{\"query\":[\"eed35faf3b73d58157aa53d097899e8d\", \"bcee13403f554bc14f644ffdeaa93372\"]}" http://localhost:8247/wmarchive/data/
References
[1] https://cms-http-group.web.cern.ch/cms-http-group/tutorials/environ/vm-setup.html
[2] https://pip.pypa.io/en/stable/installing/
[3] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation
[4] http://www.cloudera.com/content/www/en-us/documentation/cdh/5-0-x/CDH5-Installation-Guide/cdh5ig_hdfs_cluster_deploy.html?scroll=topic_11_2