added partially complete EMR documentation
Ryan Zotti authored and Ryan Zotti committed Oct 15, 2016
1 parent cb57797 commit 31e3ec2
Sparking_Water_EMR_instructions.md
## Instructions

Log in to AWS and go to the EMR console. Create an EMR Spark cluster. I chose emr-4.7.1 because it comes with Spark 1.6.1, which is compatible with the latest version of H2O's Sparkling Water. I chose 8 m3.xlarge instances, which at the time of writing cost $0.27 per instance per hour. Make sure you select security groups that let you SSH into the servers.
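
If you prefer the AWS CLI to the console, a cluster matching these choices can be requested with `aws emr create-cluster`. The sketch below is one way to do it, not the author's procedure: the function name `create_spark_cluster`, the cluster name, the key pair placeholder `my-key-pair`, and the optional `SG_ID` security group are hypothetical, and it assumes the default EMR roles already exist in your account (`--use-default-roles`). At the quoted rate, 8 instances cost 8 × $0.27 = $2.16 per hour.

```bash
# Sketch: request a cluster like the one described above (emr-4.7.1 with Spark,
# 8 m3.xlarge instances) through the AWS CLI instead of the console.
# KEY_NAME and SG_ID are hypothetical placeholders for your own EC2 key pair and
# a security group that allows SSH. Set DRY_RUN=1 to print the command only.
create_spark_cluster() {
  key_name="${KEY_NAME:-my-key-pair}"
  ec2_attrs="KeyName=${key_name}"
  # Optionally attach an extra security group (e.g. one opening port 22) to the master.
  if [ -n "${SG_ID:-}" ]; then
    ec2_attrs="${ec2_attrs},AdditionalMasterSecurityGroups=${SG_ID}"
  fi

  # Build the command in the function's positional parameters so every flag
  # stays a separate argument.
  set -- aws emr create-cluster \
    --name sparkling-water \
    --release-label emr-4.7.1 \
    --applications Name=Spark \
    --instance-type m3.xlarge \
    --instance-count 8 \
    --use-default-roles \
    --ec2-attributes "$ec2_attrs"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 KEY_NAME=my-key create_spark_cluster` prints the full command for review; drop `DRY_RUN=1` to actually submit it.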

Run these commands:

```bash
# Switch to root, then to the hadoop user, and create an HDFS home directory for ec2-user.
# If you skip this step, your Spark code will fail immediately with permission errors.
sudo su
su hadoop
hadoop fs -mkdir -p /user/ec2-user
hadoop fs -chown ec2-user /user/ec2-user

# Exit the hadoop shell, then the root shell, to get back to being ec2-user
exit
exit
```
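
The `sudo su` / `su hadoop` steps open interactive shells, which makes them awkward to script. A non-interactive alternative is to run each HDFS command as the hadoop user with `sudo -u hadoop`. This is a sketch assuming ec2-user has passwordless sudo (the `sudo su` step already relies on sudo); `run` and `make_hdfs_home` are hypothetical helper names.

```bash
# Hypothetical helpers: create an HDFS home directory for a user by running the
# hadoop commands as the hadoop user through sudo, instead of switching shells.
# Set DRY_RUN=1 to print the commands rather than execute them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

make_hdfs_home() {
  hdfs_user="${1:-ec2-user}"
  # Only chown if the mkdir succeeded.
  run sudo -u hadoop hadoop fs -mkdir -p "/user/${hdfs_user}" &&
    run sudo -u hadoop hadoop fs -chown "${hdfs_user}" "/user/${hdfs_user}"
}
```

After `make_hdfs_home ec2-user`, running `hadoop fs -ls /user` should list `/user/ec2-user` owned by ec2-user.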

```bash
# Download Sparkling Water to /home/ec2-user/
cd /home/ec2-user
wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/8/sparkling-water-1.6.8.zip

# Unzip the file
unzip sparkling-water-1.6.8.zip
```

```bash
# Install the PySparkling package and its Python dependencies
sudo pip install h2o_pysparkling_1.6
sudo pip install tabulate
sudo pip install six
sudo pip install future
```
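
To confirm the `pip` installs landed in the interpreter you will use, you can try importing each module. A minimal sketch: `check_python_modules` is a hypothetical helper, and it assumes the `h2o_pysparkling_1.6` package is imported as `pysparkling` and that `python` is the same interpreter `sudo pip` installed into.

```bash
# Hypothetical helper: report Python modules that fail to import and return
# non-zero if any are missing. PYTHON selects the interpreter (default: python).
check_python_modules() {
  py="${PYTHON:-python}"
  missing=0
  for mod in "$@"; do
    if ! "$py" -c "import ${mod}" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$mod"
      missing=1
    fi
  done
  return "$missing"
}
```

`check_python_modules pysparkling tabulate six future` prints nothing and returns 0 when every module imports.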

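```bash
# Point PySparkling at the cluster's Spark installation and Hadoop/YARN config,
# and run the Spark driver on this machine in YARN client mode
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export MASTER="yarn-client"
```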
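
The `export` lines only apply to the current shell session, so they are gone when you log out and SSH back in. One option, sketched here under the assumption that your login shell reads `~/.bashrc`, is to append them to that file once; `persist_spark_env` is a hypothetical helper, and a marker comment keeps repeated runs from duplicating the lines.

```bash
# Hypothetical helper: append the Sparkling Water environment variables to a
# shell startup file (default: ~/.bashrc) so new SSH sessions pick them up.
# The marker comment makes it idempotent: a second run changes nothing.
persist_spark_env() {
  rc_file="${1:-$HOME/.bashrc}"
  marker="# sparkling-water environment"
  if grep -qF "$marker" "$rc_file" 2>/dev/null; then
    return 0
  fi
  cat >> "$rc_file" <<'EOF'
# sparkling-water environment
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export MASTER="yarn-client"
EOF
}
```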

```bash
# Start up PySparkling
/home/ec2-user/sparkling-water-1.6.8/bin/pysparkling
```
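
If `SPARK_HOME` or `HADOOP_CONF_DIR` points at the wrong place, or the unzip step did not produce the launcher, PySparkling is unlikely to start cleanly. A quick pre-flight check can catch this first. This is a hypothetical helper, not part of Sparkling Water; `SW_HOME` is an assumed variable that defaults to the unzip location used above.

```bash
# Hypothetical pre-flight check: confirm the exported variables point at real
# directories and that the pysparkling launcher exists and is executable.
check_dir() {  # $1 = variable name (for the message), $2 = its value
  if [ -z "$2" ] || [ ! -d "$2" ]; then
    printf '%s is unset or not a directory: %s\n' "$1" "$2" >&2
    return 1
  fi
}

preflight() {
  sw_home="${SW_HOME:-/home/ec2-user/sparkling-water-1.6.8}"
  status=0
  check_dir SPARK_HOME "${SPARK_HOME:-}" || status=1
  check_dir HADOOP_CONF_DIR "${HADOOP_CONF_DIR:-}" || status=1
  if [ ! -x "$sw_home/bin/pysparkling" ]; then
    printf 'launcher missing or not executable: %s/bin/pysparkling\n' "$sw_home" >&2
    status=1
  fi
  return "$status"
}
```

`preflight && /home/ec2-user/sparkling-water-1.6.8/bin/pysparkling` starts PySparkling only if every check passes.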
