Create Index
Anil Shanbhag edited this page Sep 14, 2015
·
1 revision
Scenario 1: Input files are partitioned and distributed across the machines. Make sure the files sit in the same directory on every machine. Check scripts/fabfile.py and adapt the code to point to the right directories. Then run the following three commands, in order:
fab bulk_sample_gen
fab create_robust_tree
fab write_partitions
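The three Scenario 1 commands depend on one another, so it can help to stop as soon as one of them fails. The following is only a minimal sketch of such a wrapper, not part of the project: the helper name run_tasks, the injectable runner argument, and the assumption that fab is invoked from the scripts directory are all hypothetical. Only the task names come from this page.

```python
import subprocess

# Task names as listed on this page; everything else here is illustrative.
SCENARIO_1_TASKS = ["bulk_sample_gen", "create_robust_tree", "write_partitions"]


def run_tasks(tasks, runner=subprocess.call, cwd="scripts"):
    """Run each fab task in order and stop at the first failure.

    `runner` is any callable with the same shape as subprocess.call
    (takes an argument list and a cwd keyword, returns an exit status).
    `cwd` assumes fabfile.py lives in the scripts directory.
    Returns the list of tasks that completed successfully.
    """
    completed = []
    for task in tasks:
        status = runner(["fab", task], cwd=cwd)
        if status != 0:
            # Later tasks build on earlier output, so do not continue.
            raise RuntimeError("fab %s exited with status %d" % (task, status))
        completed.append(task)
    return completed
```

Calling run_tasks(SCENARIO_1_TASKS) from the repository root would run the three tasks in the order shown above, assuming fab is on the PATH.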
Scenario 2: Input files are in HDFS. In this case, use the Spark shell to sample the data and write the sample to a file named sample. Then run:
fab create_robust_tree
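This page does not say how the Scenario 2 sample should be drawn or how large it should be. As an illustration of the idea only, the sketch below does a seeded Bernoulli sample of lines in plain Python and writes it to a file named sample. It is a local stand-in, not Spark code and not the project's sampler; in the Spark shell the equivalent step would normally go through the RDD sample method. The function names, the 1% default fraction, and the seed are assumptions.

```python
import random


def sample_lines(lines, fraction, seed=42):
    """Keep each line independently with probability `fraction`.

    Sampling is without replacement and preserves input order.
    A fixed seed makes the result reproducible.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def write_sample(in_path, out_path="sample", fraction=0.01, seed=42):
    """Sample lines from in_path and write them to out_path.

    The output name "sample" follows this page; the default fraction
    is an assumption and should be tuned to the data size.
    Returns the number of lines written.
    """
    with open(in_path) as src:
        kept = sample_lines(src, fraction, seed)
    with open(out_path, "w") as dst:
        dst.writelines(kept)
    return len(kept)
```

For example, write_sample("part-00000", fraction=0.05) would keep roughly 5% of the lines of a hypothetical local input file part-00000.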
Writing out partitions by reading files from HDFS is currently unimplemented.