Anil Shanbhag edited this page Sep 14, 2015 · 1 revision

Scenario 1: Input files are partitioned and distributed across the different machines. Ensure the files live in the same directory on every machine. Check scripts/fabfile.py and adapt the code to point to the right directories. Then run the following three commands:

fab bulk_sample_gen
fab create_robust_tree
fab write_partitions
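The per-machine layout the three tasks above rely on can be sketched in plain Python. This is a hypothetical illustration of the kind of settings scripts/fabfile.py expects; the variable names (HOSTS, DATA_DIR) and the part-file naming scheme are assumptions, not the repo's actual identifiers.

```python
import os.path

# Hypothetical configuration: every host keeps its input partition
# under the same directory path (DATA_DIR), as the setup requires.
HOSTS = ["node1", "node2", "node3"]   # hypothetical host names
DATA_DIR = "/data/input"              # hypothetical shared directory path

def partition_path(host_index):
    """Path of the input partition on a given machine (identical layout on every host)."""
    return os.path.join(DATA_DIR, "part-%05d" % host_index)

paths = [partition_path(i) for i in range(len(HOSTS))]
```

Because the directory is the same everywhere, each fab task can address every machine's partition with one path template instead of per-host configuration.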

Scenario 2: Input files are in HDFS. In this case, use the Spark shell to sample the data and write the sample to a file named sample. Then run:

fab create_robust_tree
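The sampling step above is essentially a uniform random sample of the input records. A minimal stdlib sketch of that idea is below, assuming newline-delimited records and an illustrative sampling rate; the real job would be done in the Spark shell against HDFS (e.g. with RDD sampling), not with this code.

```python
import random

def bernoulli_sample(lines, rate, seed=42):
    """Keep each record independently with probability `rate`."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < rate]

# Illustrative input: 10,000 synthetic records.
records = ["row-%d" % i for i in range(10000)]
sample = bernoulli_sample(records, rate=0.01)
# The resulting sample would be written to a file named `sample`,
# which create_robust_tree then reads.
```

The sample only needs to be representative, not exact, so an independent per-record coin flip is sufficient and parallelizes trivially across HDFS blocks.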

Writing out partitions by reading files from HDFS is currently unimplemented.
