Performance Tuning

Hadoop is a complex piece of software with a variety of components, including a distributed file system and a distributed computing framework with job trackers, data nodes, and numerous simultaneously running JVM instances. As with any complex software environment, there are tunings that can be employed to ensure efficient use of both space (network bandwidth, hard drive, memory, etc.) and time (object creation, combiners, in-memory combiners, etc.). This section presents various Hadoop/Faunus tricks that can be used to tune Faunus jobs and Faunus MapReduce extensions.

Faunus Specific Tunings

Use sequence files for repeated analyses: The Hadoop sequence file is the most efficient file format for Faunus. If a graph is going to be analyzed repeatedly, it is beneficial to generate a sequence file representation of that graph in HDFS once and then use that file as the input for each subsequent analysis. Generating it is as simple as running the Faunus script g.V._() with the following settings in faunus.properties:

faunus.graph.output.format.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.data.output.location=graph.dat
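Later analyses would then read that sequence file back in. The fragment below is only a sketch of what the input side might look like. The two property keys are assumptions modeled on the output keys above and are not confirmed by this page, so check them against the faunus.properties shipped with your Faunus version. The class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat is the standard Hadoop counterpart to the output format used above. The exact path of the generated file beneath graph.dat may also differ in your HDFS layout.

```properties
# ASSUMPTION: input-side key names mirror the output-side keys shown above;
# verify them against your Faunus distribution before relying on them.
faunus.graph.input.format.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
# ASSUMPTION: points at the location written by the earlier g.V._() job;
# the actual file may sit in a subdirectory of graph.dat.
faunus.input.location=graph.dat
```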

Useful Blog Posts

Below is a collection of blog posts that discuss tips and tricks for Hadoop.
