Performance Tuning
okram edited this page Aug 3, 2012 · 29 revisions
Hadoop is a complex piece of software with a variety of components, including a distributed file system and a distributed computing framework with job trackers, data nodes, and numerous simultaneously running JVM instances. As with any complex software environment, there are tunings that ensure efficient use of both space (network bandwidth, hard drive, memory, etc.) and time (object creation, combiners, in-memory combiners, etc.). This section presents various Hadoop/Faunus tricks that can be used to tune Faunus jobs and Faunus MapReduce extensions.
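To make the "in-memory combiner" idea concrete, here is a minimal sketch in plain Java. It is not Faunus or Hadoop code: the class name, the method names, and the edge-source input are all hypothetical and chosen only for illustration. It contrasts a mapper that emits one record per edge with one that keeps partial counts in memory and emits them once at the end, which is the role `cleanup()` would play in a real Hadoop `Mapper`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: simulates a single map task without Hadoop.
public class InMemoryCombinerSketch {

    // Naive mapper: emits one (vertexId, 1) pair per edge, so the shuffle
    // carries as many records as there are edges in the input split.
    static List<Map.Entry<Long, Long>> naiveMap(long[] edgeSources) {
        List<Map.Entry<Long, Long>> emitted = new ArrayList<>();
        for (long source : edgeSources) {
            emitted.add(new AbstractMap.SimpleEntry<Long, Long>(source, 1L));
        }
        return emitted;
    }

    // In-memory combining: partial sums live in a map for the duration of
    // the map task and are emitted once, after all input has been seen.
    static List<Map.Entry<Long, Long>> combiningMap(long[] edgeSources) {
        Map<Long, Long> partialCounts = new LinkedHashMap<>();
        for (long source : edgeSources) {
            partialCounts.merge(source, 1L, Long::sum);
        }
        return new ArrayList<>(partialCounts.entrySet());
    }

    public static void main(String[] args) {
        // Source vertex ids of six edges (made-up data).
        long[] edgeSources = {1L, 1L, 2L, 1L, 3L, 2L};
        System.out.println("naive records:    " + naiveMap(edgeSources).size());
        System.out.println("combined records: " + combiningMap(edgeSources).size());
        System.out.println("combined output:  " + combiningMap(edgeSources));
    }
}
```

The trade-off is heap for network: the combining variant sends fewer records through the shuffle but holds one map entry per distinct key in memory, so a real job would typically flush the map once it grows past some size limit.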
- Use sequence files for repeated analyses: The Hadoop sequence file is the most efficient file format for Faunus. If a graph is going to be analyzed repeatedly, it is beneficial to generate a sequence file representation of that graph in HDFS. This file can then serve as the input for each subsequent analysis. Generating it is as simple as running the Faunus script
g.V._()
with the following faunus.properties:
faunus.graph.output.format.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.data.output.location=graph.dat
Below is a collection of blog posts that discuss tips and tricks for Hadoop.