Faunus Graph

vadasg edited this page Nov 7, 2012 · 13 revisions

The source of any Faunus job is a FaunusGraph. A FaunusGraph is simply a wrapper around a collection of Hadoop configurations and some Faunus-specific configurations. It is typically created using one of the FaunusFactory.open() methods. However, it is also possible to create a new FaunusGraph and configure it manually.



FaunusGraph Construction

A Faunus properties file such as the one below is used to construct a FaunusGraph. Assume the file is named bin/faunus.properties.

# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
faunus.input.location=graph-of-the-gods.json
# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=output
faunus.output.location.overwrite=true

With FaunusFactory, a properties file is turned into a FaunusGraph.

gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat]
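
The listing above looks like an ordinary Java .properties file: key=value pairs, with # starting a comment line. As a minimal sketch under that assumption, the same text can be parsed with java.util.Properties from the standard library. This is not FaunusFactory's implementation; the class name PropertiesFileSketch and the helper parse are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesFileSketch {

    // Hypothetical helper (not part of Faunus): parses .properties text
    // into key/value pairs. Lines starting with '#' are skipped as comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A shortened copy of the properties shown in this section.
        String text = String.join("\n",
            "# input graph parameters",
            "faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat",
            "faunus.input.location=graph-of-the-gods.json",
            "# output data parameters",
            "faunus.output.location=output",
            "faunus.output.location.overwrite=true");

        Properties props = parse(text);
        // Iteration order of stringPropertyNames() is unspecified.
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

All values come back as strings, so a flag such as faunus.output.location.overwrite would still need to be interpreted as a boolean by whatever consumes it.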

Hadoop-Specific Configurations

As stated previously, a FaunusGraph is loaded with Hadoop-specific configuration information that is percolated from the master cluster configuration (e.g. set up during cluster construction) to the various job-level configurations.

gremlin> g.getConfiguration()    
==>keep.failed.task.files=false
==>io.seqfile.compress.blocksize=1000000
==>dfs.df.interval=60000
==>dfs.datanode.failed.volumes.tolerated=0
==>mapreduce.reduce.input.limit=-1
==>mapred.task.tracker.http.address=0.0.0.0:50060
==>mapred.userlog.retain.hours=24
==>dfs.max.objects=0
==>dfs.https.client.keystore.resource=ssl-client.xml
==>mapred.local.dir.minspacestart=0
...

Faunus Properties

Within this configuration, there are Faunus-specific configurations called properties. These properties can be isolated with FaunusGraph.getProperties().

gremlin> g.getProperties()        
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true
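
Every key in the listing above starts with faunus., while the Hadoop keys shown earlier do not. One way to picture the isolation step is a filter on the key prefix. The sketch below is an assumption about the idea, not a copy of what FaunusGraph.getProperties() does internally; the class FaunusPrefixFilterSketch and the helper withPrefix are hypothetical, and a plain Map stands in for the Hadoop configuration.

```java
import java.util.Map;
import java.util.TreeMap;

public class FaunusPrefixFilterSketch {

    // Hypothetical helper (not part of Faunus): returns a new sorted map
    // holding only the entries whose key starts with the given prefix.
    static Map<String, String> withPrefix(Map<String, String> config, String prefix) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few Hadoop keys and a few Faunus keys taken from this section.
        Map<String, String> config = new TreeMap<>();
        config.put("keep.failed.task.files", "false");
        config.put("dfs.df.interval", "60000");
        config.put("faunus.input.location", "graph-of-the-gods.json");
        config.put("faunus.output.location", "output");

        // Only the two faunus.* entries remain after filtering.
        System.out.println(withPrefix(config, "faunus."));
    }
}
```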

Moreover, FaunusGraph provides getters and setters for reading and mutating these properties.

gremlin> g.setGraphOutputFormat(NoOpOutputFormat.class)
==>null
gremlin> g.getGraphOutputFormat()
==>class com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
gremlin> g.getProperties()       
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true

Chaining Graphs

Finally, a useful FaunusGraph method is getNextGraph(). It generates a new FaunusGraph that is the “inverse” of the current one: its input format and input location are reconfigured to read what the current graph writes, and its output location is changed, which allows for simple graph chaining.

gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat]
gremlin> h = g.getNextGraph()
==>faunusgraph[graphsoninputformat]
gremlin> h.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=output/job-1
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output_
==>faunus.output.location.overwrite=true
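
The location changes visible in the listing above can be sketched as a small rewrite over the property map: the next graph reads from a job directory under the current output location and writes to the current output location with an underscore appended. This is only an illustration inferred from the listing, not the getNextGraph() implementation. It ignores the format properties, and the directory name job-1 is copied from the listing; how Faunus chooses that name is not stated here. The class NextGraphSketch, the helper next and the parameter lastJobDir are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NextGraphSketch {

    static final String INPUT = "faunus.input.location";
    static final String OUTPUT = "faunus.output.location";

    // Hypothetical helper (not part of Faunus): copies the properties and
    // rewires the two location keys so the copy consumes the original's output.
    // lastJobDir is assumed to name the directory of the final job, e.g. "job-1".
    static Map<String, String> next(Map<String, String> current, String lastJobDir) {
        Map<String, String> next = new LinkedHashMap<>(current);
        String output = current.get(OUTPUT);
        next.put(INPUT, output + "/" + lastJobDir); // read what the current graph wrote
        next.put(OUTPUT, output + "_");             // write somewhere new
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> g = new LinkedHashMap<>();
        g.put(INPUT, "graph-of-the-gods.json");
        g.put(OUTPUT, "output");

        Map<String, String> h = next(g, "job-1");
        System.out.println(h.get(INPUT));
        System.out.println(h.get(OUTPUT));
    }
}
```

Because the helper returns a copy, the original map is left untouched and the step can be repeated on each result to build a longer chain.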