-
Notifications
You must be signed in to change notification settings - Fork 58
Faunus Graph
This is the documentation for Faunus 0.4.
Faunus was merged into Titan and renamed Titan-Hadoop in version 0.5.
Documentation for the latest Titan version is available at http://s3.thinkaurelius.com/docs/titan/current.
The source of any Faunus job is a FaunusGraph
. FaunusGraph
is simply a wrapper to a collection of Hadoop- and Faunus-specific configurations. Most importantly, it captures the location and type of the input graph and output graph. A FaunusGraph
is typically created using one of the FaunusFactory.open()
methods.
A Faunus configuration file is used to construct a FaunusGraph
. Assume a file named bin/faunus.properties
as represented below.
# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
faunus.input.location=graph-of-the-gods.json
# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=output
faunus.output.location.overwrite=true
With FaunusFactory
, a configuration file is turned in a FaunusGraph
. The toString()
of the FaunusGraph
denotes the input and output format of the graph. For instance, as seen below, a graph of type GraphSON is the input and a graph of type GraphSON is the output.
gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
A FaunusGraph
is loaded with Hadoop specific configuration information that is percolated from the master cluster configuration (e.g. set up during cluster construction) to various job level configurations.
gremlin> g.getConf()
==>keep.failed.task.files=false
==>io.seqfile.compress.blocksize=1000000
==>dfs.df.interval=60000
==>dfs.datanode.failed.volumes.tolerated=0
==>mapreduce.reduce.input.limit=-1
==>mapred.task.tracker.http.address=0.0.0.0:50060
==>mapred.userlog.retain.hours=24
==>dfs.max.objects=0
==>dfs.https.client.keystore.resource=ssl-client.xml
==>mapred.local.dir.minspacestart=0
...
Note, it is possible to provide a prefix to look for in FaunusGraph.getConf(String prefix)
.
gremlin> g.getConf('mapred')
==>mapred.disk.healthChecker.interval=60000
==>mapred.task.tracker.http.address=0.0.0.0:50060
==>mapred.userlog.retain.hours=24
==>mapred.local.dir.minspacestart=0
==>mapred.cluster.reduce.memory.mb=-1
==>mapred.reduce.parallel.copies=5
...
Within the global configuration, there are Faunus-specific configurations. These properties can be isolated with FaunusGraph.getConf('faunus')
. In general, any prefix string can be provided (e.g. mapred
or mapred.map
).
gremlin> g.getConf('faunus')
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true
Moreover, FaunusGraph
provides getters/setters for mutating the most commonly used properties.
gremlin> g.setGraphOutputFormat(NoOpOutputFormat.class)
==>null
gremlin> g
==>faunusgraph[graphsoninputformat->noopoutputformat]
gremlin> g.getGraphOutputFormat()
==>class com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true
To conclude, a useful FaunusGraph
method is getNextGraph()
. This method generates a new FaunusGraph
that is the “inverse” of the current with the input formats and output locations reconfigured to support easy graph chaining.
gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
gremlin> h = g.getNextGraph()
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
gremlin> h.getConf('faunus')
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=output/job-1
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output_
==>faunus.output.location.overwrite=true