Faunus Graph
The source of any Faunus job is a FaunusGraph. A FaunusGraph is simply a wrapper around a collection of Hadoop configurations and some Faunus-specific configurations. It is typically created using one of the FaunusFactory.open() methods, but it is also possible to create a new FaunusGraph and configure it manually.
A Faunus properties file such as the one below is used to construct a FaunusGraph. Assume the file is named bin/faunus.properties.
# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
faunus.input.location=graph-of-the-gods.json
# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=output
faunus.output.location.overwrite=true
With FaunusFactory, a properties file is turned into a FaunusGraph.
gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat]
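The properties file shown earlier uses plain key=value lines with # comments. As a minimal sketch, assuming the file follows standard Java properties syntax, the snippet below parses the same text with java.util.Properties. The class name FaunusPropertiesParse and its parse method are hypothetical and are not part of Faunus; the snippet only illustrates the file format, not how FaunusFactory.open() is implemented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration class; not part of Faunus.
final class FaunusPropertiesParse {

    // Parses properties-file text into a java.util.Properties object.
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        // Same content as the bin/faunus.properties example, inlined for the sketch.
        String text = String.join("\n",
            "# input graph parameters",
            "faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat",
            "faunus.input.location=graph-of-the-gods.json",
            "# output data parameters",
            "faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat",
            "faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat",
            "faunus.output.location=output",
            "faunus.output.location.overwrite=true");

        Properties properties = parse(text);
        System.out.println(properties.getProperty("faunus.input.location"));
        System.out.println(Boolean.parseBoolean(
            properties.getProperty("faunus.output.location.overwrite")));
    }
}
```

Comment lines are skipped by the parser, so only the six faunus.* keys end up in the resulting object.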
As stated previously, a FaunusGraph is loaded with Hadoop-specific configuration information that is percolated from the master cluster configuration (e.g. set up during cluster construction) down to the various job-level configurations.
gremlin> g.getConfiguration()
==>keep.failed.task.files=false
==>io.seqfile.compress.blocksize=1000000
==>dfs.df.interval=60000
==>dfs.datanode.failed.volumes.tolerated=0
==>mapreduce.reduce.input.limit=-1
==>mapred.task.tracker.http.address=0.0.0.0:50060
==>mapred.userlog.retain.hours=24
==>dfs.max.objects=0
==>dfs.https.client.keystore.resource=ssl-client.xml
==>mapred.local.dir.minspacestart=0
...
Within this configuration, there are Faunus-specific settings called properties. These properties can be isolated with FaunusGraph.getProperties().
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true
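To make the relationship between the full configuration and the properties concrete, here is a minimal sketch that keeps only the entries whose key starts with faunus. from a mixed key/value map. Selecting by key prefix is an assumption made for illustration, based on the key names in the output above; the class FaunusPropertyFilter and its withPrefix method are hypothetical and do not describe how getProperties() is actually implemented.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Faunus.
final class FaunusPropertyFilter {

    // Returns a sorted copy containing only the entries whose key starts with the prefix.
    static Map<String, String> withPrefix(Map<String, String> configuration, String prefix) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : configuration.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A small stand-in for the much larger Hadoop configuration.
        Map<String, String> configuration = new TreeMap<>();
        configuration.put("keep.failed.task.files", "false");
        configuration.put("dfs.df.interval", "60000");
        configuration.put("faunus.input.location", "graph-of-the-gods.json");
        configuration.put("faunus.output.location", "output");

        // Print only the Faunus-specific entries.
        withPrefix(configuration, "faunus.")
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```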
Moreover, FaunusGraph provides getters and setters for reading and mutating these properties.
gremlin> g.setGraphOutputFormat(NoOpOutputFormat.class)
==>null
gremlin> g.getGraphOutputFormat()
==>class com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true
To conclude, a useful FaunusGraph method is getNextGraph(). This generates a new FaunusGraph that is the “inverse” of the current one, with the input formats and output locations reconfigured to allow for simple graph chaining.
gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat]
gremlin> h = g.getNextGraph()
==>faunusgraph[graphsoninputformat]
gremlin> h.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=output/job-1
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output_
==>faunus.output.location.overwrite=true
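The transformation visible in the output above can be sketched on a plain property map. This is only an illustration of the observed before/after values, not the Faunus implementation. It rests on three assumptions: the new input format is named like the old output format with "OutputFormat" replaced by "InputFormat", the new input location is the old output location plus /job-N, and the new output location is the old one with an underscore appended. The class NextGraphSketch, its next method, and the lastJobIndex parameter are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Faunus.
final class NextGraphSketch {

    // Derives the "next" property map from the current one, mirroring the console output.
    static Map<String, String> next(Map<String, String> current, int lastJobIndex) {
        Map<String, String> next = new LinkedHashMap<>(current);
        String outputLocation = current.get("faunus.output.location");
        String outputFormat = current.get("faunus.graph.output.format");

        // Assumption: the matching input format differs only in its class-name suffix.
        next.put("faunus.graph.input.format",
                 outputFormat.replace("OutputFormat", "InputFormat"));
        // Assumption: the previous job wrote its graph under <output>/job-<index>.
        next.put("faunus.input.location", outputLocation + "/job-" + lastJobIndex);
        // Assumption: the new output location is the old one with "_" appended.
        next.put("faunus.output.location", outputLocation + "_");
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> current = new LinkedHashMap<>();
        current.put("faunus.graph.input.format",
            "com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat");
        current.put("faunus.input.location", "graph-of-the-gods.json");
        current.put("faunus.graph.output.format",
            "com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat");
        current.put("faunus.output.location", "output");

        next(current, 1).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Because the derived map reads from where the current graph writes, a second job can be pointed at the result of the first without editing any locations by hand.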