Faunus Graph

This is the documentation for Faunus 0.4.
Faunus was merged into Titan and renamed Titan-Hadoop in version 0.5.
Documentation for the latest Titan version is available at http://s3.thinkaurelius.com/docs/titan/current.

The source of any Faunus job is a FaunusGraph. A FaunusGraph is simply a wrapper around a collection of Hadoop- and Faunus-specific configurations. Most importantly, it captures the location and type of the input graph and the output graph. A FaunusGraph is typically created using one of the FaunusFactory.open() methods.

FaunusGraph Construction

A Faunus configuration file is used to construct a FaunusGraph. Assume a file named bin/faunus.properties with the contents below.

# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
faunus.input.location=graph-of-the-gods.json
# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=output
faunus.output.location.overwrite=true

With FaunusFactory, a configuration file is turned into a FaunusGraph. The toString() of the FaunusGraph denotes the input and output formats of the graph. For instance, as seen below, the input is a graph of type GraphSON and the output is also a graph of type GraphSON.

gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
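
The relationship between the properties file and the label printed above can be illustrated with a small, self-contained Java sketch. This is not the Faunus implementation: the class GraphLabelSketch and its methods are hypothetical, and the rule "lowercased simple class name of each format" is only inferred from the console output shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

// Hypothetical stand-in written with plain java.util.Properties; not Faunus code.
final class GraphLabelSketch {

    // Parse properties-file text (same syntax as bin/faunus.properties).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // "a.b.GraphSONInputFormat" becomes "graphsoninputformat".
    static String simpleName(String className) {
        return className.substring(className.lastIndexOf('.') + 1).toLowerCase(Locale.ROOT);
    }

    // Assumption: the label is built from the two graph format properties.
    static String label(Properties props) {
        return "faunusgraph["
                + simpleName(props.getProperty("faunus.graph.input.format"))
                + "->"
                + simpleName(props.getProperty("faunus.graph.output.format"))
                + "]";
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# input graph parameters",
                "faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat",
                "faunus.input.location=graph-of-the-gods.json",
                "# output data parameters",
                "faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat",
                "faunus.output.location=output");
        System.out.println(label(parse(text)));
    }
}
```

The sketch reads only the two format keys; the location and overwrite properties play no part in the label.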

Hadoop-Specific Configurations

A FaunusGraph is loaded with Hadoop-specific configuration information that percolates from the master cluster configuration (e.g. the settings made during cluster construction) down to the various job-level configurations.

gremlin> g.getConf()
==>keep.failed.task.files=false
==>io.seqfile.compress.blocksize=1000000
==>dfs.df.interval=60000
==>dfs.datanode.failed.volumes.tolerated=0
==>mapreduce.reduce.input.limit=-1
==>mapred.task.tracker.http.address=0.0.0.0:50060
==>mapred.userlog.retain.hours=24
==>dfs.max.objects=0
==>dfs.https.client.keystore.resource=ssl-client.xml
==>mapred.local.dir.minspacestart=0
...

Note that FaunusGraph.getConf(String prefix) accepts a prefix and returns only the configuration entries that match it.

gremlin> g.getConf('mapred')
==>mapred.disk.healthChecker.interval=60000
==>mapred.task.tracker.http.address=0.0.0.0:50060
==>mapred.userlog.retain.hours=24
==>mapred.local.dir.minspacestart=0
==>mapred.cluster.reduce.memory.mb=-1
==>mapred.reduce.parallel.copies=5
...
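
The idea of selecting configuration entries by prefix can be sketched in plain Java. The class PrefixFilterSketch and its withPrefix method are hypothetical, and the use of String.startsWith is an assumption: the source does not show how FaunusGraph.getConf(String prefix) matches keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for prefix-based lookup; not Faunus code.
final class PrefixFilterSketch {

    // Return the entries whose key starts with the prefix, sorted by key.
    static Map<String, String> withPrefix(Map<String, String> conf, String prefix) {
        Map<String, String> matches = new TreeMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                matches.put(entry.getKey(), entry.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.userlog.retain.hours", "24");
        conf.put("mapred.reduce.parallel.copies", "5");
        conf.put("mapreduce.reduce.input.limit", "-1");
        conf.put("dfs.max.objects", "0");

        // Print every entry selected by the "mapred" prefix as key=value.
        withPrefix(conf, "mapred").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Under a plain startsWith match, the prefix mapred also selects keys beginning with mapreduce.; passing mapred. (with the trailing dot) narrows the result. Whether Faunus behaves the same way is not shown in the output above.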

Faunus Properties

Within the global configuration, there are Faunus-specific configurations. These properties can be isolated with FaunusGraph.getConf('faunus'). In general, any prefix string can be provided (e.g. mapred or mapred.map).

gremlin> g.getConf('faunus')
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true

Moreover, FaunusGraph provides getters and setters for reading and changing the most commonly used properties.

gremlin> g.setGraphOutputFormat(NoOpOutputFormat.class)
==>null
gremlin> g
==>faunusgraph[graphsoninputformat->noopoutputformat]
gremlin> g.getGraphOutputFormat()
==>class com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true

Chaining Graphs

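To conclude, a useful FaunusGraph method is getNextGraph(). This method generates a new FaunusGraph that is the “inverse” of the current one: its input format and location are reconfigured to read what the current graph writes, and its output location is changed, which makes it easy to chain graphs. In the example below, the input location of the new graph is output/job-1 and its output location is output_.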

gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
gremlin> h = g.getNextGraph()
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
gremlin> h.getConf('faunus')
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=output/job-1
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output_
==>faunus.output.location.overwrite=true
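
The location rewriting visible in the output above can be mimicked with a short Java sketch. Everything here is an assumption drawn from that single example: the class NextGraphSketch, the method next, and the jobDir parameter are hypothetical, the rule "append an underscore to the output location" is inferred rather than documented, and the sketch does not model how the input format is derived from the output format.

```java
import java.util.Properties;

// Hypothetical stand-in that models only the two location properties; not Faunus code.
final class NextGraphSketch {

    // Derive the configuration of a follow-on graph from the current one.
    // jobDir names the job subdirectory to read from (e.g. "job-1"); how Faunus
    // picks it is not shown in the source, so it is passed in explicitly.
    static Properties next(Properties current, String jobDir) {
        Properties next = new Properties();
        next.putAll(current); // copy, so the current configuration is left untouched
        String output = current.getProperty("faunus.output.location");
        next.setProperty("faunus.input.location", output + "/" + jobDir);
        next.setProperty("faunus.output.location", output + "_");
        return next;
    }

    public static void main(String[] args) {
        Properties g = new Properties();
        g.setProperty("faunus.input.location", "graph-of-the-gods.json");
        g.setProperty("faunus.output.location", "output");

        Properties h = next(g, "job-1");
        System.out.println("faunus.input.location=" + h.getProperty("faunus.input.location"));
        System.out.println("faunus.output.location=" + h.getProperty("faunus.output.location"));
    }
}
```

Copying the properties before changing them keeps the original configuration usable, which matches the console session above, where g and h coexist.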
