-
Notifications
You must be signed in to change notification settings - Fork 116
Gremlin Extension
Gremlin is a graph-based traversal language developed for property graphs. In combination with Rexster, Gremlin allows users to execute ad-hoc computations on the graph backend.
Gremlin is exposed through Rexster as an Extension and scripts may be executed via the REST API or through the Gremlin Console in The Dog House.
Through Gremlin, its possible, amongst other things, to perform the following tasks:
- Add/delete vertices and edges from the graph.
- Manipulate the graph indices.
- Search for elements of a graph.
- Load graph data from a file or URL.
- Make use of JUNG algorithms.
- Make use of SPARQL queries over OpenRDF-based graphs.
- and much, much more.
In general, using the GremlinExtension
provided with Rexster, various graph management tasks can be accomplished.
The Gremlin Extension is exposed on the following extension point options: graphs, vertices and edges which means it is available on the following URIs:
http://localhost:8182/graphs/{graph}/tp/gremlin
http://localhost:8182/graphs/{graph}/vertices/{id}/tp/gremlin
http://localhost:8182/graphs/{graph}/edges/{id}/tp/gremlin
The difference among these URIs is the context within which the Gremlin session is initialized with graph variables. When simply accessing Gremlin from the graph ExtensionPoint
, the Gremlin session is given access to the requested graph. When accessed from the vertex or edge ExtensionPoint
, the requested vertex or edge is pushed into the session with the graph.
Therefore, given the following URI:
http://localhost:8182/graphs/tinkergraph/tp/gremlin?script=g.v(1)
Rexster will take the requested tinkergraph
and pass it to the Gremlin script engine in the context of the variable g
. Similarly, the vertex and edge resource will pass a v
and e
variable, respectively, to the script engine for the requested vertex or edge.
For a vertex,
http://localhost:8182/graphs/tinkergraph/vertices/1/tp/gremlin?script=v.out()
Rexster will respond with:
{
"results":[
{"_id":"2","_type":"vertex","name":"vadas","age":27},
{"_id":"3","_type":"vertex","name":"lop","lang":"java"},
{"_id":"4","_type":"vertex","name":"josh","age":32}
],
"success":true,
"version":"*.*",
"queryTime":5.963338
}
For an edge,
http://localhost:8182/graphs/tinkergraph/edges/11/tp/gremlin?script=e.inV
Rexster will respond with:
{
"results":[
{"_id":"3","_type":"vertex","name":"lop","lang":"java"}
],
"success":true,
"version":"*.*",
"queryTime":5.963338
}
By default, Rexster uses the groovy
flavor of Gremlin for processing scripts. It is possible to specify other flavors of Gremlin with the language
parameter (Note: No other Gremlin flavors are exposed at this time. This feature is for future compatibility).
The Gremlin Extension does not require any specific configuration beyond including it in the <allows>
section of the <extensions>
element of rexster.xml
. The Gremlin Extension is in the TinkerPop namespace called tp
and its name is gremlin
. Therefore, the configuration would look something like this:
<graph>
<graph-name>tinkergraph</graph-name>
<graph-type>tinkergraph</graph-type>
<graph-file>data/graph-example-1.xml</graph-file>
<extensions>
<allows>
<allow>tp:gremlin</allow>
</allows>
<extension>
<namespace>tp</namespace>
<name>gremlin</name>
<configuration>
<scripts>script-directory</scripts>
<allow-client-script>true</allow-client-script>
<cache-scripts>true</cache-scripts>
</configuration>
</extension>
</extensions>
</graph>
The <configuration>
element is an optional element which contains several settings. The first is the <scripts>
element, which represents a directory that Rexster can access for Gremlin script files. The script files must be suffixed with a .gremlin
extension for Rexster to find them. You can read more about how to utilize these script files below.
The second setting is the <allow-client=script>
option controls whether or not scripts passed to the script
parameter are executed or not. This value is true
by default when the configuration is not present.
The third setting is the cache-scripts>
element. When set to true
(the default if the configuration setting is not present), the Gremlin Extension will read the script file once the first time it is referenced and keep it in memory for future calls. If the value is false
, the Gremlin Extension will read the script each time a call is made where the script is referenced. Scripts are cached globally across graph configuration. In other words, if two graphs both reference the same script directory as the location for Gremlin scripts, the script will only be cached once.
To see the full API of the GremlinExtension
service, simply call the service without any query parameters.
http://localhost:8182/graphs/gratefulgraph/tp/gremlin
The returned JSON is provided below.
{
"message": "no scripts provided",
"api": {
"description": "evaluate an ad-hoc Gremlin script for a graph.",
"parameters": {
"rexster.returnKeys": "an array of element property keys to return (default is to return all element properties)",
"rexster.showTypes": "displays the properties of the elements with their native data type (default is false)",
"load": "a list of 'stored procedures' to execute prior to the 'script' (if 'script' is not specified then the last script in this argument will return the values",
"returnTotal": "when set to true, the full result set will be iterated and the results returned (default is false)"
"rexster.offset.end": "end index for a paged set of data to be returned",
"rexster.offset.start": "start index for a paged set of data to be returned",
"params": "a map of parameters to bind to the script engine",
"language": "the gremlin language flavor to use (default is groovy)",
"script": "the Gremlin script to be evaluated"
}
},
"success": false
}
The rexster.offset.start
and rexster.offset.end
parameters allow gremlin results to paged. The two parameters represent the respective indexes that tell the Gremlin Extension what records to return. Without paging, the following URI will return all results.
http://localhost:8182/graphs/tinkergraph/tp/gremlin?script=g.v(1).outE
{
"results":[
{"_id":"7","_type":"edge","_label":"knows","_inV":"2","_outV":"1","weight":0.5},
{"_id":"9","_type":"edge","_label":"created","_inV":"3","_outV":"1","weight":0.4},
{"_id":"8","_type":"edge","_label":"knows","_inV":"4","_outV":"1","weight":1}
],
"success":true,
"version":"*.*",
"queryTime":6.423103
}
Specifying the rexster.offset.start
alone will return all values starting from the value of that index to the end of the result set.
http://localhost:8182/graphs/tinkergraph/tp/gremlin?script=g.v(1).outE&rexster.offset.start=1
{
"results":[
{"_id":"9","_type":"edge","_label":"created","_inV":"3","_outV":"1","weight":0.4},
{"_id":"8","_type":"edge","_label":"knows","_inV":"4","_outV":"1","weight":1}
],
"success":true,
"version":"*.*",
"queryTime":6.423103
}
Specifying the rexster.offset.end
alone will return all values starting from the beginning of the list to the value of the end offset.
http://localhost:8182/graphs/tinkergraph/tp/gremlin?script=g.v(1).outE&rexster.offset.end=1
{
"results":[
{"_id":"7","_type":"edge","_label":"knows","_inV":"2","_outV":"1","weight":0.5}
],
"success":true,
"version":"*.*",
"queryTime":6.423103
}
Specifying both the rexster.offset.start
and rexster.offset.end
will return just those results that exist between those two indexes.
http://localhost:8182/graphs/tinkergraph/tp/gremlin?script=g.v(1).outE&rexster.offset.end=2&rexster.offset.start=1
{
"results":[
{"_id":"9","_type":"edge","_label":"created","_inV":"3","_outV":"1","weight":0.4}
],
"success":true,
"version":"*.*",
"queryTime":6.423103
}
Appending the returnTotal
parameter to the request will force iteration of all results to get a count of the total values to be returned as a count
in the resulting JSON. If an offset is specified, then serialization will only occur on those results within the offset, but the iteration will not break early when the end offset is reached.
The following request returns the first vertex in `g.V` and the addition of returnTotal
includes the count
of 6
in the results:
http://localhost:8182/graphs/tinkergraph/tp/gremlin?script=g.V&returnTotal=true&rexster.offset.end=1
{
"count":6,
"results":[{"name":"lop","lang":"java","_id":"3","_type":"vertex"}],
"success":true,
"version":"*.*",
"queryTime":8.957588
}
Having the count
can be useful for paging applications, where knowledge of the total result count helps drive user interface components. Of course, it comes with the cost of having to iterate all results, so therefore would not be recommended for use on very large result sets.
The rexster.returnKeys
parameter allows one to specify how to construct a JSON object representation of a returned Element
(i.e. Vertex
or Edge
). All elements are returned as JSON objects with the properties identified by the rexster.returnKeys
array being what is included in the JSON representation. The wildcard *
denotes to return all properties of the element.
http://localhost:8182/graphs/tinkergraph/tp/gremlin?script=g.idx(%22vertices%22)[[name:%22marko%22]]&rexster.returnKeys=[age]
{
"results":[
{"_id":"1","_type":"vertex","age":29}
],
"success":true,
"version":"*.*",
"queryTime":7.547388
}
Parameters can be passed to the Gremlin Extension through the params
argument. These key/values pairs become bindings to the script engine that can then be used in the script itself.
curl -v -X POST -d '{"params":{"list":[1,2,3],"text":"test"},"script":"[list[0]+list[1]+list[2], text]"}' -H "Content-Type:application/json" http://localhost:8182/graphs/tinkergraph/tp/gremlin
{
"results": [
6,
"test"
],
"success": true,
"version": "*.*",
"queryTime": 3547.145821
}
To use the Rexster type system for parameters to be placed in the bindings be sure to set rexster.showTypes
to true
, as it will control that data type parsing operation.
It is possible to get access to the Bindings
object itself by using the ScriptContext
:
curl "http://localhost:8182/graphs/tinkergraph/tp/gremlin?params.x=1&script=this.context.getBindings(javax.script.ScriptContext.ENGINE_SCOPE).containsKey('x')"
{"results":[true],"success":true,"version":"*.*","queryTime":125.693432}
curl "http://localhost:8182/graphs/tinkergraph/tp/gremlin?params.x=1&script=this.context.getBindings(javax.script.ScriptContext.ENGINE_SCOPE).containsKey('y')"
{"results":[false],"success":true,"version":"*.*","queryTime":126.561443}
This bit of the above script submitted to Rexster:
this.context.getBindings(javax.script.ScriptContext.ENGINE_SCOPE)
returns the set of Bindings. It is then possible to access the map of parameters sent to Rexster. Accessing the Bindings
is useful when it is necessary to check if a parameter was passed in the first place.
It is possible to execute “stored procedures” by utilizing the load
parameter. The load parameter accepts an array of Gremlin script file names that must exist in the configured scripts
directory of rexster.xml
(as part of the Gremlin configuration described above). Those script files are executed in order prior to the execution of the Gremlin script passed on the script
argument. In the event that no script
parameter is supplied or if <allow-client-script>
is set to false, the return value of the Gremlin Extension ends up being the output of the final script passed to load
parameter.
The benefit of storing these scripts on the server is that globally reusable functions and algorithms can be centralized and protected on the server. In doing this, it makes it possible to keep heavy logic on the server (which may amount to many lines of Gremlin) and simply reference those functions from the script
argument.
For example, add two server-side script files to the directory configured in rexster.xml
:
- vertices.gremlin
- filter-age.gremlin
The vertices.gremlin
script simply contains a script that puts the list of vertices into a variable, called vx
, in the script engine context:
vx = g.V;
The filter-age.gremlin
script contains a generic function that filters a list of vertices.
def filterOver30(vertices) {
return vertices.filter{it.age > 30}
}
You can then make use of these functions with the following REST API call:
http://localhost:8182/graphs/tinkergraph/tp/gremlin?script=filterOver30(vx)&load=[vertices,filter-age]
The URI references both scripts as an array in the load
parameter (note that the files are referenced by name without the .gremlin
extension). The script
argument calls the filterOver30
function and passes in the list of vertices established by the vertices.gremlin
script. This call returns the following:
{
"results": [
{
"name": "peter",
"age": 35,
"_id": "6",
"_type": "vertex"
},
{
"name": "josh",
"age": 32,
"_id": "4",
"_type": "vertex"
}
],
"success": true,
"version": "*.*",
"queryTime": 12.299851
}
Rexster tries to serialize results to JSON in a way that is most like the structures they originate from and uses the GraphSON for the serialization of graph elements. This approach works well in most cases but there are scenarios where the limits of JSON prevent a mirror copy of results.
First, complex objects outside of Map
, List
, and those items handled by GraphSON will be converted via toString
. In other words, a value that is a Date
or UUID
will simply be converted to a string in the JSON.
Second, Map
objects that contain keys that are graph elements have a different serialization structure. JSON must have keys that are of type String
. Therefore, using a graph element for a key, which is a common pattern when using groupCount
or groupBy
operations in Gremlin, needs to have a bit more verbosity.
Note the Map
structure in the following example from Rexster Console:
rexster[groovy]> g = rexster.getGraph("tinkergraph")
==>mocktinkertransactionalgraph[vertices:6 edges:6 directory:data/graph-example-1]
rexster[groovy]> g.V.out.groupCount.cap
==>{v[3]=3, v[2]=1, v[5]=1, v[4]=1}
The key in the Map
is the vertex that was grouped on and the value is the “count” or number of times that vertex was traversed.
When executing this Gremlin query through the Gremlin Extension:
http://localhost:8182/graphs/tinkergraph/tp/gremlin?script=g.V.out.groupCount.cap
the results look a bit different:
{
"results": [
{
"2": {
"_value": 1,
"_key": {
"name": "vadas",
"age": 27,
"_id": "2",
"_type": "vertex"
}
},
"3": {
"_value": 3,
"_key": {
"name": "lop",
"lang": "java",
"_id": "3",
"_type": "vertex"
}
},
"4": {
"_value": 1,
"_key": {
"name": "josh",
"age": 32,
"_id": "4",
"_type": "vertex"
}
},
"5": {
"_value": 1,
"_key": {
"name": "ripple",
"lang": "java",
"_id": "5",
"_type": "vertex"
}
}
}
],
"success": true,
"version": "*.*",
"queryTime": 12.599092
}
Here the key to top level Map
is the identifier of the graph element. The value is itself a Map
that contains the GraphSON serialized graph element, and the value (i.e. the “count”).
For multi-line constructs, its possible to use tools like cURL to post JSON to the traversal service instead of relying on the conversion of the URI query parameters to be mapped to JSON (see Mapping a URI to JSON). However, you can also use newline characters in your URI.