
A converter from RDF knowledge graphs to Gephi project files


sparna-git/rdf2gephi


RDF-to-Gephi

Converts RDF knowledge graphs to a GEXF file that can be opened in Gephi. GEXF stands for Graph Exchange XML Format. The input can be a single RDF file, a folder containing multiple RDF files, or the URL of a remote SPARQL endpoint. The converter works in one of two modes: a "direct and simple" conversion that turns triples into edges, or a conversion driven by a set of SPARQL queries that define exactly the scope and structure of the nodes and edges appearing in the GEXF file.
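To give an idea of what ends up in the output file, here is a minimal Python sketch that writes a static, directed GEXF document from a dictionary of node labels and a list of (source, edge, target) triples. It is not rdf2gephi's code (the tool is a Java application); the to_gexf name and the choice of the GEXF 1.2 namespace are assumptions for illustration, and the files produced by the tool may also carry node attributes and dates.

```python
import xml.etree.ElementTree as ET

# Assumption: GEXF 1.2 namespace; the version rdf2gephi writes may differ
GEXF_NS = "http://www.gexf.net/1.2draft"


def to_gexf(labels, edges):
    """Return a minimal static, directed GEXF document as a string.

    labels: dict mapping node id (e.g. a URI) to its label
    edges:  list of (source, edge_label, target) tuples
    """
    root = ET.Element("gexf", xmlns=GEXF_NS, version="1.2")
    graph = ET.SubElement(root, "graph", defaultedgetype="directed")
    nodes_el = ET.SubElement(graph, "nodes")
    # Every edge endpoint must be declared as a node; keep first-seen order
    node_ids = dict.fromkeys([*labels, *(n for s, _, t in edges for n in (s, t))])
    for node_id in node_ids:
        # Fall back to the node id when no label is known
        ET.SubElement(nodes_el, "node", id=node_id, label=labels.get(node_id, node_id))
    edges_el = ET.SubElement(graph, "edges")
    for i, (source, label, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge", id=str(i), source=source, target=target, label=label)
    return ET.tostring(root, encoding="unicode")
```

Opening such a file in Gephi shows two nodes joined by one directed edge; everything else (layout, colors, sizes) is done in Gephi itself.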

How to run

  1. Make sure you have Java installed
  2. Download the application from the release section
  3. Have some RDF data at hand (one or more RDF files, or a SPARQL service you can query)
  4. Open a command line in the directory where you downloaded the app, and run java -jar rdf2gephi-x.y-onejar.jar --help to list the available commands and options
  5. Run a conversion command, typically the following:
java -jar rdf2gephi-1.0-onejar.jar sparql \
--input http://my.sparql.endpoint \
--edges queries/edges.rq \
--attributes queries/attributes.rq \
--labels queries/labels.rq \
--output output.gexf
  6. Download and run Gephi
  7. In Gephi, open the generated GEXF file, and start applying layouts and colors to your graph to make it beautiful and tell your story

Available commands and options

direct conversion (discouraged)

Converts RDF data to GEXF format directly. All literals become node attributes, and all other triples become edges, except rdf:type triples. rdfs:label is used as the node label.
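A minimal Python sketch of the direct-mode rule, assuming triples are (subject, predicate, object) tuples and literal objects are wrapped in a hypothetical Literal class (URIs are plain strings). The README only says rdf:type triples are not turned into edges, so dropping them entirely is an assumption of this sketch; rdfs:label literals are kept as attributes, like every other literal, and also used as the label.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass(frozen=True)
class Literal:
    """Hypothetical marker for literal objects; URIs are plain strings."""
    value: str


def split_direct(triples):
    """Partition (subject, predicate, object) triples as in the direct mode."""
    labels, attributes, edges = {}, {}, []
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue  # assumption: rdf:type triples are simply dropped here
        if isinstance(o, Literal):
            # Every literal becomes an attribute (a node may hold several values)
            attributes.setdefault(s, {}).setdefault(p, []).append(o.value)
            if p == RDFS_LABEL:
                labels[s] = o.value
        else:
            edges.append((s, p, o))  # resource object: one edge
    return labels, attributes, edges
```

This also shows why the mode is discouraged: every resource-valued property of every resource becomes an edge, so the graph quickly gets crowded.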

The full options of the command are:

    direct      Converts RDF data to GEXF format directly. All literals are 
            considered as attributes, and all triples as edges, except 
            rdf:type. rdfs:label is used as label.
      Usage: direct [options]
        Options:
          -e, --endDateProperty
            URI of the property in the knowledge graph holding the end date of 
            entities 
        * -i, --input
            Path to RDF input file, or directory containing RDF files, or URL 
            of a SPARQL endpoint.
        * -o, --output
            Path to GEXF output file
          -s, --startDateProperty
            URI of the property holding the start date of entities
          -w, --weight
            Path to a properties file associating properties to weights

SPARQL-based conversion (preferred)

The sparql command takes a set of SPARQL queries to build the structure of the Gephi graph. Its synopsis is the following:

java -jar rdf2gephi-1.0-onejar.jar sparql \
--input <file or directory or url of SPARQL endpoint> \
--edges <SPARQL query file to create edges> \
--attributes <SPARQL query file to create attributes> \
--labels <SPARQL query file to create labels> \
--output <output gexf file>

The full options of the command are:

    sparql      Converts RDF data to GEXF format using SPARQL queries.
      Usage: sparql [options]
        Options:
          -a, --attributes
            Path to the file containing the SPARQL query to retrieve 
            attributes, e.g. 'sparql/attribute.rq'. The query MUST return 3 
            columns: the first one is the subject, the second one is the 
            attribute URI, the third one is the attribute value.
          -d, --dates
            Path to the file containing the SPARQL query to retrieve date 
            ranges, e.g. 'sparql/dates.rq'
        * -e, --edges
            Path to the file containing the SPARQL query to retrieve edges, 
            e.g. 'sparql/edges.rq'. The query MUST return the following 
            variables: ?subject, ?edge, ?object
        * -i, --input
            Path to RDF input file, or directory containing RDF files, or URL 
            of a SPARQL endpoint.
          -l, --labels
            Path to the file containing the SPARQL query to retrieve labels, 
            e.g. 'sparql/labels.rq'. The query MUST return the following 
            variables: ?subject, ?label
        * -o, --output
            Path to GEXF output file
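The asterisk in the help output marks the mandatory options (--input, --edges, --output). If you script conversions, a small helper can assemble the argument list and add the optional query files only when they are provided. sparql_command below is a hypothetical Python helper, and the jar file name is an assumption to adapt to the release you downloaded.

```python
# Hypothetical helper: the jar name below is an assumption, adjust it to your release
JAR = "rdf2gephi-1.0-onejar.jar"


def sparql_command(input_path, edges, output, attributes=None, labels=None, dates=None, jar=JAR):
    """Build the argument list of the 'sparql' command.

    input_path, edges and output are mandatory (marked with * in the help output);
    the optional query files are only added when they are given.
    """
    args = ["java", "-jar", jar, "sparql",
            "--input", input_path, "--edges", edges, "--output", output]
    for flag, value in (("--attributes", attributes), ("--labels", labels), ("--dates", dates)):
        if value is not None:
            args += [flag, value]
    return args
```

Passing the result to subprocess.run(..., check=True) would then launch the conversion, provided Java and the jar are available in the working directory.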

/!\ Attention: the provided queries MUST follow the rules below:

edges query

This query defines the graph structure. The edges query MUST return the 3 variables: ?subject, ?edge, ?object.

An example of such query is:

PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?subject ?edge ?object
WHERE {
    ?subject a cdm:resource_legal .
    ?subject cdm:resource_legal_in-force true .
    ?subject cdm:resource_legal_based_on_resource_legal ?object .
    BIND(cdm:resource_legal_based_on_resource_legal as ?edge)
}

This query is mandatory.
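The contract of the edges query can be checked before running a conversion. The Python sketch below assumes the query results are available in the standard SPARQL 1.1 Query Results JSON format (for instance as returned by an endpoint); edges_from_results is a hypothetical helper, not part of rdf2gephi. It rejects results missing one of the three variables, turns each row into one edge, and derives the nodes from the edge endpoints. Skipping rows with an unbound variable is a choice of this sketch; the tool's behavior in that case is not documented here.

```python
REQUIRED_VARS = ("subject", "edge", "object")


def edges_from_results(results):
    """Turn SPARQL 1.1 JSON results of the edges query into nodes and edges."""
    missing = [v for v in REQUIRED_VARS if v not in results["head"]["vars"]]
    if missing:
        raise ValueError(f"edges query must return ?subject ?edge ?object, missing: {missing}")
    edges = []
    for binding in results["results"]["bindings"]:
        # Choice of this sketch: skip rows where one of the variables is unbound
        if all(v in binding for v in REQUIRED_VARS):
            edges.append(tuple(binding[v]["value"] for v in REQUIRED_VARS))
    # Nodes are the edge endpoints, deduplicated and sorted for stable output
    nodes = sorted({n for s, _, o in edges for n in (s, o)})
    return nodes, edges
```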

labels query

This query returns the label of each node in the graph, typically taken from rdfs:label, skos:prefLabel, or any other property. The labels query MUST use the ?subject variable to hold the node in the graph, and MUST return the 2 variables ?subject and ?label.

An example of such query is:

PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?subject ?label
WHERE {
    ?subject cdm:resource_legal_eli ?eli .
    BIND(STRAFTER(STR(?eli), "http://data.europa.eu/") AS ?label)
}

This query is optional. If not provided, the following query is used:

# Default labels query
# Selects the first one present: foaf:name, then skos:prefLabel in English or without language, then rdfs:label in English or without language
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX epvoc: <https://data.europarl.europa.eu/def/epvoc#>
SELECT ?subject ?label
WHERE {
	OPTIONAL { ?subject foaf:name ?foafName }
	OPTIONAL { ?subject skos:prefLabel ?prefLabel . FILTER(lang(?prefLabel) = "en" || lang(?prefLabel) = "") }
	OPTIONAL { ?subject rdfs:label ?rdfsLabel . FILTER(lang(?rdfsLabel) = "en" || lang(?rdfsLabel) = "") }
	
	BIND(COALESCE(?foafName, ?prefLabel, ?rdfsLabel) AS ?label)
}
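COALESCE returns its first bound argument, which is what gives the default query its priority order. The Python sketch below mimics that choice for a single node; pick_label and its (text, lang) input format are hypothetical. When a property has several acceptable values the sketch takes the first one, whereas the SPARQL query returns one row per combination of values.

```python
def pick_label(foaf_names, pref_labels, rdfs_labels):
    """Mimic COALESCE(?foafName, ?prefLabel, ?rdfsLabel) for a single node.

    Each argument is a list of (text, lang) pairs, lang being "" for plain literals.
    """
    def english_or_plain(values):
        # Same filter as the query: lang = "en" or no language tag
        return [text for text, lang in values if lang in ("en", "")]

    # foaf:name is not language-filtered in the default query
    for candidates in ([text for text, _ in foaf_names],
                       english_or_plain(pref_labels),
                       english_or_plain(rdfs_labels)):
        if candidates:
            return candidates[0]  # first value of the first property present
    return None
```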

attributes query

This query returns the attributes of each node in the graph, typically the value of rdf:type and other attributes. The attributes query MUST use the ?subject variable to hold the node in the graph, and MUST return 3 variables: ?subject, ?attribute as the attribute type, and ?value as the attribute value (a URI or a literal).

An example of such query is:

PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>

SELECT ?subject ?attribute ?value
WHERE {
    ?subject cdm:work_has_resource-type ?value .
	BIND(cdm:work_has_resource-type AS ?attribute)
}

This query is optional. If not provided, the following query is used:

# Default attributes query
# Selects the rdf:type value and any other property pointing to a skos:Concept
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX epvoc: <https://data.europarl.europa.eu/def/epvoc#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?subject ?attribute ?value
WHERE {
	{ 
		?subject a ?value .
		BIND(rdf:type AS ?attribute)
	}
	UNION
	{
		?subject ?attribute ?value .
		?value a skos:Concept .
	}
}
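The attributes query returns one row per (node, attribute, value), so a node with two rdf:type values produces two rows. The Python sketch below groups such rows per node and collects the distinct attribute URIs, which are natural candidates for the attribute columns shown in Gephi. group_attributes is a hypothetical helper; how rdf2gephi itself stores multi-valued attributes in the GEXF file is not documented here.

```python
from collections import defaultdict


def group_attributes(rows):
    """Group (subject, attribute, value) rows of the attributes query per node.

    Returns the sorted distinct attribute URIs and a mapping
    node -> attribute -> list of values.
    """
    rows = list(rows)  # allow any iterable, we read it twice
    per_node = defaultdict(lambda: defaultdict(list))
    for subject, attribute, value in rows:
        per_node[subject][attribute].append(value)
    columns = sorted({attribute for _, attribute, _ in rows})
    return columns, {s: dict(attrs) for s, attrs in per_node.items()}
```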

dates query

TODO

Support for dynamic graphs

rdf2gephi supports the creation of dynamic graphs that show the evolution of the graph over time. TODO

Typical actions in Gephi to view your RDF graph

  1. Apply a layout algorithm: use "ForceAtlas 2".
  2. Color the nodes based on the type attribute: Appearance > Nodes > Partition > choose an attribute
  3. Size the nodes based on (incoming or outgoing) degree: Appearance > Size icon > Ranking > Degree
  4. Show labels only for the biggest nodes: Filters > Topology > Degree Range > drag and drop it to Queries below > set the parameters. Then click Filter. Then click the icon above "hide node/edges labels if not in filtered graph"
  5. Click the "Show node labels" button
  6. Go to the "Preview" tab, regenerate the preview, and export it as SVG/PNG/PDF

This is illustrated in the screencast below:

Gallery

EU in-force legislation from the Cellar SPARQL endpoint. Links show the "based_on" relations (acts legally based on another act). The ELI identifiers of the acts most often used as a legal basis are shown.