Markus Krötzsch edited this page Sep 17, 2019 · 7 revisions

This page explains the use of VLog as a stand-alone command-line client. VLog can also be used from Java through the VLog4j project.

We illustrate the functionality of VLog with two examples. In the first, we show how we can compute the materialization of a small RDF knowledge base (LUBM). In the second, we illustrate how we can query a relational database available as a set of CSV files.

If you want to run these examples, you can use either the compiled binaries or the Docker image. If you use the Docker image, the input data that we use is already available in /data.

Materializing an RDF knowledge base

We would like to materialize an RDF knowledge base (KB) using some example rules. Let us assume the KB is stored in a number of files in the N-Triples format. The first step is to create a database from this collection of files. We do this by loading the KB into a different format that VLog can query efficiently. To this end, we launch the command:

./vlog load -i /data/lubm_1/ntriples -o /data/lubm_1/kb

Note that /data/lubm_1 is the path where the example data is stored in the Docker image. Once the command has finished, VLog has created a copy of the database at "/data/lubm_1/kb".

Now we need to instruct VLog to see the data inside the KB as a collection of facts with a given predicate. To this end, we create a small file, which we call "edb.conf", that contains these settings. In our case, we add the following lines:

EDB0_predname=TE
EDB0_type=Trident
EDB0_param0=/data/lubm_1/kb

These lines instruct VLog to map the content of the KB to facts with predicate "TE". These facts have three arguments, namely the subject, predicate, and object of each triple.
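As a minimal sketch, the configuration file can be written from the shell with a here-document. The variable CONF_DIR is a hypothetical name introduced only for this illustration; it defaults to the current directory, whereas the commands on this page expect the file at /data/lubm_1/edb.conf.

```shell
# CONF_DIR is a hypothetical variable for this sketch: the directory that
# will hold edb.conf. Set it to /data/lubm_1 to match the commands on this page.
CONF_DIR="${CONF_DIR:-.}"

# The quoted 'EOF' makes the shell write the three lines verbatim.
cat > "$CONF_DIR/edb.conf" <<'EOF'
EDB0_predname=TE
EDB0_type=Trident
EDB0_param0=/data/lubm_1/kb
EOF

# Print the file to confirm its content.
cat "$CONF_DIR/edb.conf"
```

The three lines are exactly the ones listed above; only the way of writing the file is an assumption.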

Now, we must create a file with the list of rules to execute. For now, let's say we want to execute four simple rules:

TI(A,B,C) :- TE(A,B,C)
isA(A,B) :- TI(A,rdf:type,B)
subClassOf(A,B) :- TI(A,rdfs:subClassOf,B)
isA(A,C) :- isA(A,B),subClassOf(B,C) 

We save these rules into a file called /data/lubm_1/rules. Let us look at the rules. First of all, they are written using the notation HEAD :- BODY. The first rule does something very simple: it copies every EDB triple in TE (i.e., our knowledge base) into another predicate "TI". The second and third rules create two binary predicates, "isA" and "subClassOf", from the rdf:type and rdfs:subClassOf triples. The last rule computes the closure of the "isA" predicate under "subClassOf": if A is an instance of B and B is a subclass of C, then A is also an instance of C.
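The rules file can be created in the same way. The sketch below writes the four rules with a here-document and then lists the distinct head predicates as a quick sanity check. RULES_FILE is a hypothetical variable used only here; it defaults to a file named rules in the current directory, while the command on this page reads /data/lubm_1/rules.

```shell
# RULES_FILE is a hypothetical variable for this sketch. Set it to
# /data/lubm_1/rules to match the command used on this page.
RULES_FILE="${RULES_FILE:-./rules}"

# Write the four rules verbatim (quoted 'EOF' disables shell expansion).
cat > "$RULES_FILE" <<'EOF'
TI(A,B,C) :- TE(A,B,C)
isA(A,B) :- TI(A,rdf:type,B)
subClassOf(A,B) :- TI(A,rdfs:subClassOf,B)
isA(A,C) :- isA(A,B),subClassOf(B,C)
EOF

# Strip everything from the first "(" onward, leaving only the head
# predicate of each rule, then print each distinct name once.
sed 's/(.*//' "$RULES_FILE" | sort -u
```

The check should list three predicates (TI, isA and subClassOf), since two of the four rules share the head "isA". These are the predicates that the materialization derives facts for.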

In order to execute these rules on our input, we launch the program:

./vlog mat --edb /data/lubm_1/edb.conf --rules /data/lubm_1/rules

In this case, "mat" is the subcommand that instructs VLog to compute the materialization while the other two arguments provide the input for the computation. After the program has finished, the output should be something like:

[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Read program from file /data/lubm_1/rules
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  New assigned constants: 0
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  The program might not terminate due to existential rules ...
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Starting full materialization
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Finished process. Iterations=10
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Runtime materialization = 53.750741 milliseconds
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Cardinality of TI: 100868
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Cardinality of isA: 18227
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Cardinality of subClassOf: 36
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Total # derivations: 119131
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Runtime = 92.590253 milliseconds
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Max memory used: 69.046875 MB
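If the log has been saved to a file, the cardinality lines can be pulled out with standard tools. The following sketch assumes the line layout shown above and uses a hypothetical file name, vlog.log, in a hypothetical variable LOG_FILE; the sample lines are copied from the output on this page, and how the log is captured is left open.

```shell
# LOG_FILE is a hypothetical variable and file name for this sketch.
LOG_FILE="${LOG_FILE:-./vlog.log}"

# Sample lines copied from the output shown on this page.
cat > "$LOG_FILE" <<'EOF'
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Cardinality of TI: 100868
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Cardinality of isA: 18227
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Cardinality of subClassOf: 36
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO  Total # derivations: 119131
EOF

# Print "predicate count" for every cardinality line.
awk '/Cardinality of/ { name = $(NF-1); sub(/:$/, "", name); print name, $NF }' "$LOG_FILE"

# Add up the cardinalities and read the reported total.
sum=$(awk '/Cardinality of/ { s += $NF } END { print s }' "$LOG_FILE")
total=$(awk '/derivations:/ { print $NF }' "$LOG_FILE")
echo "sum=$sum total=$total"   # prints: sum=119131 total=119131
```

In this particular run, the three cardinalities add up to the reported total number of derivations (100868 + 18227 + 36 = 119131). Whether that holds for other rule sets is not something this page states, so treat the comparison as a check on this example only.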

We can suppress some of the logging by adding the parameter "-l error", which will show only errors. Notice that in this particular case the inferred facts are discarded once the program terminates. To save them, we must add three parameters: "--storemat_path" gives the location where the materialization should be stored, "--storemat_format [files|csv|db]" selects the format to use, and "--decompressmat [1|0]" controls whether VLog writes the textual IDs (1) rather than the internal numerical ones (0). For instance, we can launch the command:

./vlog mat --edb /data/lubm_1/edb.conf --rules /data/lubm_1/rules --storemat_path /data/lubm_1/inf --storemat_format csv --decompressmat 1

which will dump all the materialized data into CSV files inside /data/lubm_1/inf.

More options are available. To get a quick overview of all possibilities, you can type

./vlog help

and the program will return an explanation for all options.

Materializing a relational database with existential rules

TODO
