Usage
This page explains the use of VLog as a stand-alone command-line client. VLog can also be used from Java through the VLog4j project.
We illustrate the functionality of VLog with two examples. In the first, we show how we can compute the materialization of a small RDF knowledge base (LUBM). In the second, we illustrate how we can query a relational database available as a set of CSV files.
If you want to run these examples, you can use either the compiled binaries or the Docker image. In the latter case, the input data used below is already available in /data.
We would like to materialize an RDF knowledge base (KB) using some example rules. Let us assume the KB is stored in a number of files in the N-Triples format. The first step is to create a database from this collection of files, i.e., to load the KB into a format that VLog can query efficiently. To this end, we launch the command:
./vlog load -i /data/lubm_1/ntriples -o /data/lubm_1/kb
(Note that this is where the example data is stored in the Docker image.) Once the command has finished, VLog has written the converted database to "/data/lubm_1/kb".
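This page does not describe VLog's on-disk format, but the --decompressmat option mentioned later shows that VLog works with internal numerical IDs rather than textual terms. The Python sketch below illustrates that general idea (dictionary encoding: each distinct term gets an integer ID). It is not Trident's actual format, and the simplified parser is an assumption: it only handles terms without embedded whitespace.

```python
from typing import Dict, List, Tuple


def parse_ntriples_line(line: str) -> Tuple[str, str, str]:
    """Split one simple N-Triples line into (subject, predicate, object).

    Simplification (assumption): no term contains whitespace, so literals
    such as "New York" are not supported.
    """
    s = line.strip()
    if not s.endswith("."):
        raise ValueError(f"missing final '.': {line!r}")
    parts = s[:-1].split()
    if len(parts) != 3:
        raise ValueError(f"unsupported line: {line!r}")
    return parts[0], parts[1], parts[2]


def encode(lines: List[str]) -> Tuple[List[Tuple[int, ...]], Dict[str, int]]:
    """Dictionary-encode triples: every distinct term gets a dense integer ID."""
    term_to_id: Dict[str, int] = {}
    encoded: List[Tuple[int, ...]] = []
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        triple = parse_ntriples_line(line)
        # setdefault assigns the next free ID only to terms not seen before
        encoded.append(tuple(term_to_id.setdefault(t, len(term_to_id)) for t in triple))
    return encoded, term_to_id


def decode(encoded: List[Tuple[int, ...]], term_to_id: Dict[str, int]) -> List[Tuple[str, ...]]:
    """Map integer IDs back to the original textual terms."""
    id_to_term = {i: t for t, i in term_to_id.items()}
    return [tuple(id_to_term[i] for i in row) for row in encoded]
```

Queries and joins can then compare small integers instead of long IRIs, and decode plays the role of turning IDs back into text.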
Next, we need to instruct VLog to expose the data inside the KB as a collection of facts with a given predicate. To this end, we create a small configuration file, "edb.conf" (stored at /data/lubm_1/edb.conf in this example), which holds these settings. In our case, it contains the following lines:
EDB0_predname=TE
EDB0_type=Trident
EDB0_param0=/data/lubm_1/kb
These lines instruct VLog to map the content of the KB to facts with predicate "TE". Each fact has three arguments, namely the subject, predicate, and object of a triple.
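If you generate configurations from scripts, a small helper can render the file. The Python sketch below reproduces the three lines above exactly; the assumption that further sources would use the prefixes EDB1_, EDB2_, and so on is only inferred from the EDB0_ naming and is not confirmed on this page.

```python
from typing import List, Sequence, Tuple


def edb_conf(sources: Sequence[Tuple[str, str, Sequence[str]]]) -> str:
    """Render edb.conf text from (predicate name, type, params) tuples.

    Source i gets the key prefix EDB{i}_. Only EDB0_ appears on this page;
    numbering beyond it is an assumption.
    """
    lines: List[str] = []
    for i, (predname, edb_type, params) in enumerate(sources):
        lines.append(f"EDB{i}_predname={predname}")
        lines.append(f"EDB{i}_type={edb_type}")
        for j, param in enumerate(params):
            lines.append(f"EDB{i}_param{j}={param}")
    return "\n".join(lines) + "\n"
```

For this example, write the result of edb_conf([("TE", "Trident", ["/data/lubm_1/kb"])]) to /data/lubm_1/edb.conf.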
Now we must create a file listing the rules to execute. As an example, suppose we want to execute the following four rules:
TI(A,B,C) :- TE(A,B,C)
isA(A,B) :- TI(A,rdf:type,B)
subClassOf(A,B) :- TI(A,rdfs:subClassOf,B)
isA(A,C) :- isA(A,B),subClassOf(B,C)
We save these rules into a file called /data/lubm_1/rules. The rules are written in the notation HEAD :- BODY. The first rule is trivial: it copies every EDB triple in TE (i.e., our knowledge graph) into another predicate, "TI". The second and third rules derive the binary predicates "isA" and "subClassOf" from the rdf:type and rdfs:subClassOf triples, respectively. The last rule is recursive: if A is an instance of B and B is a subclass of C, then A is also an instance of C. Applied repeatedly, it computes the closure of "isA" under "subClassOf".
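To make the meaning of these rules concrete, the Python sketch below applies them to a few made-up triples with a naive fixpoint loop, re-evaluating the recursive rule until nothing new is derived. It only illustrates what the materialization contains; it is not how VLog evaluates rules internally, and terms are treated as plain strings.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
Pair = Tuple[str, str]


def materialize(te: Set[Triple]) -> Tuple[Set[Triple], Set[Pair], Set[Pair]]:
    """Apply the four example rules naively until a fixpoint is reached."""
    # Rule 1: TI(A,B,C) :- TE(A,B,C)
    ti: Set[Triple] = set(te)
    # Rule 2: isA(A,B) :- TI(A,rdf:type,B)
    is_a: Set[Pair] = {(a, c) for (a, b, c) in ti if b == "rdf:type"}
    # Rule 3: subClassOf(A,B) :- TI(A,rdfs:subClassOf,B)
    sub_class_of: Set[Pair] = {(a, c) for (a, b, c) in ti if b == "rdfs:subClassOf"}
    # Rule 4 (recursive): isA(A,C) :- isA(A,B), subClassOf(B,C)
    while True:
        derived = {(a, c) for (a, b) in is_a for (b2, c) in sub_class_of if b == b2}
        new = derived - is_a
        if not new:
            break  # fixpoint: no rule produces a new fact
        is_a |= new
    return ti, is_a, sub_class_of
```

With alice typed as Student, and Student and Person linked by subClassOf chains up to Agent, the loop needs two productive rounds to derive isA(alice, Person) and then isA(alice, Agent).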
In order to execute these rules on our input, we launch the program:
./vlog mat --edb /data/lubm_1/edb.conf --rules /data/lubm_1/rules
Here, "mat" is the subcommand that instructs VLog to compute the materialization, while --edb and --rules provide its input. After the program has finished, the output should look similar to the following:
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO Read program from file /data/lubm_1/rules
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO New assigned constants: 0
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO The program might not terminate due to existential rules ...
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO Starting full materialization
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO Finished process. Iterations=10
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO Runtime materialization = 53.750741 milliseconds
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO Cardinality of TI: 100868
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO Cardinality of isA: 18227
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO Cardinality of subClassOf: 36
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO Total # derivations: 119131
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO Runtime = 92.590253 milliseconds
[0x9b7052e4559973b8 2018-02-19 08:41:32] INFO Max memory used: 69.046875 MB
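When scripting experiments, the per-predicate counts can be extracted from this log. In the run above, the reported total (119131) equals the sum of the three cardinalities (100868 + 18227 + 36). The regular expressions in this Python sketch are written against the sample output shown here; that other VLog versions print the same line format is an assumption.

```python
import re
from typing import Dict

# Patterns match lines such as "... INFO Cardinality of isA: 18227"
CARD_RE = re.compile(r"INFO Cardinality of (\S+): (\d+)")
TOTAL_RE = re.compile(r"INFO Total # derivations: (\d+)")


def cardinalities(log: str) -> Dict[str, int]:
    """Return {predicate: number of facts} from VLog's log text."""
    return {m.group(1): int(m.group(2)) for m in CARD_RE.finditer(log)}


def total_derivations(log: str) -> int:
    """Return the value of the 'Total # derivations' line."""
    m = TOTAL_RE.search(log)
    if m is None:
        raise ValueError("no 'Total # derivations' line found")
    return int(m.group(1))
```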
We can suppress some of the logging by adding the parameter "-l error", which shows only errors. Note that in this example the derived facts are discarded once the program terminates. To save them, we must add three parameters: "--storemat_path" sets the path where the materialization is stored, "--storemat_format [files|csv|db]" sets the output format, and "--decompressmat [1|0]" controls whether the output contains the textual terms (1) or VLog's internal numerical IDs (0). For instance, we can launch the command:
./vlog mat --edb /data/lubm_1/edb.conf --rules /data/lubm_1/rules --storemat_path /data/lubm_1/inf --storemat_format csv --decompressmat 1
which will dump all the materialized data into csv files inside /data/lubm_1/inf.
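To drive VLog from a script, one option is to build the argument list and pass it to subprocess. The Python sketch below uses only the options described on this page; the helper names are hypothetical, and placing "-l" after the "mat" subcommand is an assumption, since the page does not say where it goes.

```python
import subprocess
from typing import List, Optional


def vlog_mat_command(edb: str, rules: str, storemat_path: Optional[str] = None,
                     storemat_format: str = "csv", decompress: bool = True,
                     log_level: Optional[str] = None, vlog: str = "./vlog") -> List[str]:
    """Build the argument list for `vlog mat` (hypothetical helper)."""
    cmd = [vlog, "mat", "--edb", edb, "--rules", rules]
    if log_level is not None:
        cmd += ["-l", log_level]  # assumption: accepted after the subcommand
    if storemat_path is not None:
        if storemat_format not in ("files", "csv", "db"):
            raise ValueError(f"unknown storemat format: {storemat_format}")
        cmd += ["--storemat_path", storemat_path,
                "--storemat_format", storemat_format,
                "--decompressmat", "1" if decompress else "0"]
    return cmd


def run_mat(**kwargs) -> int:
    """Run VLog and return its exit code (requires the vlog binary)."""
    return subprocess.run(vlog_mat_command(**kwargs), check=False).returncode
```

Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.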
More options are available. To get a quick overview of all possibilities, you can type
./vlog help
and the program will return an explanation for all options.
TODO