-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathNotes
162 lines (135 loc) · 8.19 KB
/
Notes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
Implementing SHACL as a translation to SPARQL
This is an implementation (actually two related implementations) that
performs SHACL validation by first constructing a SPARQL 1.1 SELECT query
and then running that query on the data graph. One implementation directly
creates SPARQL queries for core SHACL shapes. The other implementation
implements a SPARQL extension for SHACL and produces SPARQL queries for core
SHACL shapes using a metamodel graph that has templates for all the core
SHACL components except partition
The implementations are both pure translations to SPARQL - the validation
results of a SHACL shape are a transformation of the result set returned by
a SPARQL query against the data graph. The implementation thus cannot
depend on pre-binding in the data graph or hasShape or anything that is not
in SPARQL. The implementation also does not depend on access to the shapes
graph during the execution of the SPARQL queries.
Implementation Basics
The queries that implement shape validation need an environment where the
main query variable (always ?this) will have bindings for the nodes in the
data graph that need to be validated. This is easy for the initial shape
(the one that works on the results of scopes) but not so easy for embedded
shapes.
To ensure that embedded shapes work correctly embedded shapes use a graph
pattern that correctly generates bindings for ?this. For example, inside a
construct that is looking at the values of a path the pattern will bind
?this to these values.
However, this is not quite sufficient for constructs (like minCount) that
need to look at a set of values as an entirety. If there are no values for
?this then there are no bindings and thus nothing to work with in SPARQL.
To get around this problem, the graph pattern thus does not just generate
bindings for ?this but bindings for ?this related to another node, which is
the binding for ?parent, or related to the graph as a whole, in which case
there is no ?parent variable. The net effect then is that for those
constructs that need it the pattern can be grouped correctly to make these
sets, that can, for example, be counted for minCount, even if the set is the
empty set.
So that the correct pattern can be generated in all cases, the pattern is
broken up into two parts - one that sets up the stage (usually determining
bindings for ?parent) and one that then sets up bindings for ?this (usually
using the bindings for ?parent if it is available). When looking at values
of a path, the outer part is removed, the inner part becomes the new outer,
and a new inner is created to transverse the path.
To this basic context is added several pieces of information that are needed
to generate the correct SPARQL queries to implement SHACL components. The
entire set of information available consists of
inner - SPARQL pattern or query to get bindings for ?this
- usually goes from ?parent to ?this
outer - SPARQL pattern or query to get bindings for ?parent
- not always present, e.g., not present at top level
severity - the severity to be used in validation results
projection - the variable used in outer to get to ?this - usually ?parent
group - how to separate the results for counting, etc.
- if there is a parent this is GROUP BY ?parent
Validation Results
The result sets contain bindings for the following variables:
Variable SHACL predicate description
parent the node that was traversed to get
to the focus node, if any
this sh:focusNode focus node
thisShape sh:sourceShape identifier for shape that produced the result
childShape identifier for embedded shape, if any
subject sh:subject usually focus node of violation
predicate sh:predicate property or path involved in violation
object sh:object
severity sh:severity severity of result
component sh:sourceTemplate component property that produced the result
message sh:message a human readable message
Identifiers for shapes are the node itself if the shape is an IRI and a
unique identifier for shapes that are blank nodes. Note that for inverse
properties, the subject variable is the object of the triple that produced
the violation and the object variable is the subject of the triple.
A node validates against a shape if there is no solution in the result set
with the variable this bound to the node or unbound, the variable PS
bound to the identifier for the shape, and the variable severity bound to
sh:Violation.
The SHACL validation results graph can be constructed by first creating a
blank node for each solution in the result set and adding triples with it as
subject as above. For solutions that have the CS variable bound a sh:detail
triple is added for each other solution that has its PS variable bound to
the CS binding in this solution and the focus node of this solution as its
binding for the parent variable. This triple has the node for this solution
as its subject and the node for the other solution as its object.
It might be better to put the parent shape in the child solution instead of
the child shape in the parent solution.
Templates
Because prebinding and extension functions are not available to the
implementation it has to use a different method for templates. The method
used is substitution - instead of prebinding, the template code is subjected
to a process that substitutes the values of parameters into the code. To
handle subshapes, the method also allows the query for a shape to be
substituted into the code.
Because there is no functional help like hasShape available the SPARQL
queries need to include all the mediation for subshapes. This ends up with
slightly more complex queries than otherwise, but in most cases what is
needed just boilerplate and can be added on to the core portion of the
query. The template mechanism is designed to take care of this boilerplate
in many cases.
Templates use a substitution mechanism to access their argument and the
information described above. Subsitution takes a string (usually
parameterized SPARQL code) and a mapping from strings to replacement strings
and replaces substrings of the form [directive] in the first string
for directives of the form
- s(e) : recursively process e, treat the result as a shape in the shapes
graph, and return the SPARQL query for the shape
- c(p,e) : recursively process p and e, treat the result for e as a shape in the shapes
graph, and return the SPARQL query for propValues with the
result of p considered as a path the shape
- p(e) : recursively process e, treat the result as a path in the shapes
graph, and return the SPARQL code for that path
- l(e,"string") : recursively process e, treat the result as a list in the
shapes graph, and return string representations of the list elements
joined with string
- "replacement" : return replacement
- name : return the mapping for name
Template instantiation proceeds by first augmenting a copy of the context
with argument being the argument of the template, then it looks through
propValues constructs on the template and for any of them that have a
sh:argumentName it augments the context with that name being a value of the
propValues path or the sh:argumentDefault there if there is no value or ""
if there is also no default. Then it uses this context to substitute in
sh:templateMessage (default "") and adds the result of this for message.
Then it looks for a value for sh:templatePattern and puts that into the
pattern part of a query that will implement the shape. If there is no
sh:templatePattern then it looks for sh:templateFilter, wraps that in a
negated FILTER, and constructs the shape.
sh:disjointT a sh:ComponentTemplate ; rdfs:domain sh:Shape ; rdfs:range rdf:List ;
sh:list shmm:pathShape ;
sh:propValues ( ( rdf:rest rdf:rest ) [ sh:hasValue rdf:nil ] ) ;
sh:propValues ( rdf:first [ sh:shape shmm:pathShape ; sh:argumentName "path1" ] ) ;
sh:propValues ( ( rdf:rest rdf:first ) [ sh:shape shmm:pathShape ; sh:argumentName "path2" ] ) ;
sh:templateMessage "Path values not disjoint" ;
sh:messsage "Not list of two paths" ;
sh:templatePattern """?this [p(path1)] ?value1 . ?this [p(path2)] ?value1 .""" .
Internal Implementation Notes
Installing rdflib
sudo pip install rdflib
fix up permissions in /usr/lib/python2.7/site-packages