How to validate literals based on their datatype IRI? #46
Comments
Could you provide the above cases complete with shapes and sample data? Also, please check the SHACL Playground to see what results you get there.
@tpluscode I have not done anything complicated yet. I think that even the simplest things, like the XSD literals, do not work. I can still share my files, of course :-) This is my data file:
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
[ a <C>;
<p> "-false"^^xsd:boolean; # This will not validate when `sh:datatype xsd:boolean` is used.
<r> "--1.1e0"^^xsd:double ]. # This will validate when `sh:datatype xsd:double` is used.
And this is my patterns file:
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
[ sh:property
[ sh:datatype xsd:boolean;
sh:path <p> ],
[ sh:datatype xsd:double;
sh:path <r>;
sh:pattern "(\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)" ]; # This does not do anything at all IIUC.
sh:targetClass <C> ].
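One possible explanation for the pattern appearing to do nothing: SHACL defines `sh:pattern` in terms of the SPARQL REGEX function, which succeeds when the pattern matches any substring of the value unless it is anchored with `^` and `$`. The sketch below uses Python's `re` module as a stand-in for the validator's regex engine (an assumption, the dialects differ in details), and the helper names are made up for illustration. It suggests that `"--1.1e0"` passes the unanchored pattern because the substring `-1.1` matches.

```python
import re

# Decimal pattern as posted above, after Turtle unescaping (\\ becomes \).
DECIMAL = r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)"


def matches_unanchored(pattern: str, lexical: str) -> bool:
    """Mimic REGEX-style matching: succeed if any substring matches."""
    return re.search(pattern, lexical) is not None


def matches_anchored(pattern: str, lexical: str) -> bool:
    """Require the whole lexical form to match, as ^(...)$ would."""
    return re.search("^(?:" + pattern + ")$", lexical) is not None


if __name__ == "__main__":
    for lexical in ["--1.1e0", "-1.1", "abc"]:
        print(
            lexical,
            matches_unanchored(DECIMAL, lexical),
            matches_anchored(DECIMAL, lexical),
        )
```

If this is indeed the cause, writing the shape's pattern with explicit anchors, for example `"^(\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)$"`, should make the constraint reject `"--1.1e0"`.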
I have added a couple more examples. These are mostly copied from the XSD standard. I have replaced each backslash with a double backslash, since this seems to be required. Since I do not know the Regex grammar, I do not know whether the Regexes are valid (the library does not give feedback when a Regex cannot be processed). This is my patterns file:
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
[ sh:property
[ sh:datatype xsd:boolean;
sh:path <boolean>;
sh:pattern "false|true|0|1" ],
[ sh:datatype xsd:date;
sh:path <date>;
sh:pattern "-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])(Z|(\\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?" ],
[ sh:datatype xsd:dateTime;
sh:path <dateTime>;
sh:pattern """
-?([1-9][0-9]{3,}|0[0-9]{3})
-(0[1-9]|1[0-2])
-(0[1-9]|[12][0-9]|3[01])
T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\\.[0-9]+)?|(24:00:00(\\.0+)?))
(Z|(\\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?""" ],
[ sh:datatype xsd:decimal;
sh:path <decimal>;
sh:pattern "(\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)" ],
[ sh:datatype xsd:double;
sh:path <double>;
sh:pattern "(\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([Ee](\\+|-)?[0-9]+)? |(\\+|-)?INF|NaN" ],
[ sh:datatype xsd:duration;
sh:path <duration>;
sh:pattern """
-?P( ( ( [0-9]+Y([0-9]+M)?([0-9]+D)?
| ([0-9]+M)([0-9]+D)?
| ([0-9]+D)
)
(T ( ([0-9]+H)([0-9]+M)?([0-9]+(\\.[0-9]+)?S)?
| ([0-9]+M)([0-9]+(\\.[0-9]+)?S)?
| ([0-9]+(\\.[0-9]+)?S)
)
)?
)
| (T ( ([0-9]+H)([0-9]+M)?([0-9]+(\\.[0-9]+)?S)?
| ([0-9]+M)([0-9]+(\\.[0-9]+)?S)?
| ([0-9]+(\\.[0-9]+)?S)
)
)
)""" ],
[ sh:datatype xsd:gMonth;
sh:path <gMonth>;
sh:pattern "--(0[1-9]|1[0-2])(Z|(\\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?" ],
[ sh:datatype xsd:gYear;
sh:path <gYear>;
sh:pattern "-?([1-9][0-9]{3,}|0[0-9]{3})(Z|(\\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?" ],
[ sh:datatype xsd:gYearMonth;
sh:path <gYearMonth>;
sh:pattern "-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])(Z|(\\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?" ],
[ sh:datatype xsd:string;
sh:path <string>;
sh:pattern "\\S" ],
[ sh:datatype xsd:time;
sh:path <time>;
sh:pattern "(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\\.[0-9]+)?|(24:00:00(\\.0+)?))(Z|(\\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?" ];
sh:targetClass <C> ].
This is my data file:
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
<i>
a <C>;
<boolean> false, "0"^^xsd:boolean;
<date> "-1-01-01"^^xsd:date;
<dateTime> "-1-01-01T00:00:00-00:00"^^xsd:dateTime;
<decimal> -01.10, "-02.20"^^xsd:decimal;
<double> -1.1e+0, "-2.2e+0"^^xsd:double;
<duration> "-1-01-01T00:00:00-00:00"^^xsd:duration;
<gMonth> "--01"^^xsd:gMonth;
<gYear> "-1"^^xsd:gYear, "111111"^^xsd:gYear;
<gYearMonth> "-1-01Z"^^xsd:gYear, "111111-01Z"^^xsd:gYear;
<string> "😺", "😺"^^xsd:string;
<time> "00:00:00-00:00"^^xsd:time.
Since Regex is a crude approach for validating lexical forms, it would be better if lexical forms could also be validated by specifying the datatype IRI (
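A note on the multi-line patterns in the shapes above: in Turtle, line breaks and spaces inside a `"""..."""` literal are part of the string, so the `dateTime` and `duration` patterns contain literal whitespace. SPARQL REGEX has an `x` flag (which SHACL exposes through `sh:flags`) that removes whitespace from the pattern before matching; whether the library discussed here honors that flag is not confirmed in this thread. The sketch below imitates the effect in Python by stripping whitespace manually. The function name and the value `2020-01-01T00:00:00Z` are made up for illustration.

```python
import re

# dateTime pattern as posted, spread over several lines
# (Turtle's \\ is already unescaped to \ here).
DATETIME_MULTILINE = r"""
-?([1-9][0-9]{3,}|0[0-9]{3})
-(0[1-9]|1[0-2])
-(0[1-9]|[12][0-9]|3[01])
T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|(24:00:00(\.0+)?))
(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"""


def full_match(pattern: str, lexical: str, free_spacing: bool = False) -> bool:
    """Match the whole lexical form; optionally drop pattern whitespace first."""
    if free_spacing:
        # Rough imitation of the `x` flag: remove all whitespace from the pattern.
        pattern = re.sub(r"\s+", "", pattern)
    return re.fullmatch(pattern, lexical) is not None


if __name__ == "__main__":
    for lexical in ["2020-01-01T00:00:00Z", "-1-01-01T00:00:00-00:00"]:
        print(
            lexical,
            full_match(DATETIME_MULTILINE, lexical),
            full_match(DATETIME_MULTILINE, lexical, free_spacing=True),
        )
```

With the whitespace left in, the pattern cannot match a single-line value at all, because it requires literal newlines between the parts. With the whitespace removed, the made-up value matches, while `-1-01-01T00:00:00-00:00` from the data file does not, since the pattern requires at least four year digits.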
After looking at your examples in the SHACL playground and the spec I have a few observations:
Now, while the spec does not mention checking the lexical correctness of literals, it could be added as an option to the library. What do you think, @martinmaillard?
This library already uses rdf-validate-datatype to validate the lexical correctness of literals. So if something is validated incorrectly, it's probably a bug.
I do not understand how literals should be validated based on their datatype IRI. I make the following observations:
For some literals, specifying the datatype IRI with `sh:datatype` seems to suffice in order to also check their lexical form. An example of this is `xsd:boolean`, where the lexical form `"-false"` is currently not accepted because the minus sign is not part of the syntax for Boolean lexical forms.
For some literals, specifying the datatype IRI with `sh:datatype` does not seem sufficient, since incorrect lexical forms are still accepted. An example of this is `xsd:double`, for which `"--1.1e0"` is accepted, even though the double occurrence of the hyphen is not supported by the floating-point syntax.
At the same time, it is also not clear how regular expressions could be manually specified in order to fix the absence of lexical form validation (see Regular expressions with Unicode #44 for generic issues with the way in which regular expressions are currently supported). For example, specifying the regular expression `sh:pattern "(\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)"` copied from the XSD standard alongside `sh:datatype xsd:double` still validates literals like `"--1.1e1"^^xsd:double` as ok, even though they violate both the datatype IRI and the regular expression specifications.
At the moment it is difficult for me to determine what is intended behavior and what is a bug. It would be great if SHACL could be used to validate literals, but I am not sure whether (1) such validation is indeed intended by the SHACL standard, and whether (2) it is technologically feasible to implement such validation with contemporary technology.
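On feasibility, lexical validation keyed on the datatype IRI can be sketched in a few lines. The following is a minimal Python illustration, not the approach of any particular library: the table and function names are hypothetical, the patterns are the boolean, decimal and double patterns posted in this thread, and each is applied to the whole lexical form. The space before the second `|` in the posted double pattern is omitted here, since without free-spacing mode it would be matched literally.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical lookup table: datatype IRI -> lexical-form pattern.
LEXICAL_PATTERNS = {
    XSD + "boolean": r"false|true|0|1",
    XSD + "decimal": r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)",
    XSD + "double": (
        r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?"
        r"|(\+|-)?INF|NaN"
    ),
}


def is_valid_lexical(datatype_iri: str, lexical: str):
    """Return True or False for known datatypes, None if no pattern is registered."""
    pattern = LEXICAL_PATTERNS.get(datatype_iri)
    if pattern is None:
        return None
    # fullmatch anchors the pattern at both ends of the lexical form.
    return re.fullmatch(pattern, lexical) is not None


if __name__ == "__main__":
    samples = [
        (XSD + "boolean", "-false"),
        (XSD + "double", "--1.1e0"),
        (XSD + "double", "-2.2e+0"),
        (XSD + "date", "-1-01-01"),
    ]
    for iri, lexical in samples:
        print(iri, repr(lexical), is_valid_lexical(iri, lexical))
```

Under these assumptions both `"-false"` and `"--1.1e0"` are rejected, which is the behavior the issue asks for; datatypes without a registered pattern return `None` rather than being silently accepted.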