N-Quads serializer ignores default graph #1842
Comments
Hmm, it's somewhat related to #1804. |
May be related to this also:
EDIT: Actually on second thought no, maybe not. |
I guess this is a more general issue with how rdflib serializes context-aware stores. Changing the output format to
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <urn:x-rdflib:> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
_:nd65e1d4bf7a34d92a06e4e619a245037b1 {
<http://greggkellogg.net/foaf#me> a foaf:Person ;
foaf:knows "http://manu.sporny.org/about#manu" ;
foaf:name "Gregg Kellogg" .
<http://manu.sporny.org/about#manu> a foaf:Person ;
foaf:knows "http://greggkellogg.net/foaf#me" ;
foaf:name "Manu Sporny" .
}
_:N427df78321f84718beec24f5f0c7e26c {
[] prov:generatedAtTime "2012-04-09"^^xsd:date .
} |
I noticed this issue while I was working on implementing a more efficient integration of pyld as a parser into rdflib core #1836. My implementation sets the graph name to I noticed the output in the format An easy fix (I think) is to set the context to |
To add further to this, it may be that those other serializers are adding statements from the default graph as For example:
from rdflib.graph import DATASET_DEFAULT_GRAPH_ID
# Instead of this
store.add((s, p, o), None)
# Do this
store.add((s, p, o), DATASET_DEFAULT_GRAPH_ID) |
Oops, I take that back. This only works correctly for The Currently it serializes something like:
Notice the |
You're not wrong, I've been addressing this in the dataset re-work, changes to nquads serializer here |
Can't we get this in without breaking changes? |
Yes. In this instance, correcting the serialization doesn't cause any breaking changes. |
Thanks for your work @gjhiggins. I've copied your code from the nquads serializer out into a separate PR. I hope you don't mind. I need this patch to get the JSON-LD 1.1 tests to pass. Can I ask why rdflib/rdflib/plugins/serializers/nquads.py Lines 38 to 41 in 4fba0ff
I had to remove it because the serialize method was outputting double the statements. |
That's cool, I don't mind at all, whatever works for you.
It's a consequence of switching over to Dataset. |
Hi, I just noticed that when I take the multigraph example from the JSON-LD standard and convert it to N-Quads, the main graph is suddenly referenced by a blank node label instead of no label. My code is:
from rdflib.graph import Dataset
data = """{
"@context": [
"http://schema.org/",
{"@base": "http://example.com/"}
],
"@graph": [{
"@id": "people/alice",
"gender": [
{"@value": "weiblich", "@language": "de"},
{"@value": "female", "@language": "en"}
],
"knows": {"@id": "people/bob"},
"name": "Alice"
}, {
"@id": "graphs/1",
"@graph": {
"@id": "people/alice",
"parent": {
"@id": "people/bob",
"name": "Bob"
}
}
}, {
"@id": "graphs/2",
"@graph": {
"@id": "people/bob",
"sibling": {
"name": "Mary",
"sibling": {"@id": "people/bob"}
}
}
}]
}"""
ds = Dataset()
ds.parse(data=data, format="json-ld")
print(ds.serialize(format="nquads").strip())
The result looks like this for me:
<http://example.com/people/bob> <http://schema.org/name> "Bob" <http://example.com/graphs/1> .
<http://example.com/people/alice> <http://schema.org/parent> <http://example.com/people/bob> <http://example.com/graphs/1> .
<http://example.com/people/alice> <http://schema.org/gender> "female"@en _:N6535627397b54eb2b076091aaccf8a98 .
<http://example.com/people/alice> <http://schema.org/name> "Alice" _:N6535627397b54eb2b076091aaccf8a98 .
<http://example.com/people/alice> <http://schema.org/gender> "weiblich"@de _:N6535627397b54eb2b076091aaccf8a98 .
<http://example.com/people/alice> <http://schema.org/knows> <http://example.com/people/bob> _:N6535627397b54eb2b076091aaccf8a98 .
<http://example.com/people/bob> <http://schema.org/sibling> _:Na4b162b6579f4d0a9aa68d2d0f65572c <http://example.com/graphs/2> .
_:Na4b162b6579f4d0a9aa68d2d0f65572c <http://schema.org/name> "Mary" <http://example.com/graphs/2> .
_:Na4b162b6579f4d0a9aa68d2d0f65572c <http://schema.org/sibling> <http://example.com/people/bob> <http://example.com/graphs/2> .
However, on the JSON-LD playground, the output for N-Quads conversion looks like this instead:
<http://example.com/people/alice> <http://schema.org/gender> "female"@en .
<http://example.com/people/alice> <http://schema.org/gender> "weiblich"@de .
<http://example.com/people/alice> <http://schema.org/knows> <http://example.com/people/bob> .
<http://example.com/people/alice> <http://schema.org/name> "Alice" .
<http://example.com/people/alice> <http://schema.org/parent> <http://example.com/people/bob> <http://example.com/graphs/1> .
<http://example.com/people/bob> <http://schema.org/name> "Bob" <http://example.com/graphs/1> .
<http://example.com/people/bob> <http://schema.org/sibling> _:b0 <http://example.com/graphs/2> .
_:b0 <http://schema.org/name> "Mary" <http://example.com/graphs/2> .
_:b0 <http://schema.org/sibling> <http://example.com/people/bob> <http://example.com/graphs/2> .
Is this issue likely to be solved soon? |
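To make the difference between the two outputs concrete, here is a minimal post-processing sketch that rewrites the rdflib-style output into the label-free form the playground produces. It is not part of rdflib: the function name `strip_default_graph_label` and the term regex are assumptions made for illustration, the regex is a simplification of the full N-Quads grammar, and the caller is assumed to already know which label stands for the default graph (here it is copied from the output above).

```python
import re

# One N-Quads term: an IRI, a blank node, or a literal with an optional
# language tag or datatype. This is a simplification of the real grammar.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:\S+'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
)


def strip_default_graph_label(nquads: str, default_label: str) -> str:
    """Drop the 4th term of every quad whose graph label is `default_label`.

    Hypothetical helper: `default_label` is whatever label the serializer
    used for the default graph, e.g. a blank node such as "_:N6535...".
    """
    out = []
    for line in nquads.splitlines():
        terms = TERM.findall(line)
        # Only rewrite real quads (4 terms); triples are left untouched.
        if len(terms) == 4 and terms[3] == default_label:
            line = " ".join(terms[:3]) + " ."
        out.append(line)
    return "\n".join(out)
```

Counting terms rather than matching on the line suffix avoids mangling a plain triple whose object happens to be the same blank node as the label.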
So is there a way to avoid |
@namedgraph In what sense is the I think there was some trick to it, where the default graph will or won't be in there depending on how you insert it into the |
It is an issue because the default graph should not have a name; as soon as it does, it is no longer the default graph. |
Uhh, no? This is not standard in any way. The 4th element of a quad should be omitted for triples in the default graph:
|
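As a rough illustration of the rule stated above, the sketch below formats one statement as an N-Quads line and omits the fourth element when no graph is given. The function name `format_nquad` is hypothetical, the terms are assumed to be already serialized strings, and this is not how rdflib's serializer is implemented.

```python
from typing import Optional


def format_nquad(s: str, p: str, o: str, graph: Optional[str] = None) -> str:
    """Return one N-Quads line; `graph=None` means the default graph."""
    terms = [s, p, o]
    if graph is not None:
        # Only statements in a named graph carry a graph label.
        terms.append(graph)
    return " ".join(terms) + " ."
```

For a default-graph statement this yields a three-term line, which is also a valid N-Triples statement.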
My bad, you're right. Actually seems like some kind of internal rdflib thing that's leaking out by accident.
Just for completeness, I don't think this is exactly true. While it does say that if there is no graphLabel, it should be in the default graph, I don't think it specifies that a default graph may not be referred to using an IRI, in case that ever got standardised. |
Not that I know of, I will be working on fixing the Dataset issue in the coming months but it is all a bit tangled. |
Kinda explicit in the wording: “The default graph does not have a name”. My understanding is that this is inherited from SPARQL: a query that does not specify a graph name is posed against the default graph, which in consequence cannot have a name. However, RDFLib binds an identifier to every graph (probably inherited from the extant implementations of Store) and if an identifier isn't provided, a BNode is used. In consequence, in the RDFLib implementation, a Dataset's default graph, being an RDFLib
Because the default graph doesn't have a name, that's a must - but there are some slightly non-obvious consequences. I've spent some time looking into the issues here and I do have a mostly-complete solution that I'm using to tease out some of the options. If you'll forgive me some elaboration, I'm including some example code that uses as input a slightly-changed
diff --git a/test/data/sportquads.trig b/test/data/sportquads.trig
+
+<http://example.com/resource/student_30> a ont:Student ;
+ foaf:name "Dudley Moore" .
And some annotated test code ...
from test.data import TEST_DATA_DIR  # rdflib test-suite helper (assumed import path)

from rdflib import Dataset, URIRef

context0 = URIRef("urn:example:context-0")  # assumed definition, matches the context asserted below


def test_dataset_serialize():
    d1 = Dataset()
    d1.parse(
        TEST_DATA_DIR / "sportquads.trig",  # Augmented with the two triples mentioned
        format="trig",
        publicID="",  # Uncontextualised statements -> default_graph
    )
    assert len(d1) == 2  # uncontextualised statements (“triples”) in the default graph
    # And the contexts created ...
    assert sorted(list(d1.contexts())) == [
        URIRef('http://example.org/graph/practise'),
        URIRef('http://example.org/graph/sports'),
        URIRef('http://example.org/graph/students'),
    ]  # Note: no mention of `<urn:x-rdflib:default>` aka “the graph with no name”
    # it serializes as expected ...
    assert sorted(d1.serialize(format="nquads").splitlines()) == [
        "",
        "<http://example.com/resource/sport_100> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/ontology/Sport> <http://example.org/graph/sports> .",
        '<http://example.com/resource/sport_100> <http://www.w3.org/2000/01/rdf-schema#label> "Tennis" <http://example.org/graph/sports> .',
        "<http://example.com/resource/student_10> <http://example.com/ontology/practises> <http://example.com/resource/sport_100> <http://example.org/graph/practise> .",
        "<http://example.com/resource/student_10> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/ontology/Student> <http://example.org/graph/students> .",
        '<http://example.com/resource/student_10> <http://xmlns.com/foaf/0.1/name> "Venus Williams" <http://example.org/graph/students> .',
        "<http://example.com/resource/student_20> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/ontology/Student> <http://example.org/graph/students> .",
        '<http://example.com/resource/student_20> <http://xmlns.com/foaf/0.1/name> "Demi Moore" <http://example.org/graph/students> .',
        "<http://example.com/resource/student_30> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/ontology/Student> .",
        '<http://example.com/resource/student_30> <http://xmlns.com/foaf/0.1/name> "Dudley Moore" .',
    ]  # uncontextualized statements preserved as such, just as in the trig source
    # Quads are no issue so let's work with uncontextualized statements
    sportstriples = d1.serialize(format='nt')  # Decontextualize the statements
    # Use nquads parser to read triples into the default graph
    d2 = Dataset()
    d2.parse(
        data=sportstriples,
        format="nquads")  # Read uncontextualized statements as nquads
    assert len(d2) == 9  # All parsed into the default graph
    assert len(list(d2.contexts())) == 0  # only named graphs are contexts
    # Use nquads parser to read triples into a named graph (aka “context”)
    d3 = Dataset()
    d3.parse(
        data=sportstriples,
        format="nquads",
        publicID=context0  # Assert a context for the uncontextualised statements
    )
    assert len(d3) == 0  # No triples in default graph
    assert len(d3.graph(context0)) == 9  # All statements now contextualized
    assert list(d3.contexts()) == [
        URIRef('urn:example:context-0')
    ]  # Only one context, as specified
    # Now back to `d1` and some fun stuff ...
    assert len(d1) == 2  # the two added triples
    d1.default_union = True
    assert len(d1) == 9  # decontextualise all statements
    d1.default_union = False
    assert len(d1) == 2  # back to base
Why is it “fun stuff” - because of It is indeed tangled, the reason why this isn't a draft PR is that I'm playing whack-a-mole with the tests 😄 |
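For readers who want the gist of the semantics exercised by that test without the rdflib branch, here is a toy model in plain Python. It is only a sketch: `ToyDataset` is a made-up class, not rdflib's `Dataset`, and it assumes the behaviour described above (the default graph has no name, only named graphs count as contexts, and `default_union` makes the length cover all statements).

```python
from collections import defaultdict


class ToyDataset:
    """Toy stand-in for a dataset; the key None is the unnamed default graph."""

    def __init__(self):
        self._graphs = defaultdict(set)
        self.default_union = False

    def add(self, triple, graph=None):
        # graph=None puts the statement into the default graph.
        self._graphs[graph].add(triple)

    def contexts(self):
        # Only named graphs are contexts; the default graph is never listed.
        return sorted(name for name in self._graphs if name is not None)

    def __len__(self):
        if self.default_union:
            # Decontextualise: count the union of all graphs.
            return len(set().union(*self._graphs.values()))
        return len(self._graphs[None])


ds = ToyDataset()
ds.add(("student_30", "name", "Dudley Moore"))                   # default graph
ds.add(("student_10", "name", "Venus Williams"), "urn:g:students")
ds.add(("sport_100", "label", "Tennis"), "urn:g:sports")
```

Keying the default graph by `None` rather than by a generated identifier is the design choice that keeps a label from leaking into serialized output in this toy version.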
The following script can be run as-is:
Output:
Issue
I would have expected the first statement of the output to omit the graph label as it is a statement in the default graph.
See https://www.w3.org/TR/n-quads/#simple-triples for reference.