N-Quads serializer ignores default graph #1842

Open
edmondchuc opened this issue Apr 17, 2022 · 19 comments
Labels
bug (Something isn't working); concept: default graph; concept: RDF dataset (Relates to the RDF datasets concept.); core (Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}`); critical; format: N-Quads (Related to N-Quads format.)

Comments

@edmondchuc
Contributor

The following script can be run as-is:

from rdflib import Dataset

data = """
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

{
	_:b0 <http://www.w3.org/ns/prov#generatedAtTime> "2012-04-09"^^xsd:date .
}

_:b0 {
	<http://greggkellogg.net/foaf#me> a <http://xmlns.com/foaf/0.1/Person> ;
		<http://xmlns.com/foaf/0.1/knows> "http://manu.sporny.org/about#manu" ;
		<http://xmlns.com/foaf/0.1/name> "Gregg Kellogg" .

	<http://manu.sporny.org/about#manu> a <http://xmlns.com/foaf/0.1/Person> ;
		<http://xmlns.com/foaf/0.1/knows> "http://greggkellogg.net/foaf#me" ;
		<http://xmlns.com/foaf/0.1/name> "Manu Sporny" .
}


"""

g = Dataset()
g.parse(data=data, format="trig")

g.print(format="nquads")

Output:

_:nde95dc418226482f9fb7b0242109b9a3b1 <http://www.w3.org/ns/prov#generatedAtTime> "2012-04-09"^^<http://www.w3.org/2001/XMLSchema#date> _:Neae5d6b422ed4d1d872dd9674af22f8f .
<http://greggkellogg.net/foaf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> _:nde95dc418226482f9fb7b0242109b9a3b1 .
<http://manu.sporny.org/about#manu> <http://xmlns.com/foaf/0.1/name> "Manu Sporny" _:nde95dc418226482f9fb7b0242109b9a3b1 .
<http://manu.sporny.org/about#manu> <http://xmlns.com/foaf/0.1/knows> "http://greggkellogg.net/foaf#me" _:nde95dc418226482f9fb7b0242109b9a3b1 .
<http://greggkellogg.net/foaf#me> <http://xmlns.com/foaf/0.1/knows> "http://manu.sporny.org/about#manu" _:nde95dc418226482f9fb7b0242109b9a3b1 .
<http://manu.sporny.org/about#manu> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> _:nde95dc418226482f9fb7b0242109b9a3b1 .
<http://greggkellogg.net/foaf#me> <http://xmlns.com/foaf/0.1/name> "Gregg Kellogg" _:nde95dc418226482f9fb7b0242109b9a3b1 .

Issue

I would have expected the first statement of the output to omit the graph label as it is a statement in the default graph.

_:nde95dc418226482f9fb7b0242109b9a3b1 <http://www.w3.org/ns/prov#generatedAtTime> "2012-04-09"^^<http://www.w3.org/2001/XMLSchema#date> _:Neae5d6b422ed4d1d872dd9674af22f8f .

See https://www.w3.org/TR/n-quads/#simple-triples for reference.

@edmondchuc
Contributor Author

Hmm, it's somewhat related to #1804.

@aucampia
Member

aucampia commented Apr 17, 2022

May be related to this also:

  • https://github.com/RDFLib/rdflib/blob/6f2c11cd2c549d6410f9a1c948ab3a8dbf77ca00/test/variants/rdf11trig_eg2.trig
  • https://github.com/RDFLib/rdflib/blob/6f2c11cd2c549d6410f9a1c948ab3a8dbf77ca00/test/variants/rdf11trig_eg2.nq
  • ("variants/rdf11trig_eg2"): pytest.mark.xfail(
    reason="""
    This fails randomly, passing less than 10% of the time, and always failing
    with comparing hext against trig. Not clear why, it may be a big with hext
    parsing.
    AssertionError: checking rdf11trig_eg2.hext against rdf11trig_eg2.trig
    in both:
    (rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:[email protected]'))
    (rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Bob'))
    (rdflib.term.URIRef('http://example.org/bob'), rdflib.term.URIRef('http://purl.org/dc/terms/publisher'), rdflib.term.Literal('Bob'))
    (rdflib.term.URIRef('http://example.org/alice'), rdflib.term.URIRef('http://purl.org/dc/terms/publisher'), rdflib.term.Literal('Alice'))
    only in first:
    (rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'), rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'))
    (rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:[email protected]'))
    (rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Alice'))
    only in second:
    (rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'), rdflib.term.BNode('cbcd41774964510991c01701d8430149bc373e1f23734d9c938c81a40b1429aa33'))
    (rdflib.term.BNode('cbcd41774964510991c01701d8430149bc373e1f23734d9c938c81a40b1429aa33'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:[email protected]'))
    (rdflib.term.BNode('cbcd41774964510991c01701d8430149bc373e1f23734d9c938c81a40b1429aa33'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Alice'))
    """,
    raises=AssertionError,
    ),
  • ("trig", "rdf11trig_eg2.trig"): pytest.mark.xfail(
    reason="""
    Something is going wrong here with blank node serialization. In the second
    graph below bob knows someone who does not exist, while in first he knows
    someone that does exist and has the name Alice.
    AssertionError: in both:
    (rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:[email protected]'))
    (rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Alice'))
    (rdflib.term.URIRef('http://example.org/alice'), rdflib.term.URIRef('http://purl.org/dc/terms/publisher'), rdflib.term.Literal('Alice'))
    (rdflib.term.URIRef('http://example.org/bob'), rdflib.term.URIRef('http://purl.org/dc/terms/publisher'), rdflib.term.Literal('Bob'))
    only in first:
    (rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'), rdflib.term.BNode('cbb5eb12b5dcf688537b0298cce144c6dd68cf047530d0b4a455a8f31f314244fd'))
    (rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:[email protected]'))
    (rdflib.term.BNode('cb0'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Bob'))
    only in second:
    (rdflib.term.BNode('cb7be1d0397a49ddd4ae8aa96acc7b6135903c5f3fa5e47bf619c0e4b438aafcc1'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'), rdflib.term.BNode('cb0'))
    (rdflib.term.BNode('cb7be1d0397a49ddd4ae8aa96acc7b6135903c5f3fa5e47bf619c0e4b438aafcc1'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:[email protected]'))
    (rdflib.term.BNode('cb7be1d0397a49ddd4ae8aa96acc7b6135903c5f3fa5e47bf619c0e4b438aafcc1'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Bob'))
    """,
    raises=AssertionError,
    ),

EDIT: Actually on second thought no, maybe not.

@edmondchuc
Contributor Author

I guess this is a more general issue with how rdflib serializes context-aware stores. Changing the output format to trig results in the same issue, thus breaking round-tripping.

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <urn:x-rdflib:> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:nd65e1d4bf7a34d92a06e4e619a245037b1 {
    <http://greggkellogg.net/foaf#me> a foaf:Person ;
        foaf:knows "http://manu.sporny.org/about#manu" ;
        foaf:name "Gregg Kellogg" .

    <http://manu.sporny.org/about#manu> a foaf:Person ;
        foaf:knows "http://greggkellogg.net/foaf#me" ;
        foaf:name "Manu Sporny" .
}

_:N427df78321f84718beec24f5f0c7e26c {
    [] prov:generatedAtTime "2012-04-09"^^xsd:date .
}

@edmondchuc
Contributor Author

I noticed this issue while I was working on implementing a more efficient integration of pyld as a parser into rdflib core #1836.

My implementation sets the graph name to rdflib.graph.DATASET_DEFAULT_GRAPH_ID when the statement is from the default graph and correctly serializes the dataset.

I noticed that the output in the nquads and trig formats was different. The parsers and serializers for those formats fail round-trips because they ignore the default graph and incorrectly label it with a blank node.

An easy fix (I think) is to set the context to rdflib.graph.DATASET_DEFAULT_GRAPH_ID in the failing serializers.

@edmondchuc
Contributor Author

To add further to this, it may be that those other serializers are adding statements from the default graph as None which results in adding those statements to a graph labelled with a blank node. I need to confirm this.

For example:

from rdflib.graph import DATASET_DEFAULT_GRAPH_ID

# Instead of this
store.add((s, p, o), None)

# Do this
store.add((s, p, o), DATASET_DEFAULT_GRAPH_ID)

@edmondchuc
Contributor Author

An easy fix (I think) is to set the context to rdflib.graph.DATASET_DEFAULT_GRAPH_ID in the failing serializers.

Oops, I take that back. This only works correctly for trig.

The nquads serializer just needs to omit the graph label when it sees rdflib.graph.DATASET_DEFAULT_GRAPH_ID.

Currently it serializes something like:

_:b0 <http://www.w3.org/ns/prov#generatedAtTime> "2012-04-09"^^<http://www.w3.org/2001/XMLSchema#date> <urn:x-rdflib:default> .
<http://manu.sporny.org/about#manu> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> _:b0 .
<http://greggkellogg.net/foaf#me> <http://xmlns.com/foaf/0.1/knows> "http://manu.sporny.org/about#manu"^^<http://www.w3.org/2001/XMLSchema#string> _:b0 .
<http://manu.sporny.org/about#manu> <http://xmlns.com/foaf/0.1/knows> "http://greggkellogg.net/foaf#me"^^<http://www.w3.org/2001/XMLSchema#string> _:b0 .
<http://greggkellogg.net/foaf#me> <http://xmlns.com/foaf/0.1/name> "Gregg Kellogg"^^<http://www.w3.org/2001/XMLSchema#string> _:b0 .
<http://greggkellogg.net/foaf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> _:b0 .
<http://manu.sporny.org/about#manu> <http://xmlns.com/foaf/0.1/name> "Manu Sporny"^^<http://www.w3.org/2001/XMLSchema#string> _:b0 .

Notice the <urn:x-rdflib:default>.
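
The rule described above, omitting the fourth term whenever the context is the default graph, can be sketched in plain Python. This is only an illustration and not RDFLib's serializer code: `nq_row` is a hypothetical helper that takes terms already rendered as N-Quads strings, and `DEFAULT_GRAPH_ID` is assumed to match the value of `rdflib.graph.DATASET_DEFAULT_GRAPH_ID` visible in the output above.

```python
from typing import Optional

# Assumption: same value as rdflib.graph.DATASET_DEFAULT_GRAPH_ID.
DEFAULT_GRAPH_ID = "urn:x-rdflib:default"


def nq_row(s: str, p: str, o: str, graph_id: Optional[str]) -> str:
    """Format one N-Quads statement from pre-serialized subject, predicate, object.

    `graph_id` is a bare IRI string, a blank node label starting with "_:",
    or None. Hypothetical helper, for illustration only.
    """
    # Statements in the default graph carry no graph label at all.
    if graph_id is None or graph_id == DEFAULT_GRAPH_ID:
        return f"{s} {p} {o} .\n"
    # Blank node labels are written as-is; IRIs are wrapped in angle brackets.
    label = graph_id if graph_id.startswith("_:") else f"<{graph_id}>"
    return f"{s} {p} {o} {label} .\n"


print(nq_row(
    "_:b0",
    "<http://www.w3.org/ns/prov#generatedAtTime>",
    '"2012-04-09"^^<http://www.w3.org/2001/XMLSchema#date>',
    DEFAULT_GRAPH_ID,
), end="")
```

With this rule the prov:generatedAtTime statement is written as a plain triple, while statements in the `_:b0` graph keep their label.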

@ghost

ghost commented Apr 17, 2022

The nquads serializer just needs to omit the graph label when it sees rdflib.graph.DATASET_DEFAULT_GRAPH_ID.

You're not wrong, I've been addressing this in the dataset re-work, changes to nquads serializer here

@aucampia
Member

The nquads serializer just needs to omit the graph label when it sees rdflib.graph.DATASET_DEFAULT_GRAPH_ID.

You're not wrong, I've been addressing this in the dataset re-work, changes to nquads serializer here

Can't we get this in without breaking changes?

@ghost

ghost commented Apr 17, 2022

Can't we get this in without breaking changes?

Yes. In this instance, correcting the serialization doesn't cause any breaking changes.

@aucampia aucampia added the bug Something isn't working label Apr 17, 2022
@edmondchuc
Contributor Author

Thanks for your work @gjhiggins. I've copied your code from the nquads serializer out into a separate PR. I hope you don't mind. I need this patch to get the JSON-LD 1.1 tests to pass.

Can I ask why + [DATASET_DEFAULT_GRAPH_ID] is required?

for context in list(self.store.contexts()) + [DATASET_DEFAULT_GRAPH_ID]:
    graph = self.store.graph(context)
    for triple in graph:
        stream.write(self._nq_row(triple, context).encode(encoding, "replace"))

I had to remove it because the serialize method was outputting double the statements.

@ghost

ghost commented Apr 20, 2022

Thanks for your work @gjhiggins. I've copied your code from the nquads serializer out into a separate PR. I hope you don't mind. I need this patch to get the JSON-LD 1.1 tests to pass.

That's cool, I don't mind at all, whatever works for you.

Can I ask why + [DATASET_DEFAULT_GRAPH_ID] is required?

It's a consequence of switching over to Dataset. ConjunctiveGraph.contexts() returns all graphs, including the default graph and Dataset.graphs() doesn't include the (nameless) default graph.
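
One way to make the loop indifferent to which of the two behaviours described above is in effect is to add the default graph only when it is missing. The sketch below uses plain strings rather than RDFLib objects; `contexts_with_default` is a hypothetical helper, and `DEFAULT_GRAPH_ID` is assumed to match `rdflib.graph.DATASET_DEFAULT_GRAPH_ID`.

```python
from typing import Iterable, List

# Assumption: same value as rdflib.graph.DATASET_DEFAULT_GRAPH_ID.
DEFAULT_GRAPH_ID = "urn:x-rdflib:default"


def contexts_with_default(contexts: Iterable[str]) -> List[str]:
    """Return the given contexts plus the default graph, each exactly once.

    Order is preserved, so a serializer looping over the result visits every
    graph a single time whether or not the input already listed the default.
    """
    seen = set()
    ordered = []
    for context in list(contexts) + [DEFAULT_GRAPH_ID]:
        if context not in seen:
            seen.add(context)
            ordered.append(context)
    return ordered


# Input that already lists the default graph (the case that doubled the output).
print(contexts_with_default(["urn:x-rdflib:default", "http://example.org/g1"]))
# Input that lists named graphs only.
print(contexts_with_default(["http://example.org/g1"]))
```

Appending unconditionally is what doubles the statements when the default graph is already in the list; the `seen` set avoids that without needing to know which API supplied the contexts.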

@aucampia aucampia added core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` format: N-Quads Related to N-Quads format. labels Aug 21, 2022
@sdasda7777

Hi, I just noticed that when I take the multigraph example from the JSON-LD standard and convert it to N-Quads, the default graph is suddenly labelled with a blank node instead of having no label. My code is:

from rdflib.graph import Dataset

data = """{
  "@context": [
    "http://schema.org/",
    {"@base": "http://example.com/"}
  ],
  "@graph": [{
    "@id": "people/alice",
    "gender": [
      {"@value": "weiblich", "@language": "de"},
      {"@value": "female",   "@language": "en"}
    ],
    "knows": {"@id": "people/bob"},
    "name": "Alice"
  }, {
    "@id": "graphs/1",
    "@graph": {
      "@id": "people/alice",
      "parent": {
        "@id": "people/bob",
        "name": "Bob"
      }
    }
  }, {
    "@id": "graphs/2",
    "@graph": {
      "@id": "people/bob",
      "sibling": {
        "name": "Mary",
        "sibling": {"@id": "people/bob"}
      }
    }
  }]
}"""

ds = Dataset()
ds.parse(data=data, format="json-ld")
print(ds.serialize(format="nquads").strip())

The result looks like this for me:

<http://example.com/people/bob> <http://schema.org/name> "Bob" <http://example.com/graphs/1> .
<http://example.com/people/alice> <http://schema.org/parent> <http://example.com/people/bob> <http://example.com/graphs/1> .
<http://example.com/people/alice> <http://schema.org/gender> "female"@en _:N6535627397b54eb2b076091aaccf8a98 .
<http://example.com/people/alice> <http://schema.org/name> "Alice" _:N6535627397b54eb2b076091aaccf8a98 .
<http://example.com/people/alice> <http://schema.org/gender> "weiblich"@de _:N6535627397b54eb2b076091aaccf8a98 .
<http://example.com/people/alice> <http://schema.org/knows> <http://example.com/people/bob> _:N6535627397b54eb2b076091aaccf8a98 .
<http://example.com/people/bob> <http://schema.org/sibling> _:Na4b162b6579f4d0a9aa68d2d0f65572c <http://example.com/graphs/2> .
_:Na4b162b6579f4d0a9aa68d2d0f65572c <http://schema.org/name> "Mary" <http://example.com/graphs/2> .
_:Na4b162b6579f4d0a9aa68d2d0f65572c <http://schema.org/sibling> <http://example.com/people/bob> <http://example.com/graphs/2> .

However on the JSON-LD playground, the output for N-Quads conversion looks like this instead:

<http://example.com/people/alice> <http://schema.org/gender> "female"@en .
<http://example.com/people/alice> <http://schema.org/gender> "weiblich"@de .
<http://example.com/people/alice> <http://schema.org/knows> <http://example.com/people/bob> .
<http://example.com/people/alice> <http://schema.org/name> "Alice" .
<http://example.com/people/alice> <http://schema.org/parent> <http://example.com/people/bob> <http://example.com/graphs/1> .
<http://example.com/people/bob> <http://schema.org/name> "Bob" <http://example.com/graphs/1> .
<http://example.com/people/bob> <http://schema.org/sibling> _:b0 <http://example.com/graphs/2> .
_:b0 <http://schema.org/name> "Mary" <http://example.com/graphs/2> .
_:b0 <http://schema.org/sibling> <http://example.com/people/bob> <http://example.com/graphs/2> .

Is this issue likely to be solved soon?

@aucampia aucampia added the concept: RDF dataset Relates to the RDF datasets concept. label May 20, 2023
@namedgraph

So is there a way to avoid <urn:x-rdflib:default> when serializing Dataset? I'm using N-Quads.

@sdasda7777

@namedgraph In what sense is the <urn:x-rdflib:default> an issue for you? Any N-Quads parser should parse that as data in the default graph, right?

I think there was some trick to it, where the default graph label will or won't appear depending on how you insert the data into the Dataset, but I would avoid depending on that, as it could change at any time without so much as a notice.

@aucampia
Member

aucampia commented Jun 8, 2023

@namedgraph In what sense is the <urn:x-rdflib:default> an issue for you? Any N-Quads parser should parse that as data in the default graph, right?

It is an issue because the default graph should not have a name; as soon as it does, it is no longer the default graph.

@namedgraph

namedgraph commented Jun 8, 2023

@namedgraph In what sense is the <urn:x-rdflib:default> an issue for you? Any N-Quads parser should parse that as data in the default graph, right?

Uhh, no? This is not standard in any way. The 4th element of a quad should be omitted for triples in the default graph:

The graph label IRI can be omitted, in which case the triples are considered part of the default graph of the RDF dataset.

https://www.w3.org/TR/n-quads/#simple-triples

@sdasda7777

Uhh, no? This is not standard in any way.

My bad, you're right. Actually seems like some kind of internal rdflib thing that's leaking out by accident.

The 4th element of a quad should be omitted for triples in the default graph:

The graph label IRI can be omitted, in which case the triples are considered part of the default graph of the RDF dataset.

Just for completeness, I don't think this is exactly true. While it does say that if there is no graphLabel, the statement should be in the default graph, I don't think it specifies that a default graph may not be referred to using an IRI, in case that ever got standardised.

@aucampia
Member

aucampia commented Jun 8, 2023

So is there a way to avoid <urn:x-rdflib:default> when serializing Dataset? I'm using N-Quads.

Not that I know of, I will be working on fixing the Dataset issue in the coming months but it is all a bit tangled.
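
Until the serializer is fixed, one possible stopgap is to post-process the serialized text. The sketch below is a workaround under stated assumptions, not an RDFLib feature: `strip_default_graph_label` is a hypothetical helper, its regular expression covers only the term shapes that appear in this thread (IRIs, blank nodes, and quoted literals with an optional datatype or language tag), and it expects one statement per line.

```python
import re

# Assumption: the label RDFLib leaks for the default graph, as shown in this thread.
DEFAULT_LABEL = "<urn:x-rdflib:default>"

# Simplified N-Quads term matcher; not a full grammar.
TERM = re.compile(
    r'<[^>]*>'                                            # IRI
    r'|_:\S+'                                             # blank node label
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'  # literal
)


def strip_default_graph_label(nquads: str) -> str:
    """Drop the fourth term of any statement whose graph label is DEFAULT_LABEL."""
    cleaned = []
    for line in nquads.splitlines():
        terms = TERM.findall(line)
        # Only touch real quads: a triple whose *object* is the IRI stays intact.
        if len(terms) == 4 and terms[3] == DEFAULT_LABEL:
            line = " ".join(terms[:3]) + " ."
        cleaned.append(line)
    result = "\n".join(cleaned)
    # Keep a trailing newline if the input had one.
    return result + "\n" if nquads.endswith("\n") else result


sample = (
    '_:b0 <http://www.w3.org/ns/prov#generatedAtTime> '
    '"2012-04-09"^^<http://www.w3.org/2001/XMLSchema#date> '
    '<urn:x-rdflib:default> .\n'
)
print(strip_default_graph_label(sample), end="")
```

Counting terms rather than matching on the end of the line is deliberate: it avoids rewriting a default-graph triple that happens to have `<urn:x-rdflib:default>` as its object.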

@ghost

ghost commented Jun 8, 2023

While it does say that if there is no graphLabel, the statement should be in the default graph, I don't think it specifies that a default graph may not be referred to using an IRI, in case that ever got standardised.

Kinda explicit in the wording: “The default graph does not have a name”.

My understanding is that this is inherited from SPARQL: a query that does not specify a graph name is posed of the default graph --- which in consequence, cannot have a name.

However, RDFLib binds an identifier to every graph (probably inherited from the extant implementations of Store) and if an identifier isn't provided, a BNode is used.

In consequence, in the RDFLib implementation, a Dataset's default graph, being an RDFLib Graph, is (for the time being, unavoidably) assigned the (internal) identifier DATASET_DEFAULT_GRAPH_ID (bound to urn:x-rdflib:default) but this is not intended for external consumption - use of the Dataset().default_graph reference is recommended.

So is there a way to avoid <urn:x-rdflib:default> when serializing Dataset? I'm using N-Quads.

Because the default graph doesn't have a name, that's a must, but there are some slightly non-obvious consequences.

I've spent some time looking into the issues here and I have a mostly complete solution that I'm using to tease out some of the options. If you'll forgive me some elaboration, I'm including some example code that takes as input a slightly changed test/data/sportquads.trig, to which I have added two triples describing a student_30 with foaf:name "Dudley Moore":

diff --git a/test/data/sportquads.trig b/test/data/sportquads.trig
+
+<http://example.com/resource/student_30> a ont:Student ;
+        foaf:name "Dudley Moore" .

And some annotated test code ...

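from rdflib import Dataset, URIRef
from test.data import TEST_DATA_DIR  # assumed: path helper from RDFLib's own test suite

context0 = URIRef("urn:example:context-0")  # the context asserted for d3 below


def test_dataset_serialize():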
    d1 = Dataset()
    d1.parse(
        TEST_DATA_DIR / "sportquads.trig",  # Augmented with the two triples mentioned
        format="trig",
        publicID=""  #  Uncontextualised statements -> default_graph
    )
    assert len(d1) == 2  # uncontextualised statements (“triples”) in the default graph

    # And the contexts created ...
    assert sorted(list(d1.contexts())) == [
        URIRef('http://example.org/graph/practise'),
        URIRef('http://example.org/graph/sports'), 
        URIRef('http://example.org/graph/students'),
    ]  # Note: no mention of `<urn:x-rdflib:default>` aka “the graph with no name”

    # it serializes as expected ...
    assert sorted(d1.serialize(format="nquads").splitlines()) == [
        "",
        "<http://example.com/resource/sport_100> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/ontology/Sport> <http://example.org/graph/sports> .",
        '<http://example.com/resource/sport_100> <http://www.w3.org/2000/01/rdf-schema#label> "Tennis" <http://example.org/graph/sports> .',
        "<http://example.com/resource/student_10> <http://example.com/ontology/practises> <http://example.com/resource/sport_100> <http://example.org/graph/practise> .",
        "<http://example.com/resource/student_10> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/ontology/Student> <http://example.org/graph/students> .",
        '<http://example.com/resource/student_10> <http://xmlns.com/foaf/0.1/name> "Venus Williams" <http://example.org/graph/students> .',
        "<http://example.com/resource/student_20> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/ontology/Student> <http://example.org/graph/students> .",
        '<http://example.com/resource/student_20> <http://xmlns.com/foaf/0.1/name> "Demi Moore" <http://example.org/graph/students> .',
        "<http://example.com/resource/student_30> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/ontology/Student>  .",
        '<http://example.com/resource/student_30> <http://xmlns.com/foaf/0.1/name> "Dudley Moore"  .',
    ]  # uncontextualized statements preserved as such, just as in the trig source

    # Quads are no issue so let's work with uncontextualized statements
    sportstriples = d1.serialize(format='nt')  # Decontextualize the statements

    # Use nquads parser to read triples into the default graph
    d2 = Dataset()
    d2.parse(
        data=sportstriples,
        format="nquads")  # Read uncontextualized statements as nquads
    assert len(d2) == 9  # All parsed into the default graph
    assert len(list(d2.contexts())) == 0  # only named graphs are contexts

    # Use nquads parser to read triples into a named graph (aka “context”)
    d3 = Dataset()
    d3.parse(
        data=sportstriples,
        format="nquads",
        publicID=context0  # Assert a context for the uncontextualised statements
    )
    assert len(d3) == 0  # No triples in default graph
    assert len(d3.graph(context0)) == 9  # All statements now contextualized
    assert list(d3.contexts()) == [
        URIRef('urn:example:context-0')
    ]  # Only one context, as specified

    # Now back to `d1` and some fun stuff ...
    assert len(d1) == 2  # the two added triples
    d1.default_union = True
    assert len(d1) == 9  # decontextualise all statements
    d1.default_union = False
    assert len(d1) == 2  # back to base

Why is it “fun stuff”? Because of SPARQL_DEFAULT_GRAPH_UNION: “If True - the default graph in the RDF Dataset is the union of all named graphs”.
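
The effect of that switch on `len()` can be modelled with a toy class. This is a simplified sketch, not RDFLib's implementation: `ToyDataset` and its attributes are hypothetical, and the union deliberately includes the default graph's own triples so that the counts line up with the 2 and 9 asserted in the test above.

```python
class ToyDataset:
    """Toy model: one unnamed default graph plus any number of named graphs."""

    def __init__(self, default_union=False):
        self.default_union = default_union
        self.default = set()   # triples with no graph label
        self.named = {}        # graph name -> set of triples

    def add(self, triple, graph=None):
        # graph=None means "the default graph", which has no name.
        target = self.default if graph is None else self.named.setdefault(graph, set())
        target.add(triple)

    def __len__(self):
        if not self.default_union:
            return len(self.default)
        # Union view: the default graph's triples plus those of every named graph.
        merged = set(self.default)
        for triples in self.named.values():
            merged |= triples
        return len(merged)


ds = ToyDataset()
for i in range(2):
    ds.add(("s%d" % i, "p", "o"))                  # two uncontextualised triples
for i in range(7):
    ds.add(("n%d" % i, "p", "o"), graph="urn:g")   # seven triples in a named graph

print(len(ds))           # default graph only
ds.default_union = True
print(len(ds))           # union view
```

Toggling the flag changes what counts as "the default graph" for reads without moving any data, which is why the length in the test can go from 2 to 9 and back.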

It is indeed tangled; the reason this isn't a draft PR yet is that I'm playing whack-a-mole with the tests 😄

Development

No branches or pull requests

4 participants