Trig format cannot be processed by RDFLib #2958

Open
floresbakker opened this issue Oct 28, 2024 · 6 comments

Comments

@floresbakker

RDFLib cannot process data in TriG format.

Consider the following data, which includes named graphs (example copied from the RDFLib documentation):

GraphString = '''

PREFIX eg: <http://example.com/person/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

eg:graph-1 {
    eg:drewp a foaf:Person .
    eg:drewp eg:says "Hello World" .
}

eg:graph-2 {
    eg:nick a foaf:Person .
    eg:nick eg:says "Hi World" .
}
'''

Next, let's parse this data into a Graph object:

from rdflib import Graph

someGraph = Graph()
someGraph.parse(data=GraphString, format="trig")

Let us query the graph:

someQuery = someGraph.query('''
       
select ?s

where  {
         ?s ?p ?o     
       }   
''')   

If we then go through the result set, there is unexpectedly nothing:

for row in someQuery:
        print (str(row.s))

This prints nothing, whereas I would expect the following bindings for the variable ?s:

http://example.com/person/drewp
http://example.com/person/nick
http://example.com/person/drewp
http://example.com/person/nick

If I prepare the data differently by removing the explicit graphs, I do get the expected results:

GraphString = '''

PREFIX eg: <http://example.com/person/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

   eg:drewp a foaf:Person .
   eg:drewp eg:says "Hello World" .

   eg:nick a foaf:Person .
   eg:nick eg:says "Hi World" .
'''

Result of the query:

http://example.com/person/drewp
http://example.com/person/nick
http://example.com/person/drewp
http://example.com/person/nick

Perhaps I am mistaken and should work differently with RDFLib graph objects in Python that contain TriG data, but this does seem like incorrect behavior from a pure triples and SPARQL point of view.

@ajnelson-nist
Contributor

I've been curious about this behavior, too. I think it's consistent with the SPARQL specification to behave differently when a quads graph is used vs. a triples graph.

The SPARQL 1.1 grammar has the term QuadsNotTriples (item 51), which indicates a syntax difference in the WHERE clause.

Your query ...

someQuery = someGraph.query('''
       
select ?s

where  {
         ?s ?p ?o     
       }   
''') 

... would need to become:

 someQuery = someGraph.query('''
        
 select ?s
 
 where  {
+         GRAPH ?g {
          ?s ?p ?o     
+         }
        }   
 ''')

I only think this because of trying to figure out some nuances with the JSON-LD @graph keyword. My queries written for triples graphs started not returning results when I gave the @graph JSON key a sibling @id key. I got results again when throwing in that GRAPH ?g { ... } wrapper.

I'm not sure offhand where in the SPARQL specification this gets spelled out, though. The word "graph" appears a few hundred times in the document. So I'm curious for how this thread goes.

@floresbakker
Author

floresbakker commented Oct 28, 2024

You are referring to rules that deal explicitly with SPARQL Update operations (INSERT DATA, DELETE DATA and DELETE WHERE). Those production rules form part of the abstract syntax tree of SPARQL, so they should be read as leaves and branches of a tree, not as nodes that stand on their own (51 > 50 > 48/49 > 38/39/40). See also note #8 in paragraph 19.8.

My issue deals with a SELECT statement. It would be destructive to the SPARQL specification if a graph could no longer be queried without a GRAPH clause. Fortunately, that is not the case.

I have tested this issue with four engines: RDFLib, Speedy, Virtuoso and Jena. Only RDFLib breaks; the other engines give me the expected bindings.

@WhiteGobo
Contributor

It doesn't work for Dataset either. I would have expected a Graph() to hide data in its store that belongs to graphs with a different identifier, but I wouldn't expect that behaviour from Dataset. So I would expect the query on the dataset to work:

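from rdflib import Dataset

# Reuse the store of the Graph parsed earlier, so both objects see the same data.
anotherGraphSameData = Dataset(store=someGraph.store)
someQuery = anotherGraphSameData.query('''

select ?s

where  {
         ?s ?p ?o
       }
''')
for row in someQuery:
        print (str(row.s))
# still returns nothing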

As a side note, the data is parsed correctly; only the query on the dataset doesn't work. Serializing the dataset returns the content of GraphString:

print(anotherGraphSameData.serialize(format="trig"))

@WhiteGobo
Contributor

WhiteGobo commented Oct 28, 2024

One can also run the given query against the data of a single named graph by giving the Graph that graph's identifier. So using:

someGraph = Graph(identifier=URIRef("http://example.com/person/graph-1"))

then you will get

http://example.com/person/drewp
http://example.com/person/drewp

But there doesn't seem to be any option for the SPARQL processor to ignore graph identifiers. See:

def query(  # type: ignore[override]
    self,
    strOrQuery: Union[str, Query],
    initBindings: Optional[Mapping[str, Identifier]] = None,
    initNs: Optional[Mapping[str, Any]] = None,
    base: Optional[str] = None,
    DEBUG: bool = False,
) -> Mapping[str, Any]:

@floresbakker
Author

I first noticed this behavior when I wanted to run pySHACL on TriG data in RDFLib. I could never get that to work, even though pySHACL is able to handle TriG files. I suspected the issue might be in RDFLib, so I created the example above. My workaround is to transform a TriG file into a Turtle file and offer that to RDFLib/pySHACL instead. But this is not handy, as each time I want to process some data, I first have to transform the source.

@WhiteGobo
Contributor

The SPARQL query works if you use Dataset with the option default_union. @ashleysommer gave some more background info at #2959

from rdflib import Dataset

GraphString = '''
PREFIX eg: <http://example.com/person/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

eg:graph-1 {
    eg:drewp a foaf:Person .
    eg:drewp eg:says "Hello World" .
}

eg:graph-2 {
    eg:nick a foaf:Person .
    eg:nick eg:says "Hi World" .
}

eg:ash a foaf:Person .
eg:ash eg:says "Default" .
'''

ds = Dataset(default_union=True)
ds.parse(data=GraphString, format="trig")

someQuery = ds.query('''

select ?s

where  {
         ?s ?p ?o
       }
''')

for row in someQuery:
        print (str(row.s))
