Issues with negative dates #2829

Open
Superraptor opened this issue Jul 19, 2024 · 5 comments

@Superraptor

I'm currently trying to parse an RDF file in TTL exported from a Wikibase using its dumpRdf.php feature.

The Wikibase includes some ISO 8601 dates that are BCE, such as "-0028-08-10T00:00:00Z"^^xsd:dateTime. When processing these, RDFLib spits out the following error:

Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#dateTime, Converter=<function parse_datetime at 0x0000020671E09C60>
Traceback (most recent call last):
  File "C:\Users\Username\anaconda3\envs\py311\Lib\site-packages\rdflib\term.py", line 2084, in _castLexicalToPython
    return conv_func(lexical)  # type: ignore[arg-type]
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Username\anaconda3\envs\py311\Lib\site-packages\isodate\isodatetime.py", line 55, in parse_datetime
    tmpdate = parse_date(datestring)
              ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Username\anaconda3\envs\py311\Lib\site-packages\isodate\isodates.py", line 203, in parse_date
    raise ISO8601Error('Unrecognised ISO 8601 date format: %r' % datestring)
isodate.isoerror.ISO8601Error: Unrecognised ISO 8601 date format: '-0028-08-10'

Is there any recommended way to deal with this? Thanks so much!
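
If the traceback above is a logged warning rather than a raised exception (an assumption; nothing in this report confirms it), one stopgap is to filter that specific message with the standard logging module. A minimal sketch follows. The logger name "rdflib.term" and the class name DropLiteralConversionWarnings are assumptions for illustration, and the filter only hides the message; it does not make the date parse.

```python
import logging


class DropLiteralConversionWarnings(logging.Filter):
    """Hypothetical filter: hide 'Failed to convert Literal lexical form' records."""

    PREFIX = "Failed to convert Literal lexical form to value"

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False drops the record; every other record passes through.
        return not record.getMessage().startswith(self.PREFIX)


# Assumption: the message is logged through logging.getLogger("rdflib.term").
conversion_filter = DropLiteralConversionWarnings()
logging.getLogger("rdflib.term").addFilter(conversion_filter)
```

A filter attached to a logger applies only to records logged directly on that logger, so other warnings from the same module would still be shown.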

@nicholascar
Member

Oh dear, that's a good find and I've got no solution for you, sorry. Someone will have to look into the Python date parser. So it's either a limitation in Python's general date parsing (unlikely) or an issue with the way RDFLib is using the Python date parsing (more likely). But I don't know that part of the library, sorry.

@Superraptor
Author

@nicholascar thanks for the response! if anything is a nightmare in any programming language/package it's date/time, looks like we might need someone familiar with the deep magics for this one!

@nicholascar
Member

I just tried to reproduce the error but can't:

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import XSD, PROV

d_neg = Literal("-0028-08-10T00:00:00Z", datatype=XSD.dateTimeStamp)

g = Graph()
g.add((
    URIRef("http://example.com"),
    PROV.startedAtTime,
    d_neg
))
print(d_neg.toPython())
print(d_neg.n3())
print(g.serialize(format="longturtle"))
print(g.serialize(format="json-ld"))

This correctly prints out:

-0028-08-10T00:00:00Z
"-0028-08-10T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTimeStamp>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<http://example.com>
    prov:startedAtTime "-0028-08-10T00:00:00Z"^^xsd:dateTimeStamp ;
.

[
  {
    "@id": "http://example.com",
    "http://www.w3.org/ns/prov#startedAtTime": [
      {
        "@type": "http://www.w3.org/2001/XMLSchema#dateTimeStamp",
        "@value": "-0028-08-10T00:00:00Z"
      }
    ]
  }
]

Can you supply code that triggers the error so I can take a look at it in more depth?

@nicholascar nicholascar self-assigned this Aug 11, 2024
@nicholascar nicholascar added the awaiting feedback More feedback is needed from the author of the PR or Issue. label Aug 11, 2024
@Superraptor
Author

Sorry for taking so long!

Here's an example portion of the TTL that was causing the issue (sorry Wikibase Turtle is... a bit nested):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix wd: <http://b73b8432f4ff/entity/> .
@prefix data: <http://b73b8432f4ff/wiki/Special:EntityData/> .
@prefix s: <http://b73b8432f4ff/entity/statement/> .
@prefix ref: <http://b73b8432f4ff/reference/> .
@prefix v: <http://b73b8432f4ff/value/> .
@prefix wdt: <http://b73b8432f4ff/prop/direct/> .
@prefix wdtn: <http://b73b8432f4ff/prop/direct-normalized/> .
@prefix p: <http://b73b8432f4ff/prop/> .
@prefix ps: <http://b73b8432f4ff/prop/statement/> .
@prefix psv: <http://b73b8432f4ff/prop/statement/value/> .
@prefix psn: <http://b73b8432f4ff/prop/statement/value-normalized/> .
@prefix pq: <http://b73b8432f4ff/prop/qualifier/> .
@prefix pqv: <http://b73b8432f4ff/prop/qualifier/value/> .
@prefix pqn: <http://b73b8432f4ff/prop/qualifier/value-normalized/> .
@prefix pr: <http://b73b8432f4ff/prop/reference/> .
@prefix prv: <http://b73b8432f4ff/prop/reference/value/> .
@prefix prn: <http://b73b8432f4ff/prop/reference/value-normalized/> .
@prefix wdno: <http://b73b8432f4ff/prop/novalue/> .

wikibase:Dump a schema:Dataset,
		owl:Ontology ;
	cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
	schema:softwareVersion "1.0.0" ;
	schema:dateModified "2024-08-07T14:23:13Z"^^xsd:dateTime ;
	owl:imports <http://wikiba.se/ontology-1.0.owl> .

data:Q11314 a schema:Dataset ;
	schema:about wd:Q11314 ;
	schema:version "32272"^^xsd:integer ;
	schema:dateModified "2023-11-09T14:26:22Z"^^xsd:dateTime ;
	wikibase:statements "8"^^xsd:integer ;
	wikibase:sitelinks "0"^^xsd:integer ;
	wikibase:identifiers "5"^^xsd:integer .

wd:Q11314 a wikibase:Item ;
	wdt:P82 wd:Q2225 ;
	wdt:P3 "Q1398" ;
	wdt:P123 "8194433" ;
	wdt:P122 "0000000430695667" ;
	wdt:P108 "n79014062" ;
	wdt:P107 "PA6801-PA6961" ;
	wdt:P141 "-0069-10-13T00:00:00Z"^^xsd:dateTime ;
	wdt:P142 "-0018-09-19T00:00:00Z"^^xsd:dateTime ;
	p:P82 s:Q11314-de166e33-42a7-5204-c10f-6a277fdfe081 .

s:Q11314-de166e33-42a7-5204-c10f-6a277fdfe081 a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P82 wd:Q2225 ;
	pq:P57 "48" .

wd:Q11314 p:P3 s:Q11314-2fc1a84a-428a-d78a-19bd-27c0d1d4edaf .

s:Q11314-2fc1a84a-428a-d78a-19bd-27c0d1d4edaf a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P3 "Q1398" .

wd:Q11314 p:P123 s:Q11314-0997c22e-4549-b52f-3c9c-9a26571e07a0 .

s:Q11314-0997c22e-4549-b52f-3c9c-9a26571e07a0 a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P123 "8194433" .

wd:Q11314 p:P122 s:Q11314-9831b198-4891-efee-42f5-8a77bef1bac0 .

s:Q11314-9831b198-4891-efee-42f5-8a77bef1bac0 a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P122 "0000000430695667" .

wd:Q11314 p:P108 s:Q11314-d401b682-4b7a-4bce-a9bb-fe2c6c425539 .

s:Q11314-d401b682-4b7a-4bce-a9bb-fe2c6c425539 a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P108 "n79014062" .

wd:Q11314 p:P107 s:Q11314-fa6964d3-4acc-4fe9-718c-025a64bb0aed .

s:Q11314-fa6964d3-4acc-4fe9-718c-025a64bb0aed a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P107 "PA6801-PA6961" .

wd:Q11314 p:P141 s:Q11314-1929b796-4cbb-57fb-c952-4fdb537110a2 .

s:Q11314-1929b796-4cbb-57fb-c952-4fdb537110a2 a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P141 "-0069-10-13T00:00:00Z"^^xsd:dateTime ;
	psv:P141 v:9956a3176c50e5f372b2805522b9f235 ;
	prov:wasDerivedFrom ref:07354354b93c0850a770a6e5ac4c2595f1292a8b .

wd:Q11314 p:P142 s:Q11314-92587d20-49ef-1683-95f8-3f8e331166f9 .

s:Q11314-92587d20-49ef-1683-95f8-3f8e331166f9 a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P142 "-0018-09-19T00:00:00Z"^^xsd:dateTime ;
	psv:P142 v:834791bd6aa770755041b4306c4fa39a ;
	prov:wasDerivedFrom ref:07354354b93c0850a770a6e5ac4c2595f1292a8b .

wd:Q11314 rdfs:label "Virgil"@en ;
	skos:prefLabel "Virgil"@en ;
	schema:name "Virgil"@en ;
	skos:altLabel "Virgil (Ancient Roman poet of the Augustan period)"@en,
		"Virgil (Ancient Roman poet, 70-19 BCE)"@en,
		"Virgil, 70 B.C.-19 B.C."@en .

v:9956a3176c50e5f372b2805522b9f235 a wikibase:TimeValue ;
	wikibase:timeValue "-0069-10-13T00:00:00Z"^^xsd:dateTime ;
	wikibase:timePrecision "11"^^xsd:integer ;
	wikibase:timeTimezone "0"^^xsd:integer ;
	wikibase:timeCalendarModel <http://www.wikidata.org/entity/Q1985786> .

v:834791bd6aa770755041b4306c4fa39a a wikibase:TimeValue ;
	wikibase:timeValue "-0018-09-19T00:00:00Z"^^xsd:dateTime ;
	wikibase:timePrecision "11"^^xsd:integer ;
	wikibase:timeTimezone "0"^^xsd:integer ;
	wikibase:timeCalendarModel <http://www.wikidata.org/entity/Q1985786> .

The time value causing the issue is wikibase:timeValue "-0018-09-19T00:00:00Z"^^xsd:dateTime ;.

The code itself is a bit long (it's been a while since I've tested this case so I need to dig through it), but it essentially deconstructed the file into triples and loaded them into the graph. I'll do more checking on this this week and get back to you as soon as I can!
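
To see which literals in a dump like the one above are affected before loading it, a plain-text scan can list every negative-year xsd:dateTime value. This is a rough sketch using only the standard library, not an RDF parser; the function name find_negative_datetimes is hypothetical, and it assumes the datatype is written with the xsd: prefix as in the excerpt.

```python
import re

# Matches literals such as "-0018-09-19T00:00:00Z"^^xsd:dateTime.
# Assumption: the datatype uses the xsd: prefix; full IRIs are not handled.
NEGATIVE_DATETIME = re.compile(
    r'"(-\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?)"'
    r"\^\^xsd:dateTime\b"
)


def find_negative_datetimes(turtle_text: str) -> list[tuple[int, str]]:
    """Return (line_number, lexical_form) for each negative-year xsd:dateTime literal."""
    hits = []
    for line_number, line in enumerate(turtle_text.splitlines(), start=1):
        for match in NEGATIVE_DATETIME.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


sample = '''wd:Q11314 a wikibase:Item ;
    wdt:P141 "-0069-10-13T00:00:00Z"^^xsd:dateTime ;
    schema:dateModified "2023-11-09T14:26:22Z"^^xsd:dateTime .
'''
print(find_negative_datetimes(sample))  # [(2, '-0069-10-13T00:00:00Z')]
```

The trailing word boundary keeps xsd:dateTimeStamp literals out of the result, since the two datatypes are handled differently.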

@ageorgou
Contributor

Sorry to jump in or state something obvious but I happened to be looking into this a bit. In case it helps:

  • There are two different datatypes in use here: @Superraptor's example uses xsd:dateTime, while @nicholascar's uses xsd:dateTimeStamp
  • The second one, while not raising an error, doesn't seem to parse the value either:
    d_neg = Literal("-0028-08-10T00:00:00Z", datatype=XSD.dateTimeStamp)
    d_neg.value # --> None 
  • Actually, there doesn't seem to be any code for creating literals of type xsd:dateTimeStamp:
    d_pos = Literal("2028-08-10T00:00:00Z", datatype=XSD.dateTimeStamp)
    d_pos.value # --> None
    d_pos.toPython() # --> the literal itself
    whereas with xsd:dateTime:
    d_pos = Literal("2028-08-10T00:00:00Z", datatype=XSD.dateTime)
    d_pos.value # --> datetime.datetime(2028, 8, 10, 0, 0, tzinfo=<...>)
    d_pos.toPython() # --> the datetime value as above

Negative years (BCE dates) are not supported in either the Python standard library's datetime or in isodate, which rdflib uses. This issue has been reported before (#2210, #2321 at least) but my guess is that the patchy support for BCE dates across Python libraries must make it hard to address.
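
The standard-library limit mentioned above can be checked directly, and the lexical form can still be split into its components without building a datetime object. The following is only a sketch: the helper name split_xsd_datetime and the regular expression are illustrative assumptions, not rdflib or isodate code, and the helper does not validate ranges such as month or day.

```python
import datetime
import re

# The standard library cannot represent years below 1.
print(datetime.MINYEAR)  # 1

XSD_DATETIME = re.compile(
    r"^(?P<year>-?\d{4,})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}(?:\.\d+)?)"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})?$"
)


def split_xsd_datetime(lexical: str) -> dict:
    """Hypothetical helper: split an xsd:dateTime lexical form, keeping the year's sign."""
    match = XSD_DATETIME.match(lexical)
    if match is None:
        raise ValueError(f"not an xsd:dateTime lexical form: {lexical!r}")
    parts = match.groupdict()
    return {
        "year": int(parts["year"]),
        "month": int(parts["month"]),
        "day": int(parts["day"]),
        "hour": int(parts["hour"]),
        "minute": int(parts["minute"]),
        "second": float(parts["second"]),
        "tz": parts["tz"],  # "Z", an offset such as "+01:00", or None
    }


parts = split_xsd_datetime("-0028-08-10T00:00:00Z")
print(parts["year"], parts["month"], parts["day"])  # -28 8 10

try:
    datetime.datetime(parts["year"], parts["month"], parts["day"])
except ValueError as exc:
    # datetime refuses any year below datetime.MINYEAR.
    print("datetime rejects it:", exc)
```

Keeping the parts in a plain dict avoids the datetime range check, at the cost of losing date arithmetic and comparison.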
