From 3eeba16f3721bde687ddcbdf55d0d4da4ee91479 Mon Sep 17 00:00:00 2001
From: Gregg Kellogg
Date: Thu, 11 Jan 2024 09:50:47 -0800
Subject: [PATCH] Case normalization of language tags. (#74)
* Case normalization of language tags. Fixes #55.
---------
Co-authored-by: Ted Thibodeau Jr
Co-authored-by: Andy Seaborne
---
spec/index.html | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
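
Note for reviewers: the rule this patch adds (language tags that differ only
by case are the same tag, so literals carrying them are the same literal) can
be illustrated with a minimal sketch. It assumes lower-casing as the case
normalization, which the spec text permits but does not require. The names
`normalize_lang` and `LangLiteral` are hypothetical and not part of any RDF
library.

```python
from dataclasses import dataclass


def normalize_lang(tag: str) -> str:
    """Case-normalize a language tag (assumption: lower case is the chosen form)."""
    return tag.lower()


@dataclass(frozen=True)
class LangLiteral:
    """Hypothetical language-tagged string: a lexical form plus a language tag."""
    lexical_form: str
    language: str

    def __post_init__(self) -> None:
        # Store the tag in normalized form so that equality and hashing
        # ignore case differences in the tag, but not in the lexical form.
        object.__setattr__(self, "language", normalize_lang(self.language))


a = LangLiteral("chat", "en-US")
b = LangLiteral("chat", "EN-us")
c = LangLiteral("chat", "fr")

same_literal = a == b          # tags differ only by case
distinct_literal = a != c      # tags differ by more than case
unique_count = len({a, b, c})  # a and b collapse to one set member
```

Normalizing once at construction time, rather than at every comparison, is one way to keep the treatment consistent across equality checks, hashing, and serialization. An implementation could equally normalize to the BCP47 recommended format.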
diff --git a/spec/index.html b/spec/index.html
index bd59bc3..d40c513 100644
--- a/spec/index.html
+++ b/spec/index.html
@@ -718,7 +718,9 @@ Literals
non-empty language tag as defined by [[!BCP47]]. The
language tag MUST be well-formed according to
section 2.2.9
- of [[!BCP47]].
+ of [[!BCP47]],
+ and MUST be treated consistently, that is, in a case-insensitive manner.
+ Two language tags are the same if they differ only by case.
if and only if the datatype IRI is
http://www.w3.org/1999/02/22-rdf-syntax-ns#dirLangString
,
a non-empty language tag
@@ -729,9 +731,10 @@ Literals
A literal is a language-tagged string if the third element
is present and the fourth element is not present.
- Lexical representations of language tags MAY be converted
- to lower case.
- The value of language tags is always treated as being in lower case.
+ Lexical representations of language tags
+ MAY be case normalized
+ (for example, by converting to lower case).
+
A literal is a directional language-tagged string
if both the third element and fourth elements are present.
@@ -1813,6 +1816,17 @@
Changes between RDF 1.1 and RDF 1.2
Minor edit
to improve the example about distinguishing literals, IRIs, and blank nodes
in .
+ Implementations were previously allowed to normalize language tags to lower case,
+ which made it ambiguous whether two literals with language tags
+ that differed only by case represented the same literal,
+ or distinct literals.
+ RDF 1.2 requires that language tags be case-insensitively unique
+ but does not specify the common formatting to be used.
+ Two literals with the same lexical form and language tags that differ only by case
+ are the same literal.
+ Implementations can follow the advice to normalize to lower case,
+ use the recommended BCP47 format,
+ or do something else, as long as it is performed consistently.
A detailed overview of the differences between RDF versions 1.0