[FEEDBACK] Message Format Unquoted Literals #724
Comments
We should consider this in a severely timeboxed way. Bear in mind design, which is not directly "on the nose" to this request. Note that unquoted literals appear in other places than in keys. We previously reserved a bunch of the ASCII punctuation (which is the main consideration here) for future use via For example, one of the characters not listed above is On the other hand, square brackets and parens seem potentially useful, as do some of the other junk. |
This won't go in 46.1, so I'm going to change the labels. I am also adding resolve-candidate because I think we won't extend unquoted, but that's for the WG to decide. |
One comment; broadening can be done in future versions, since it would be backwards compatible. |
Broadening can be done, so long as it is done in a backwards-compatible way. It's a little tricky here, because of the uses of literals in the syntax. I haven't carefully reviewed the proposal recently enough to say one way or the other if there are sticky bits that I'd object to. I won't say that we would never do an extension (never is a long time), but I think it unlikely in the 2.0 timeframe (e.g. 46.1/47) |
I would like to have a discussion on this in January. |
I will add this to a meeting agenda in January. The WG asked me to close this last week, as the consensus was that we would not revisit the syntax in the 46.1 timeframe and didn't feel that our previous discussions about text literals would change later. Our usual process would be to create a design doc (the onus is on the person proposing a change to document the proposal and compare/contrast it with the current design and other options), but let's discuss first. |
I would like at least a clear statement in v47 that the literal syntax could be expanded in future versions to include more characters than just what are in |
The literal syntax allows nearly any Unicode character. Do you mean |
Yes, I meant |
I just cleaned up the description a bit.
@eemeli , any comments? |
We also need to reserve at least I would be much more comfortable with |
Strictly speaking, Reserving '@' is not necessary, because syntactically it can't cause ambiguity in the syntax (an attribute can't be in the same position as a literal). Same for '#' and '='. The '=' because it is required that a literal be separated from an attribute or option identifier by a =. So with '/' on the other hand can cause ambiguity in the syntax, because of the following: Disallowing (, ), [, and ] would be unfortunate, since they are the natural characters for open/closed ranges. I don't see any math use for ' or ", so no particular reason to allow them in unquoted literals. However, I admit it would be simpler if the only ASCII characters allowed in unquoted literals were [A-Za-z0-9-+_.]. |
If If
Yes, and we need to keep in mind that we should not presume that people look at MF2 often. Not confusing readers is a high priority. |
We'll also need to exclude |
proposal:
These are all immutable (Unicode Character Encoding Stability). This also disallows the noncharacters that XML didn’t know about yet, before the noncharacter property was made immutable.
This could be done by just disallowing the “s” production characters, but that could be very confusing. {a b} looks too much like two items (the space is a U+00A0 NO-BREAK SPACE). So it should be broadened to the Unicode Whitespace characters. Unicode Whitespace is not guaranteed immutable, but has not changed for over a decade. Anyway, we would derive the code points as of now, so everything would be stable into the future. |
I just made a census of characters inside and outside of name_char:
|
So if we include just the ASCII that unquoted-literal contains now (without bothering with the number syntax), and regularize it, we'd get:
That is,
That would result in the following.
|
Summary
Consider relaxing constraints on literals, after v45
Background
Right now, unquoted literals are fairly narrowly constrained by
message.abnf
; here are the relevant lines:
Reason for reconsidering
However, for functions outside of the standard registry, this forces
many natural literals to use quotes. Here is an example from a function
that would handle MF1’s choice format:
The natural literals to use would be intervals, which use the characters
[, (, ), and ] for ranges. (The choice format would require some recasting,
because it depends on the ordering of variants; it currently uses >.) So
that would require
Many Unicode symbols are included by XML’s NT-NCName (about 6,000
currently), while many are excluded (about 2,600 currently). But these
are literals, not identifiers, which is what name is
intended for. By expanding beyond identifier usage, it allows functions
to avoid requiring quoting in many cases. It also allows us to dispense
with the special formulation for number-literal.
The literals for number, date, etc could be specified elsewhere, but
wouldn’t have to be in the ABNF.
That would allow various registries to have more sophisticated
literals without requiring quoting, and without privileging the
structured literals that we know about now.
Requirements
So, what restrictions on characters for a broadened definition of
unquoted literals would be required by a revised ABNF?
No ‘}’, because it would make .local $x = {literal} fail.
No ‘|’, because an initial one would conflict with quoting.
No ‘:’ or ‘$’, because an initial one would indicate a function or variable, which would conflict in expressions starting with one (and an initial ‘$’ would conflict in the value of an option).
No ‘{’. Not strictly required, but for clarity best to always forbid.
None of the big blocks of ‘strange’ code points that XML forbids: controls, (unpaired) surrogates, private-use, noncharacters.
These are all immutable (Unicode Character Encoding Stability).
This also disallows the noncharacters that XML didn’t know about yet, before the noncharacter property was made immutable.
No whitespace, since variant uses that for separators between keys, and expressions use it to separate various components.
This could be done by just disallowing the “s” production characters, but that could be very confusing. {a b} looks too much like two items (the space is a U+00A0 NO-BREAK SPACE). So it should be broadened to the Unicode Whitespace characters.
Unicode Whitespace is not guaranteed immutable, but has not changed for over a decade. Anyway, we would derive the code points as of now, so everything would be stable into the future.
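As a rough illustration of the restrictions listed above, here is a minimal Python sketch of a per-character check. It is not taken from message.abnf: the function and constant names are hypothetical, the whitespace set is a hand-written snapshot of the Unicode White_Space property, and it only covers the exclusions in this Requirements list (not any further characters raised in the comments).

```python
# Hypothetical sketch; names are illustrative and not part of message.abnf.

# Characters excluded to avoid syntax conflicts: { } | : $
SYNTAX_CONFLICTS = frozenset("{}|:$")

# Snapshot of the Unicode White_Space property, frozen "as of now".
WHITE_SPACE = frozenset(
    list(range(0x0009, 0x000E))          # TAB, LF, VT, FF, CR
    + [0x0020, 0x0085, 0x00A0, 0x1680]   # SPACE, NEL, NO-BREAK SPACE, OGHAM SPACE MARK
    + list(range(0x2000, 0x200B))        # EN QUAD .. HAIR SPACE
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_control(cp: int) -> bool:
    # C0 controls, DELETE, and C1 controls
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_private_use(cp: int) -> bool:
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus the last two code points of every plane
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_allowed_in_unquoted_literal(ch: str) -> bool:
    """True if a single character passes every restriction in the list above."""
    cp = ord(ch)
    return not (
        ch in SYNTAX_CONFLICTS
        or cp in WHITE_SPACE
        or is_control(cp)
        or is_surrogate(cp)
        or is_private_use(cp)
        or is_noncharacter(cp)
    )


def can_be_unquoted(text: str) -> bool:
    """True if a non-empty literal could be written without |...| quoting."""
    return len(text) > 0 and all(is_allowed_in_unquoted_literal(c) for c in text)
```

Under this sketch, an interval such as `[0,1)` passes, while `a b` fails whether the separator is U+0020 or U+00A0, which is the confusing case that motivates using the full whitespace set rather than only the “s” production.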
Detailed Proposal
This would result in the following change:
OLD
NEW
This changes just the first line above,
unquoted-literal = name / number-literal
and the rest of the above would remain the same.
Needed to avoid syntax conflicts
Whitespace
Controls
Surrogates
Private Use
Noncharacters
Notes