Proposal: Pragmas #10

yamahito · 2021-10-26T13:50:54Z

This is a ~~draft~~ proposal for adding pragmas to the invisible XML Grammar.

Details of the proposal are in misc/pragmas.md, with the corresponding changes made to the invisible XML grammar.


ndw · 2021-11-10T09:32:09Z

On the subject of syntax, I'll just observe that the compact syntax for Relax NG uses square brackets as the delimiter, with considerable flexibility about what can go in the annotation. You might, in this style, cast Michael's green/blue example like this:

[
  my:color='green'
]
num: digit+, fractional-part?.

[
  my:color='blue'
]
var: letter+.

cmsmcq · 2021-11-10T17:54:31Z

I like @ndw's syntax idea; it also has the consequence that pragmas can nest without contortions in the surface syntax.

The basic principle of XQuery pragmas seems sound: a pragma P is always followed by an expression E (bracketed, in XQuery), with the meaning 'if you understand P, do what you need to do with P and E; if you don't understand P, evaluate E normally'. It's not so dramatic and not so unitary in XSLT, but there is a similar pairing of things a processor may or may not understand and identifiable things in standard syntax: if you don't know how to evaluate this thing from v.Future, evaluate this xsl:fallback element.
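The "understand it or fall back" contract described above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the `HANDLERS` table, the `evaluate` function and the `my:double` pragma name are hypothetical, and expressions are modeled as plain callables.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: pragma QName -> handler taking (pragma data, fallback expression).
Handler = Callable[[str, Callable[[], int]], int]

HANDLERS: Dict[str, Handler] = {
    # A made-up pragma that doubles the value of the expression it annotates.
    "my:double": lambda data, expr: 2 * expr(),
}

def evaluate(pragma: Optional[str], data: str, expr: Callable[[], int]) -> int:
    """Apply the pragma if it is understood; otherwise evaluate expr normally."""
    handler = HANDLERS.get(pragma) if pragma is not None else None
    if handler is None:
        return expr()           # unknown pragma: behave as if it were absent
    return handler(data, expr)  # known pragma: the handler decides what to do with expr
```

A processor that does not know `other:thing` simply evaluates the expression, so `evaluate("other:thing", "", f)` and `evaluate(None, "", f)` give the same result.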

I am coming to think there are three use cases, syntactically speaking, which may require some thought to handle gracefully with a single mechanism (or what feels to the user like a single mechanism):

- relatively simple annotations on a terminal or nonterminal, the kind that would feel natural as a namespaced attribute on the XML element for the symbol (nonterminal, literal, inclusion, exclusion). These are well served by something like the current draft (with current syntax or with [...]).
- heavier annotations whose fallback expression would be empty (or just {}). In an attribute grammar, a specification that the value of the E in the LHS is the sum of the values of the E on the right-hand side and the T on the right-hand side would naturally come at the end of the rule (as it normally does in yacc and similar tools, and in most attribute-grammar systems I've looked at), like this:
```
  E: E, '+', T
      [my:atts E_0.value := E_1.value + T.value ]
      .
```
A guard checking that the inherited N attribute is a number greater than zero might feel more natural coming at the beginning of the RHS than at the end, but either would probably work:
```
  foo: [my:guard foo.N > 0 ] bar; baz?.
  { or }
  foo:  bar; baz? [my:guard foo.N > 0 ].
```
As the examples illustrate, these feel most natural to me when they appear on a RHS roughly where a comment might appear, or at the very end. (If we allow them wherever a comment may appear, they will also appear anywhere the light-weight annotation on a nonterminal or terminal may appear; bad idea, I think.)

In the XML form of a grammar, the natural representation of these cases would (it seems to me) be an extension element within the final s of a rule:
```
  <rule name="foo">
      <alt><nonterminal name="bar"/></alt>
      <alt><option><nonterminal name="baz"/></option></alt>
      <my:guard>foo.N > 0</my:guard>
  </rule>
```
Or if getting the element to carry the name my:guard is too complicated, the pragma could turn into <pragma name="my:guard">foo.N > 0</pragma>.
- pragmas that basically involve replacing a standard rule (or set of rules) with an alternative formulation. If for various reasons I wanted to use a different formulation of the rule for comment, in which a run of non-brace characters is marked as a token, meaning it can be read in one fell swoop without making an Earley item for every character, but I wanted to retain the specified grammar rule instead of just changing it in a local copy, I'd like to be able to write something like:
```
  [my:altform
      comment:  -'{', (cchars; comment)*, -'}'.
      [my:token] cchars: cchar+.
  ] 
  comment: -"{", (cchar; comment)*, -"}".
```
Meaning: replace the rule for comment with the two rules for comment and cchars shown, and in the rule for cchars, note that cchars can be treated as an atomic token and recognized with a greedy match.

Note that this syntax is already supported by the current draft proposal, but this is not something that should be an attribute on the nonterminal for comment; it should be an extension element appearing between rules. I am not quite sure how to have this co-exist with the light-weight form.
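Both XML shapes mentioned above for the guard example (a namespaced extension element, or a generic pragma element carrying the name) can be built with Python's standard `xml.etree.ElementTree`. A sketch only: the function name `rule_with_guard` and the namespace URI `http://example.com/my` are invented for illustration, since the thread does not say what the `my` prefix is bound to.

```python
import xml.etree.ElementTree as ET

MY_NS = "http://example.com/my"  # assumption: placeholder URI for the 'my' prefix

def rule_with_guard(generic: bool) -> ET.Element:
    """Build <rule name="foo"> with alternatives bar and baz? plus a guard pragma."""
    rule = ET.Element("rule", {"name": "foo"})
    alt1 = ET.SubElement(rule, "alt")
    ET.SubElement(alt1, "nonterminal", {"name": "bar"})
    alt2 = ET.SubElement(rule, "alt")
    option = ET.SubElement(alt2, "option")
    ET.SubElement(option, "nonterminal", {"name": "baz"})
    if generic:
        # Generic form: <pragma name="my:guard">...</pragma>
        guard = ET.SubElement(rule, "pragma", {"name": "my:guard"})
    else:
        # Extension-element form: <my:guard>...</my:guard>, in Clark notation
        guard = ET.SubElement(rule, "{%s}guard" % MY_NS)
    guard.text = "foo.N > 0"
    return rule

if __name__ == "__main__":
    ET.register_namespace("my", MY_NS)
    print(ET.tostring(rule_with_guard(generic=False), encoding="unicode"))
    print(ET.tostring(rule_with_guard(generic=True), encoding="unicode"))
```

When serialized, the `>` in the guard text comes out escaped as `&gt;`, which is equivalent XML. The generic form avoids having to bind a namespace prefix at all, at the cost of leaving the QName as an opaque string in an attribute.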

For discussion purposes, I think this nets out to the following proposal, which has taken form in my mind as I wrote this reply to Norm:

There are two forms of pragma: light-weight and heavy-weight:

  pragma: '[', QName, s, pragma-data, s?, ']'.
  Pragma:  '[[', QName, s, Pragma-data, s?, ']]'.
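To make the two bracket forms concrete, here is a rough regex-based recognizer in Python. It is a sketch under stated assumptions, not a parser for the proposal: `read_pragma` is a hypothetical helper, the QName pattern is a simplification of the real XML rules, pragma data may be empty (as in the `[my:token]` example earlier), and nested pragmas such as the `my:altform` example are not handled.

```python
import re
from typing import Optional, Tuple

# Simplified QName: optional prefix, colon, local name (not the full XML definition).
QNAME = r"[A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?"

# Heavyweight: [[ QName data ]]; lightweight: [ QName data ] with no brackets in data.
HEAVY = re.compile(r"\[\[(" + QNAME + r")(?:\s+(.*?))?\s*\]\]", re.DOTALL)
LIGHT = re.compile(r"\[(" + QNAME + r")(?:\s+([^\[\]]*?))?\s*\]", re.DOTALL)

def read_pragma(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (kind, qname, data) for a pragma at the start of text, else None."""
    # Try the heavyweight form first, since it also begins with '['.
    for kind, pattern in (("heavy", HEAVY), ("light", LIGHT)):
        m = pattern.match(text)
        if m:
            return kind, m.group(1), m.group(2) or ""
    return None
```

For instance, `read_pragma("[my:color 'green'] num: digit+.")` yields a lightweight pragma named `my:color`, while a string that starts with an ordinary rule yields `None`.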

Lightweight pragmas are annotations on nonterminals and terminals, and turn into attributes in the XML representation. They are allowed before the serialization mark. Perhaps the simplest change to the grammar is thus to replace mark and tmark in right-hand-sides with annotation and tannotation and then have
```
  -annotation: (pragma, s?)?, mark.
  -tannotation: (pragma, s?)?, tmark.
```
I don't know how we mark the rule for pragma to signal that it should turn into an attribute whose expanded name is the expanded name given by the QName and whose value is the value of pragma-data. Further thought needed.
Heavyweight Pragmas are additional constructs skipped over by processors that don't understand them, and translate into extension elements in the XML. They are allowed between rules and just before the final full stop of a rule.
```
  ixml: S2, rule+S2, S2.
  S2:  (whitespace+; comment; Pragma)*.
  rule: (annotation, S)?, name, S, ["=:"], S, -alts, S2, ".".
```
Or allowed between rules and anywhere on the RHS of a rule.
```
  rule: (annotation, S)?, name, S, ["=:"], S2, -alts, S2, ".".
```
When pragmas occur before a rule or on the LHS of a rule, the 'fallback expression' for them is the rule as a whole. That means the effect of a pragma may in principle be to cause a different interpretation of the rule (or the replacement of the rule with something in the pragma). Heavyweight pragmas occurring at the end of a rule have no 'fallback expression'; they are just ignored by processors that don't understand the QName. If heavyweight pragmas are allowed inside a RHS, the fallback expression is the one immediately following the pragma.
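The fallback rules in the preceding paragraph amount to a small lookup by position. The following Python sketch restates them; the `Position` enum and the `fallback_for` function are hypothetical names, and rules and expressions are represented as plain strings purely for illustration.

```python
from enum import Enum
from typing import Optional

class Position(Enum):
    BEFORE_RULE = "before a rule or on its LHS"
    INSIDE_RHS = "inside the RHS"
    END_OF_RULE = "just before the final full stop"

def fallback_for(position: Position, rule: str,
                 next_expr: Optional[str] = None) -> Optional[str]:
    """Return what a processor that ignores the pragma should process instead."""
    if position is Position.BEFORE_RULE:
        return rule        # the rule as a whole is the fallback
    if position is Position.INSIDE_RHS:
        return next_expr   # the expression immediately following the pragma
    return None            # end of rule: nothing to fall back to, just skip it
```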


misc/pragmas.md

Copy edit, revise proposal to allow grammar-attached pragmas, revise namespaces example, supply renaming and rewriting examples, finish tokenization example, add lists of open issues and decisions for the group.

Add an open issue

Add an open issue and a decision for the group

One more copy-editing pass

I hate MarkDown.

one more copy-editing pass (but slacked off at the end)

Minor copy edits

Prose revision for the prolog.

Small copy-edit

Delete the bits that say this is not yet finished. It's as finished as it is going to get.

First version of this abbreviated proposal

Revised (and made longer, sigh)

Add a claim to adequacy. Also a revision date.

The pragma-data element is not always required.

Markdown claims another victim.

More copy editing

Add status note

Further copy edits, typo corrections.

Markdown is so simple and intuitive! It's perfectly obvious how to embed a code block within a paragraph within a list item. I cannot understand how anyone could fail to prefer Markdown to systems with explicit markup. Who can understand those?

Copy edits

Correct the revision date.

… into proposal-pragmas

Added some pragma basics

319faec

as a basis for demonstrating how to use github and PRs to collaborate on a proposal.

yamahito requested review from spemberton and cmsmcq October 26, 2021 13:51

yamahito commented Oct 26, 2021

ixml.ixml (outdated, resolved)

ixml.ixml (outdated, resolved)

yamahito added 2 commits October 27, 2021 19:18

Adds some more suggested detail for pragma implementation

988bab4

Just adds some whitespace for readability

94442ea

yamahito commented Oct 27, 2021

ixml-specification.html (outdated, resolved)

cmsmcq reviewed Oct 31, 2021

ixml.ixml (outdated, resolved)

Correct typo in example

7eb74b7

Add missing close quotation for eg namespace in example of pragmas

cmsmcq added 4 commits November 16, 2021 09:39

add pragmas.md

1e138d1

Draft document to summarize pragma issues (and help MSM stop forgetting what we've already discussed).

Update pragmas.md

468f8ab

Provide a bit more prose.

Update pragmas.md

f5e0b74

I hate Markdown.

Update pragmas.md

7db2f74

Some copy edits and begin a new example