Proposal: Pragmas #10

Open
wants to merge 163 commits into
base: master
yamahito
Contributor

@yamahito yamahito commented Oct 26, 2021

This is a draft proposal for adding pragmas to the invisible XML grammar.

Details of the proposal are in misc/pragmas.md, with the corresponding changes made to the invisible XML grammar itself.

It is also intended as a basis for demonstrating how to use GitHub and PRs to collaborate on a proposal.
@yamahito yamahito requested review from spemberton and cmsmcq October 26, 2021 13:51
Add missing close quotation for eg namespace in example of pragmas
@ndw
Contributor

ndw commented Nov 10, 2021

On the subject of syntax, I'll just observe that the compact syntax for Relax NG uses square brackets as the delimiter, with considerable flexibility about what can go in the annotation. You might, in this style, cast Michael's green/blue example like this:

[
  my:color='green'
]
num: digit+, fractional-part?.

[
  my:color='blue'
]
var: letter+.

@cmsmcq
Contributor

cmsmcq commented Nov 10, 2021

I like @ndw's syntax idea; it also has the consequence that pragmas can nest without contortions in the surface syntax.

The basic principle of XQuery pragmas seems sound: a pragma P is always followed by an expression E (bracketed, in XQuery), with the meaning 'if you understand P, do what you need to do with P and E; if you don't understand P, evaluate E normally'. It's not so dramatic and not so unitary in XSLT, but there is a similar pairing of things a processor may or may not understand and identifiable things in standard syntax: if you don't know how to evaluate this thing from v.Future, evaluate this xsl:fallback element.
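
As a rough illustration of that fallback principle, here is a minimal Python sketch. The registry `known`, the function `apply_pragma`, and the pragma names `my:trace` and `other:opt` are all hypothetical, invented for this example; nothing here is taken from XQuery, XSLT, or the ixml proposal.

```python
from typing import Callable, Dict

# Hypothetical handler type: receives the pragma data and a thunk for the
# fallback expression E, and decides what to do with both.
Handler = Callable[[str, Callable[[], str]], str]


def apply_pragma(name: str, data: str,
                 fallback: Callable[[], str],
                 known: Dict[str, Handler]) -> str:
    """If pragma `name` is understood, let its handler deal with E;
    otherwise evaluate the fallback expression E normally."""
    handler = known.get(name)
    if handler is None:
        return fallback()
    return handler(data, fallback)


def evaluate_normally() -> str:
    # Stands in for the ordinary evaluation of E.
    return "standard result"


# This processor understands exactly one (made-up) pragma.
known: Dict[str, Handler] = {
    "my:trace": lambda data, expr: f"traced({data}): {expr()}",
}

understood = apply_pragma("my:trace", "verbose", evaluate_normally, known)
ignored = apply_pragma("other:opt", "fast", evaluate_normally, known)
print(understood)
print(ignored)
```

The point of the sketch is only that an unknown pragma never causes an error: the processor falls through to the standard construct.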

I am coming to think there are three use cases, syntactically speaking, which may require some thought to handle gracefully with a single mechanism (or what feels to the user like a single mechanism):

  • relatively simple annotations on a terminal or nonterminal, the kind that would feel natural as a namespaced attribute on the XML element for the symbol (nonterminal, literal, inclusion, exclusion). These are well served by something like the current draft (with current syntax or with [...]).

  • heavier annotations whose fallback expression would be empty (or just {}). In an attribute grammar, a specification that the value of the E in the LHS is the sum of the values of the E on the right-hand side and the T on the right-hand side would naturally come at the end of the rule (as it normally does in yacc and similar tools, and in most attribute-grammar systems I've looked at), like this:

      E: E, '+', T
          [my:atts E_0.value := E_1.value + T.value ]
          .
    

    A guard checking that the inherited N attribute is a number greater than zero might feel more natural coming at the beginning of the RHS than at the end, but either would probably work:

      foo: [my:guard foo.N > 0 ] bar; baz?.
      { or }
      foo:  bar; baz? [my:guard foo.N > 0 ].
    

    As the examples illustrate, these feel most natural to me when they appear on a RHS roughly where a comment might appear, or at the very end. (If we allow them wherever a comment may appear, they will also appear anywhere the light-weight annotation on a nonterminal or terminal may appear; bad idea, I think.)

    In the XML form of a grammar, the natural representation of these cases would (it seems to me) be an extension element within the final s of a rule:

      <rule name="foo">
          <alt><nonterminal name="bar"/></alt>
          <alt><option><nonterminal name="baz"/></option></alt>
          <my:guard>foo.N > 0</my:guard>
      </rule>
    

    Or if getting the element to carry the name my:guard is too complicated, the pragma could turn into <pragma name="my:guard">foo.N > 0</pragma>.

  • pragmas that basically involve replacing a standard rule (or set of rules) with an alternative formulation. If for various reasons I wanted to use a different formulation of the rule for comment, in which a run of non-brace characters is marked as a token, meaning it can be read in one fell swoop without making an Earley item for every character, but I wanted to retain the specified grammar rule instead of just changing it in a local copy, I'd like to be able to write something like:

      [my:altform
          comment:  -'{', (cchars; comment)*, -'}'.
          [my:token] cchars: cchar+.
      ] 
      comment: -"{", (cchar; comment)*, -"}".
    

    Meaning: replace the rule for comment with the two rules for comment and cchars shown, and in the rule for cchars, note that cchars can be treated as an atomic token and recognized with a greedy match.

    Note that this syntax is already supported by the current draft proposal, but this is not something that should be an attribute on the nonterminal for comment; it should be an extension element appearing between rules. I am not quite sure how to have this co-exist with the light-weight form.
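
To make the XML side of the second use case concrete, here is a small Python sketch that builds the `foo` rule with the generic `<pragma name="my:guard">` form mentioned above, and shows how a processor that does not understand the pragma might simply drop it. The function names `build_foo_rule` and `drop_unknown_pragmas` are hypothetical, and the choice to model "ignoring" as removal of the element is an assumption of this sketch, not something the draft specifies.

```python
import xml.etree.ElementTree as ET


def build_foo_rule() -> ET.Element:
    """Build the XML form of: foo: bar; baz? [my:guard foo.N > 0 ]."""
    rule = ET.Element("rule", {"name": "foo"})
    alt1 = ET.SubElement(rule, "alt")
    ET.SubElement(alt1, "nonterminal", {"name": "bar"})
    alt2 = ET.SubElement(rule, "alt")
    option = ET.SubElement(alt2, "option")
    ET.SubElement(option, "nonterminal", {"name": "baz"})
    # The heavier annotation becomes an extension element at the end of the rule.
    pragma = ET.SubElement(rule, "pragma", {"name": "my:guard"})
    pragma.text = "foo.N > 0"
    return rule


def drop_unknown_pragmas(rule: ET.Element, understood: set) -> ET.Element:
    """Remove pragma children whose name this processor does not understand
    (assumption: ignoring a pragma is modelled as deleting its element)."""
    for pragma in list(rule.findall("pragma")):
        if pragma.get("name") not in understood:
            rule.remove(pragma)
    return rule


rule = build_foo_rule()
print(ET.tostring(rule, encoding="unicode"))

plain = drop_unknown_pragmas(build_foo_rule(), understood=set())
print(ET.tostring(plain, encoding="unicode"))
```

Because the guard has no fallback expression, dropping it leaves a rule that is still a complete, standard ixml rule.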

For discussion purposes, I think this nets out to the following proposal, which has taken form in my mind as I wrote this reply to Norm:

  • There are two forms of pragma: light-weight and heavy-weight:

      pragma: '[', QName, s, pragma-data, s?, ']'.
      Pragma:  '[[', QName, s, Pragma-data, s?, ']]'.
    
  • Lightweight pragmas are annotations on nonterminals and terminals, and turn into attributes in the XML representation. They are allowed before the serialization mark. Perhaps the simplest change to the grammar is thus to replace mark and tmark in right-hand-sides with annotation and tannotation and then have

      -annotation: (pragma, s?)?, mark.
      -tannotation: (pragma, s?)?, tmark.
    

    I don't know how we mark the rule for pragma to signal that it should turn into an attribute whose expanded name is the expanded name given by the QName and whose value is the value of pragma-data. Further thought needed.

  • Heavyweight Pragmas are additional constructs skipped over by processors that don't understand them, and translate into extension elements in the XML. They are allowed between rules and just before the final full stop of a rule.

      ixml: S2, rule+S2, S2.
      S2:  (whitespace+; comment; Pragma)*.
      rule: (annotation, S)?, name, S, ["=:"], S, -alts, S2, ".".
    

    Or allowed between rules and anywhere on the RHS of a rule.

      rule: (annotation, S)?, name, S, ["=:"], S2, -alts, S2, ".".
    
  • When pragmas occur before a rule or on the LHS of a rule, the 'fallback expression' for them is the rule as a whole. That means the effect of a pragma may in principle be to cause a different interpretation of the rule (or the replacement of the rule with something in the pragma). Heavyweight pragmas occurring at the end of a rule have no 'fallback expression'; they are just ignored by processors that don't understand the QName. If heavyweight pragmas are allowed inside a RHS, the fallback expression is the one immediately following the pragma.
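
The two surface forms in this proposal can be told apart mechanically. The following Python sketch classifies a single pragma string as light-weight (`[...]`) or heavy-weight (`[[...]]`) and splits off the QName and the pragma data. It is only a sketch under several assumptions: the QName pattern is a crude approximation, the pragma data is treated as optional, nested brackets are not handled, and the names `classify`, `LIGHT`, and `HEAVY` are made up for the example.

```python
import re
from typing import Tuple

# Crude approximation of a prefixed QName (assumption, not the XML Names rule).
_QNAME = r"[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*"
# Pragma data here may not contain brackets, so nesting is out of scope.
_DATA = r"(?:\s+([^\[\]]*?))?"

HEAVY = re.compile(r"\[\[\s*(" + _QNAME + r")" + _DATA + r"\s*\]\]")
LIGHT = re.compile(r"\[\s*(" + _QNAME + r")" + _DATA + r"\s*\]")


def classify(text: str) -> Tuple[str, str, str]:
    """Return (kind, qname, data) for one pragma, kind being 'light' or 'heavy'."""
    text = text.strip()
    # Try the heavy-weight form first, since it also begins with '['.
    for kind, pattern in (("heavy", HEAVY), ("light", LIGHT)):
        match = pattern.fullmatch(text)
        if match:
            return kind, match.group(1), match.group(2) or ""
    raise ValueError(f"not a pragma: {text!r}")


print(classify("[my:color 'green']"))
print(classify("[[my:atts E_0.value := E_1.value + T.value ]]"))
print(classify("[my:token]"))
```

Under the proposal, a result of kind `light` would become an attribute on the annotated symbol, and a result of kind `heavy` would become an extension element in the XML form of the grammar.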

Draft document to summarize pragma issues (and help MSM stop forgetting what we've already discussed).
Provide a bit more prose.
I hate Markdown.
Some copy edits and begin a new example
Copy edit, revise proposal to allow grammar-attached pragmas, revise namespaces example, supply renaming and rewriting examples, finish tokenization example, add lists of open issues and decisions for the group.
Add an open issue
Add an open issue and a decision for the group
One more copy-editing pass
cmsmcq added 26 commits March 22, 2022 11:52
I hate MarkDown.
one more copy-editing pass (but slacked off at the end)
Minor copy edits
Small copy-edit
Delete the bits that say this is not yet finished.  It's as finished as it is going to get.
First version of this abbreviated proposal
Revised (and made longer, sigh)
Add a claim to adequacy. Also a revision date.
The pragma-data element is not always required.
Markdown claims another victim.
More copy editing
Add status note
Further copy edits, typo corrections.
Markdown is so simple and intuitive!  It's perfectly obvious how to embed a code block within a paragraph within a list item.  I cannot understand how anyone could fail to prefer Markdown to systems with explicit markup.  Who can understand those?
Copy edits
Correct the revision date.
@ndw ndw added the feature A new feature label Apr 3, 2022
@ndw ndw added this to the Version 1.0 milestone Apr 3, 2022
@ndw ndw modified the milestones: Version 1.0, Version V.next Apr 11, 2022
Labels
feature A new feature
4 participants