An attempt to specify a formal grammar for Org syntax.
The grammars lives in laundry/grammar. The top level is in org.rkt.
The grammars are implemented using Racket’s #lang brag. \
The details of the implementation are in the comments.
For an overview of the approach see ./docs/overview.org.
Install Racket for your platform.
Laundry is available from the Racket package catalog and can be installed via
raco pkg install org laundry
To install the development version you can run the following from the location of this readme.
raco pkg install laundry/ laundry-test/ org/ org-tools/
Once everything is installed you can run the tests by invoking the following in the directory of this readme.
raco test laundry laundry-test org org-tools
You can also parse individual Org files using ./laundry-test/cli.rkt.
laundry-test/cli.rkt docs/thoughts.org laundry-test/test.org
Laundry can parse most of Org syntax, though there are still issues with the correctness of the parse in a number of cases.
In particular there are a number of edge cases in the interaction between the syntax for various Org objects that have not been resolved.
The primary objective of this work is to provide a reference grammar and implementation that can be used to test other implementations of org mode. The elisp implementation has inconsistent behavior in a number of critical areas, such as the parsing of headline tags. A reference grammar will help to elucidate the issues.
This work was originally inspired by a desire to regularize the behavior of Org babel, however the lack of a consistent reference for the grammar of Org as a whole turned out to be a significant stumbling block due to the fact that Org babel interacts with nearly every aspect of Org. Therefore work to standardize Org syntax became a priority.
The long term goal of this work is to provide a reference that can be used to standardize Org syntax and behavior and to specify various levels of compliance for an implementation of Org mode.
Until recently the performance of the parser was exceptionally bad due
to some hidden quadratic behavior in Racket parser-tools
. To work
around this issue the tokenizer has been made more complex, so sometimes
you may find that there is an opaque token appearing in the grammar
that obscures some of the details. In most cases the full grammar
is retained, but unless it has been tested with the less complex and
slower tokenizer the full grammar may be incorrect.
The current terminology used in the grammars matches the terminology
used in Org syntax. This is expected to change so that, for example
terms such as headline
will be renamed to heading
. This has not
been done to simplify comparison to the existing Org syntax document
during early development. As such the names of the elements of the
parse tree should not be considered to be stable.
Change [debug #f]
on line 32 of br-parser-tools-lib
yacc.rkt to
point to a file (e.g. [debug "/tmp/parser-debug-output.rktd"]
) and then
when the parser fails you can check that file for a dump of the full
parser state at the time of failure.
Unfortunately for some reason providing that debug value does not result in the behavior that is articulated in the documentation, and on error the stack is not printed. You can go in and tamper inside the parsing loop of the same file and sort of get a view into what is going on.
A grammar for org syntax written in Clojure by the Organice team. https://github.com/200ok-ch/org-parser/blob/master/resources/org.ebnf https://github.com/200ok-ch/org-parser