Laundry: Org mode for Racket

An attempt to specify a formal grammar for Org syntax.

The grammars live in laundry/grammar. The top level is in org.rkt.
The grammars are implemented using Racket’s #lang brag. The details of the implementation are in the comments.

For an overview of the approach see ./docs/overview.org.

Getting started

Install Racket for your platform.

Laundry is available from the Racket package catalog and can be installed via

raco pkg install org laundry

To install the development version you can run the following from the location of this readme.

raco pkg install laundry/ laundry-test/ org/ org-tools/

Once everything is installed you can run the tests by invoking the following in the directory of this readme.

raco test laundry laundry-test org org-tools

You can also parse individual Org files using ./laundry-test/cli.rkt.

laundry-test/cli.rkt docs/thoughts.org laundry-test/test.org

Status

Laundry can parse most of Org syntax, though there are still issues with the correctness of the parse in a number of cases.

In particular there are a number of edge cases in the interaction between the syntax for various Org objects that have not been resolved.

Objectives

The primary objective of this work is to provide a reference grammar and implementation that can be used to test other implementations of Org mode. The Elisp implementation has inconsistent behavior in a number of critical areas, such as the parsing of headline tags. A reference grammar will help to elucidate the issues.

This work was originally inspired by a desire to regularize the behavior of Org babel. However, the lack of a consistent reference for the grammar of Org as a whole turned out to be a significant stumbling block, because Org babel interacts with nearly every aspect of Org. Work to standardize Org syntax therefore became a priority.

The long term goal of this work is to provide a reference that can be used to standardize Org syntax and behavior and to specify various levels of compliance for an implementation of Org mode.

Performance

Until recently the performance of the parser was exceptionally bad due to hidden quadratic behavior in Racket parser-tools. To work around this issue the tokenizer has been made more complex, so you may sometimes find an opaque token in the grammar that obscures some of the details. In most cases the full grammar is retained, but unless it has been tested with the simpler, slower tokenizer, the full grammar may be incorrect.

A note on terminology

The current terminology used in the grammars matches the terminology used in Org syntax. This is expected to change so that, for example, terms such as headline will be renamed to heading. The renaming has not been done yet in order to simplify comparison to the existing Org syntax document during early development. As such, the names of the elements of the parse tree should not be considered stable.

Debugging

Change [debug #f] on line 32 of br-parser-tools-lib yacc.rkt to point to a file (e.g. [debug "/tmp/parser-debug-output.rktd"]) and then when the parser fails you can check that file for a dump of the full parser state at the time of failure.

Unfortunately, providing that debug value does not currently produce the behavior described in the documentation: on error the stack is not printed. As a workaround, you can modify the parsing loop in the same file to get a rough view of what is going on.

Similar projects

A grammar for Org syntax written in Clojure by the Organice team: https://github.com/200ok-ch/org-parser (grammar file: https://github.com/200ok-ch/org-parser/blob/master/resources/org.ebnf)

https://alphapapa.github.io/org-almanac/#Parsing