Parser combinators are a composable way of building parsers in code; they stand in contrast to traditional parser generators like goyacc. They work particularly well in Go now that Go has generics, and they are remarkably easy to implement from scratch. This repo contains a simple parser combinator implementation, and an example of using that to build a parser for a simple configuration language.
I wrote it principally for a presentation at GopherCon 2022 (video) (slides) (session announcement), in which my goal was to do a quick introduction to parser combinators, show how to use a few primitives to implement a parser for a microlanguage, and show how to implement all the primitives -- each in just a few lines of Go.
Feel free to grab and use this -- just copy the code so you can modify it to suit your ends, and keep the copyright around somewhere.
The parser API is very loosely inspired by the parser package for the Elm language.
If you want to get more familiar with this implementation of parser combinators, here are a few exercises to try out.
-
Add a
Parser[Empty]
namedEnd
which succeeds only when you have no more input remaining. Remove the check for remaining input in theParse
function and modify the example grammar to useEnd
to ensure no input remains. -
Add a function to the
parser
package calledLookahead
that takes aParser[T]
as an argument, and returns aParser[T]
which returns the same value as the input parser, but without consuming any input -- in other words, it looks to see if the argument parser matches the upcoming input but doesn't actually consume that input. -
Extend the
state
implementation inparser
to track line and column numbers, and add a parser function calledGetPosition
that returns the current line and column numbers, while consuming no input. (This could be used in sequences, say, to get text positions.)
-
As written,
OneOf
always backtracks. Add a functionCommit
that transforms its argument parser such that if it has an error,OneOf
will immediately fail instead of continuing to try more parsers. Here's a a rough example, where the idea is that after seeing a{
if this parser fails, a OneOf containing it should not proceed to try others.myCodeParser := AndThen( StartSkipping(Exactly("{")), func (Empty) Parser[MyCodeBody] { return AndThen(Commit(myCodeBodyParser), func (body MyCodeBody) Parser[MyCodeBody] { return Map(Exactly("}"), func(Empty) MyCodeBody { return body } ) }
Hint: You'll have to modify
OneOf
to make this work. The interactions between backtracking control and sequencing (withAndThen
or with the special sequencing forms) also bears thinking about.
Because our combinators are implemented as functions, if we use the debugger to analyze mid-parse, we just see a stack of closures.
-
Approach A: Add a payload data type to
Parser
, so it'sParser[T any, D any]
. Write aWithData
wrapper function that takes a parser and payload data and returns a parser in which incomingstate
will contain the new payload data when the underlying parser is run. Add aGetData
parser that, when run, returns the current payload data. This will enable you to provide contextual data (e.g. in printfs or errors from your parsing.) -
Approach B: replace the functions with interfaces. Change the type of
Parser
fromfunc (state)...
totype Parser[T any] interface { parse(state) (T, state, error) }
Now modify all of the parser-generating and combining functions to return various structs implementing the interface. The debugger should show a stack of method calls on specific struct types now. Is it more clear? (Let me know, I haven't yet tried this myself.)
I'd be very grateful for any feedback, comments, or questions -- DM me, or opoen an issue.
I am not accepting PRs to this repo, please don't submit them, I will just close them.