-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(Helpful) Error messages #39
Comments
I agree, and I think better error messages are the thing most needed by this project. If you use the packrat parser, you will get close to what you are asking with the >>> pe.match('"a"+ "b"+', "aacc", flags=pe.STRICT)
Traceback (most recent call last):
[...]
pe._errors.ParseError: ParseError: failed to parse; use memoization for more details Also use the >>> pe.match('"a"+ "b"+', "aacc", flags=pe.STRICT|pe.MEMOIZE)
Traceback (most recent call last):
[...]
pe._errors.ParseError:
line 0, character 0
aacc
^
ParseError: `(?=(?P<_1>a+))(?P=_1)(?=(?P<_2>b+))(?P=_2)` The next issue is that the patterns are optimized, which means:
To overcome this, you will need to first compile the grammar and then match, because the >>> p = pe.compile('"a"+ "b"+', parser="packrat", flags=pe.NONE)
>>> p.match("aacc", flags=pe.STRICT|pe.MEMOIZE)
Traceback (most recent call last):
[...]
pe._errors.ParseError:
line 0, character 2
aacc
^
ParseError: `a`, `b` Another strategy is to use the >>> p = pe.compile('"a"+ "b"+', parser="packrat", flags=pe.DEBUG)
## Grammar ##
Start <- "a"+ "b"+
>>> p.match("aacc", flags=pe.STRICT|pe.MEMOIZE)
aacc | "a"+ "b"+
aacc | "a"+
aacc | "a"
acc | "a"
cc | "a"
cc | "b"+
cc | "b"
Traceback (most recent call last):
[...]
pe._errors.ParseError:
line 0, character 2
aacc
^
ParseError: `a`, `b` In a terminal that supports ANSI colors, you'll see the terminals that failed in red and those that succeeded in green. Getting good error messages from a general-purpose recursive descent parser is a significant challenge in itself, but doing that when you have an optimized grammar is even more difficult. I welcome any help in this area. Since the machine parser does not yet do memoization or debug rules, a strategy for grammar development is to test things out with the packrat parser, then switch to the machine parser for speed once it's working. |
Yeah, producing good error messages is incredibly difficult. For the issue with having an optimized grammar, one potential workaround could be to keep the original grammar as well and if there is a parse error re-parse with the unoptimized grammar in order to provide more helpful error messages. That should probably be optional though due to the additional overhead of parsing the same input twice. |
Pe already stores the original and optimized grammars in the parser objects: Lines 10 to 13 in 4167657
The intended reason for storing both is precisely what you describe, however I have not yet implemented the reparsing feature. One challenge is that I don't see an easy way to hand over parser state at a point to the unoptimized grammar. I can reparse the whole input, but that can take a while, especially with the unoptimized grammar. |
It would be very handy if parsers generated by
pe
were able to produce proper error messages when they fail to parse a given input. This includes pointing out the longest possible sequence of the input that the parser was able to match and then telling the user what the parser would have expected in order to continue matching from there.E.g. when using a grammar such as
matching an input of
Hello
should say something likeAt line 1, col 5: Expected "world"
or if there were multiple alternatives:At line 1, col 5: Expected one of: "world", "universe"
For character classes one can simply print their definition as the expected value and for rule references print the name of the rule.
See e.g. the error messages that ANTLR produces.
The text was updated successfully, but these errors were encountered: