During AST expansion, it is possible for new literal nodes to be created and given the same position as existing literals. The code attempts to reposition nodes when this happens but does not always succeed. The result is a short-circuit in the TNFA which can manifest as an infinite loop, e.g. a bounded iteration being treated as an unbounded one. This reshuffling can also leave positions unfilled, wasting memory.
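The symptom can be checked from the outside with a small harness. The sketch below is not taken from this PR's test suite: it assumes the library under test is reached through the standard POSIX `regcomp`/`regexec` interface, and `ere_matches` is a hypothetical helper name. It uses the pattern `^((a{1,3})?x)*y`, the minimal reproducer given in this description.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library.
 * Returns 1 if subject matches pattern (POSIX extended syntax),
 * 0 if it does not, and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With a correct implementation, `ere_matches("^((a{1,3})?x)*y", "aaaxy")` should return 1 and `ere_matches("^((a{1,3})?x)*y", "aaaaxy")` should return 0, since at most three `a` characters may precede each `x`. A build in which the bounded iteration behaves as an unbounded one would be expected to accept the second string as well.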
This bug was discovered while investigating a report of an issue with a regular expression intended to accept DNS names, which turned out to accept labels of any length instead of only up to 63 characters. Compiling the minimal reproducer `^((a{1,3})?x)*y` with debugging enabled showed the following AST after expansion:
Note that the last `a` node and the `x` node both have position 1. The `y` node was moved from position 2 to position 4 during expansion; the `x` node should have been moved to position 3, but wasn't.
The solution is to postpone assigning positions to literal nodes until immediately before TNFA conversion, removing the need to reposition nodes during AST expansion. With this change, we get the following AST instead:
This PR adds test cases which trigger the bug and implements the solution described above.
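The idea of deferring position assignment can be illustrated with a minimal sketch. The types and names below (`struct node`, `assign_positions`) are hypothetical and heavily simplified; they are not the library's actual data structures or the code in this PR. The sketch assumes a binary AST in which only literal nodes carry a position.

```c
#include <stddef.h>

/* Hypothetical, simplified AST; not the library's real types. */
enum node_kind { NODE_LITERAL, NODE_CATENATION, NODE_UNION, NODE_ITERATION };

struct node {
    enum node_kind kind;
    int ch;              /* literal character (NODE_LITERAL only) */
    int position;        /* -1 until assigned */
    struct node *left;   /* first child, or the iteration operand */
    struct node *right;  /* second child (binary nodes only) */
};

/* Walk the fully expanded tree once, left to right, and give each
 * literal the next unused position. Returns the next free position,
 * which equals the total number of positions when called with 0. */
int assign_positions(struct node *n, int next)
{
    if (n == NULL)
        return next;
    if (n->kind == NODE_LITERAL) {
        n->position = next;
        return next + 1;
    }
    next = assign_positions(n->left, next);
    return assign_positions(n->right, next);
}
```

Because positions are handed out in a single pass after the tree has reached its final shape, they are unique and contiguous by construction, so nothing has to be renumbered when expansion duplicates a subtree.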