fix trailing IdentifierPart in grammar #641
LGTM, thanks a lot!
- Moved numeric literals to use `notFollowedBy: IdentifierStart` instead of `IdentifierPart`.
- Removed `notFollowedBy: IdentifierPart` from string literals, to match the behavior of `solc`.
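The two lookahead rules above can be illustrated with a small hand-written scanner. This is a minimal Rust sketch, not the generated lexer from this PR; the helper names (`is_identifier_start`, `scan_decimal_literal`, `scan_double_quoted_string`) are hypothetical, and it assumes Solidity identifiers start with `[a-zA-Z$_]`. A decimal literal is rejected when it is immediately followed by a character that could start an identifier, while a string literal ends at its closing quote with no trailing check at all.

```rust
/// Assumed identifier-start set for Solidity: `[a-zA-Z$_]`.
fn is_identifier_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '$' || c == '_'
}

/// Scans a decimal literal (digits, optionally separated by single `_`s)
/// and returns its length in bytes. Applies `notFollowedBy: IdentifierStart`:
/// returns `None` if the literal is directly followed by an identifier start.
fn scan_decimal_literal(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    let mut len = 0;
    while len < bytes.len() {
        let b = bytes[len];
        let next_is_digit = bytes.get(len + 1).map_or(false, |n| n.is_ascii_digit());
        // Accept a digit, or a `_` that sits between two digits (e.g. `1_000`).
        if b.is_ascii_digit() || (b == b'_' && len > 0 && next_is_digit) {
            len += 1;
        } else {
            break;
        }
    }
    if len == 0 {
        return None;
    }
    // Only ASCII bytes were consumed, so `len` is a valid char boundary.
    match input[len..].chars().next() {
        Some(c) if is_identifier_start(c) => None,
        _ => Some(len),
    }
}

/// Scans a double-quoted string literal. There is deliberately no trailing
/// lookahead, so in `"abc"x` the literal ends at the closing quote and `x`
/// is left for the next token. Escapes are simplified: `\` skips one char.
fn scan_double_quoted_string(input: &str) -> Option<usize> {
    let mut chars = input.char_indices();
    if chars.next()?.1 != '"' {
        return None;
    }
    while let Some((i, c)) = chars.next() {
        match c {
            '\\' => {
                chars.next()?; // skip the escaped character
            }
            '"' => return Some(i + 1), // length includes the closing quote
            _ => {}
        }
    }
    None // unterminated literal
}

fn main() {
    assert_eq!(scan_decimal_literal("1_000;"), Some(5));
    assert_eq!(scan_decimal_literal("123abc"), None); // followed by IdentifierStart
    assert_eq!(scan_double_quoted_string("\"abc\"x"), Some(5));
    println!("ok");
}
```

Fractions, exponents, and hex numbers are left out to keep the lookahead rule itself in focus.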
Branch updated from 38f76fc to a6b555c.
```diff
@@ -3833,85 +3845,95 @@ codegen_language_macros::compile!(Language(
 ])
 ),
 Repeated(name = AsciiStringLiterals, repeated = AsciiStringLiteral),
-Token(
+Enum(
```
@OmarTawfik I'm wondering, shouldn't this still be a token? String literals are considered tokens by solc, and it's still defined as a scanner in the YAML and DSL v1.
So, the question is the same for these three, each of which has a single-quote and a double-quote variation:

- `HexStringLiteral`
- `AsciiStringLiteral`
- `UnicodeStringLiteral`
Taking the first as an example: in DSL v0/v1, `HexStringLiteral` was defined as a token, and its definition was `Fragment(SingleQuoteHexStringLiteral) || Fragment(DoubleQuoteHexStringLiteral)`. This leads to the CST having a single token named `HexStringLiteral` that can contain either `hex'FF'` or `hex"FF"`.
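To make the v0/v1 trade-off concrete, here is a minimal Rust sketch of what a consumer has to do when there is only one token kind. The `TokenKind` enum and `hex_body` function are hypothetical illustrations, not the project's generated API; the sketch assumes Solidity hex string literals are written with a `hex` prefix (e.g. `hex'FF'`). Because the kind alone does not say which quote was used, the code has to re-read the token text to find out.

```rust
/// Hypothetical token kind under the v0/v1 shape: one kind for both quotes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind {
    HexStringLiteral,
}

/// Returns the contents between the quotes of a `hex'..'` or `hex".."` token.
fn hex_body(kind: TokenKind, text: &str) -> Option<&str> {
    match kind {
        TokenKind::HexStringLiteral => {
            let rest = text.strip_prefix("hex")?;
            // The quote character has to be recovered from the text itself.
            let quote = rest.chars().next()?;
            if quote != '\'' && quote != '"' {
                return None;
            }
            rest.strip_prefix(quote)?.strip_suffix(quote)
        }
    }
}

fn main() {
    assert_eq!(hex_body(TokenKind::HexStringLiteral, "hex'FF'"), Some("FF"));
    assert_eq!(hex_body(TokenKind::HexStringLiteral, "hex\"FF\""), Some("FF"));
}
```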
In DSL v2, I changed the string literal to be defined as an `Enum`, with the definition `SingleQuoteVariant || DoubleQuoteVariant`. Both are standalone tokens. Since enums don't produce their own `NonTerminalKind`, we still end up with one token in the CST, but instead of `HexStringLiteral`, it is either `SingleQuoteHexStringLiteral` or `DoubleQuoteHexStringLiteral`.
I believe that, while the change is minor, it makes the tree much easier to work with: any operation on the contained string value has to detect the opening and closing quote character, and also handle escape sequences, which differ between the two variants depending on the quote character. There is no need to "hide" this piece of information in the tree, only to have to analyze it again afterwards.
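Under the DSL v2 shape, the quote style is carried by the token kind itself. The following Rust sketch is hypothetical: the `TokenKind` variants reuse the names from this comment, but `quote` and `hex_body` are illustrative helpers, not the project's API, and it assumes hex string literals use a `hex` prefix. It shows that the consumer picks the delimiter from the kind and never has to re-detect it from the text.

```rust
/// Hypothetical token kinds under the v2 shape: one kind per quote style.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind {
    SingleQuoteHexStringLiteral,
    DoubleQuoteHexStringLiteral,
}

impl TokenKind {
    /// The quote character is known from the kind alone.
    fn quote(self) -> char {
        match self {
            TokenKind::SingleQuoteHexStringLiteral => '\'',
            TokenKind::DoubleQuoteHexStringLiteral => '"',
        }
    }
}

/// Returns the contents between the quotes, using the kind's delimiter.
fn hex_body(kind: TokenKind, text: &str) -> Option<&str> {
    let quote = kind.quote();
    text.strip_prefix("hex")?.strip_prefix(quote)?.strip_suffix(quote)
}

fn main() {
    let body = hex_body(TokenKind::SingleQuoteHexStringLiteral, "hex'FF'");
    assert_eq!(body, Some("FF"));
    // A kind/text mismatch is rejected instead of being silently accepted.
    assert_eq!(hex_body(TokenKind::DoubleQuoteHexStringLiteral, "hex'FF'"), None);
}
```

The same pattern extends to `AsciiStringLiteral` and `UnicodeStringLiteral`, where knowing the delimiter up front also tells the consumer which quote character may appear unescaped inside the value.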
Please let me know if you have any concerns about this change.
Let's reverse that change for now, until after migration is done: #646