fix trailing IdentifierPart in grammar #641

OmarTawfik · 2023-11-06T23:04:08Z

- Moved numeric literals to use `notFollowedBy: IdentifierStart` instead of `IdentifierPart`.
- Removed `notFollowedBy: IdentifierPart` from string literals, to match the behavior of `solc`.
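A minimal sketch of the two lookahead behaviours, assuming Solidity's identifier character classes (`IdentifierStart` = ASCII letters, `_`, `$`; `IdentifierPart` additionally allows digits). The function names are hypothetical and are not the generated scanner API: the numeric scanner rejects a match that is immediately followed by an `IdentifierStart`, while the string scanner applies no trailing lookahead at all.

```rust
/// Hypothetical helper mirroring the grammar's `IdentifierStart` class
/// (Solidity identifiers are `[a-zA-Z$_][a-zA-Z0-9$_]*`).
fn is_identifier_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_' || c == '$'
}

/// Scans a run of decimal digits at the start of `input` and returns its byte
/// length. Models `notFollowedBy: IdentifierStart`: the match is rejected if
/// the very next character could start an identifier (e.g. `123abc`).
fn scan_decimal_digits(input: &str) -> Option<usize> {
    // Digits are ASCII, so the char count equals the byte length.
    let len = input.chars().take_while(|c| c.is_ascii_digit()).count();
    if len == 0 {
        return None;
    }
    match input[len..].chars().next() {
        Some(c) if is_identifier_start(c) => None,
        _ => Some(len),
    }
}

/// Scans a double-quoted string at the start of `input` and returns its byte
/// length. No trailing lookahead: whatever follows the closing quote is left
/// for the next token.
fn scan_double_quoted_string(input: &str) -> Option<usize> {
    let mut chars = input.char_indices();
    if chars.next()?.1 != '"' {
        return None;
    }
    let mut escaped = false;
    for (i, c) in chars {
        if escaped {
            // The character after a backslash never terminates the string.
            escaped = false;
            continue;
        }
        match c {
            '\\' => escaped = true,
            '"' => return Some(i + 1), // closing quote is one byte
            '\n' => return None,       // unterminated on this line
            _ => {}
        }
    }
    None
}
```

For example, `123abc` is rejected as a numeric literal, while a double-quoted `"abc"` followed directly by `def` still scans as a 5-byte string and leaves `def` for the next token.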

changeset-bot · 2023-11-06T23:04:11Z

⚠️ No Changeset found

Latest commit: a6b555c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Xanewok

LGTM, thanks a lot!


Xanewok · 2023-11-07T08:25:57Z

crates/solidity/inputs/language/src/definition.rs

@@ -3833,85 +3845,95 @@ codegen_language_macros::compile!(Language(
                            ])
                        ),
                        Repeated(name = AsciiStringLiterals, repeated = AsciiStringLiteral),
-                        Token(
+                        Enum(


@OmarTawfik I'm wondering, shouldn't this still be a token? The string literals are considered tokens by `solc`, and it's still defined as a scanner in the YAML and DSL v1.

So, the question applies equally to these three literals, each of which has a single-quote and a double-quote variant:

- `HexStringLiteral`
- `AsciiStringLiteral`
- `UnicodeStringLiteral`

Taking the first as an example: in DSL v0/v1, `HexStringLiteral` was defined as a token, with the definition `Fragment(SingleQuoteHexStringLiteral) || Fragment(DoubleQuoteHexStringLiteral)`. This leads to the CST having a single token named `HexStringLiteral` that can contain either `hex'FF'` or `hex"FF"`.
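To illustrate the v0/v1 shape, here is a rough sketch of what a consumer has to do when both quote variants share one token kind. The `TokenKind`/`Token` types and the `quote_char` helper are hypothetical stand-ins, not the generated CST API: since the kind alone does not say which quote was used, the helper has to re-read the token text.

```rust
/// Hypothetical v1-style kind: one variant covers both `hex'..'` and `hex".."`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    HexStringLiteral,
}

/// Hypothetical CST token: a kind plus the exact source text it covers.
struct Token<'a> {
    kind: TokenKind,
    text: &'a str,
}

/// Recovers the quote character by re-inspecting the token text, because the
/// kind does not carry that information.
fn quote_char(token: &Token<'_>) -> Option<char> {
    match token.kind {
        // Skip the `hex` prefix; the next character is the opening quote.
        TokenKind::HexStringLiteral => token.text.strip_prefix("hex")?.chars().next(),
    }
}
```

Both `hex'FF'` and `hex"FF"` end up with the same kind, so every consumer that cares about the quote repeats this text inspection.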

In DSL v2, I changed the string literals to be defined as an `Enum`, with the definition `SingleQuoteVariant || DoubleQuoteVariant`, where both variants are standalone tokens. Since Enums don't produce their own `NonTerminalKind`, we still end up with one token in the CST, but instead of `HexStringLiteral`, its kind will be either `SingleQuoteHexStringLiteral` or `DoubleQuoteHexStringLiteral`.

While the change is minor, I believe it makes the tree much easier to work with: any operation on the contained string value would otherwise have to detect the opening/closing quote character, and also handle escape sequences that differ between the two variants depending on the quote character. There is no need to “hide” this piece of information in the tree, only to have to analyze it again afterwards.
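A sketch of the DSL v2 shape for the ASCII string case, under the same caveats: the `TokenKind` variants are named by analogy with `SingleQuoteHexStringLiteral`, and the enum plus the `delimiter`, `string_body` and `must_escape` helpers are hypothetical. Because the quote character is part of the kind, consumers can take it straight from the tree instead of re-analyzing the text.

```rust
/// Hypothetical v2-style kinds: each quote variant is its own token kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    SingleQuoteAsciiStringLiteral,
    DoubleQuoteAsciiStringLiteral,
}

impl TokenKind {
    /// The delimiter is known from the kind alone; no text inspection needed.
    fn delimiter(self) -> char {
        match self {
            TokenKind::SingleQuoteAsciiStringLiteral => '\'',
            TokenKind::DoubleQuoteAsciiStringLiteral => '"',
        }
    }
}

/// Returns the raw (still escaped) contents between the delimiters, or `None`
/// if the text is not delimited the way its kind says.
fn string_body(kind: TokenKind, text: &str) -> Option<&str> {
    let d = kind.delimiter();
    text.strip_prefix(d)?.strip_suffix(d)
}

/// Whether `c` must be escaped inside a literal of this kind. Only the
/// backslash and the kind's own delimiter are checked here; other rules
/// (e.g. line breaks) are left out of this sketch.
fn must_escape(kind: TokenKind, c: char) -> bool {
    c == '\\' || c == kind.delimiter()
}
```

For instance, `'` must be escaped in a single-quoted literal but may appear raw in a double-quoted one; with a single shared kind, the same helpers would first have to recover the delimiter from the text.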

Please let me know if you have any concerns on this change.

Let's revert that change for now, until after the migration is done: #646


OmarTawfik requested a review from a team as a code owner November 6, 2023 23:04

OmarTawfik enabled auto-merge November 6, 2023 23:04

Xanewok approved these changes Nov 6, 2023


OmarTawfik added this pull request to the merge queue Nov 6, 2023

fix trailing IdentifierPart in grammar

a6b555c

- Moved numeric literals to use `notFollowedBy: IdentifierStart` instead of `IdentifierPart`.
- Removed `notFollowedBy: IdentifierPart` from string literals, to match the behavior of `solc`.

OmarTawfik removed this pull request from the merge queue due to a manual request Nov 6, 2023

OmarTawfik force-pushed the fix-trailing-identifier-part-grammar branch from 38f76fc to a6b555c Compare November 6, 2023 23:45

OmarTawfik enabled auto-merge November 6, 2023 23:45

OmarTawfik added this pull request to the merge queue Nov 6, 2023

Merged via the queue into NomicFoundation:main with commit a59a464 Nov 7, 2023
1 check passed

OmarTawfik deleted the fix-trailing-identifier-part-grammar branch November 7, 2023 00:09

Xanewok reviewed Nov 7, 2023


Xanewok mentioned this pull request Nov 7, 2023

tests(cst): Add regression tests for "not followed by IdentifierStart" rules #642

Merged

OmarTawfik mentioned this pull request Nov 8, 2023

fix: Define DecimalLiteral in DSL v2 using the DSL v1 rules #643

Merged

github-merge-queue bot pushed a commit that referenced this pull request Nov 8, 2023

tests(cst): Add regression tests for "not followed by IdentifierStart" rules (#642)

eb3e2d0

See #641

Xanewok mentioned this pull request Nov 13, 2023

Unify language definitions #652

Closed

9 tasks


Review thread activity:
- Xanewok, Nov 7, 2023
- OmarTawfik, Nov 8, 2023
- OmarTawfik, Nov 9, 2023
