Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix trailing IdentifierPart in grammar #641

Conversation

OmarTawfik
Copy link
Contributor

  • Moved numeric literals to use notFollowedBy: IdentifierStart instead of IdentifierPart.
  • Removed notFollowedBy: IdentifierPart from string literals, to match the behavior of solc.

@OmarTawfik OmarTawfik requested a review from a team as a code owner November 6, 2023 23:04
Copy link

changeset-bot bot commented Nov 6, 2023

⚠️ No Changeset found

Latest commit: a6b555c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@OmarTawfik OmarTawfik enabled auto-merge November 6, 2023 23:04
Copy link
Contributor

@Xanewok Xanewok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot!

@OmarTawfik OmarTawfik added this pull request to the merge queue Nov 6, 2023
- Moved numeric literals to use `notFollowedBy: IdentifierStart` instead of `IdentifierPart`.
- Removed `notFollowedBy: IdentifierPart` from string literals, to match the behavior of `solc`.
@OmarTawfik OmarTawfik removed this pull request from the merge queue due to a manual request Nov 6, 2023
@OmarTawfik OmarTawfik force-pushed the fix-trailing-identifier-part-grammar branch from 38f76fc to a6b555c Compare November 6, 2023 23:45
@OmarTawfik OmarTawfik enabled auto-merge November 6, 2023 23:45
@OmarTawfik OmarTawfik added this pull request to the merge queue Nov 6, 2023
Merged via the queue into NomicFoundation:main with commit a59a464 Nov 7, 2023
1 check passed
@OmarTawfik OmarTawfik deleted the fix-trailing-identifier-part-grammar branch November 7, 2023 00:09
@@ -3833,85 +3845,95 @@ codegen_language_macros::compile!(Language(
])
),
Repeated(name = AsciiStringLiterals, repeated = AsciiStringLiteral),
Token(
Enum(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmarTawfik I'm wondering, shouldn't this still be a token? The string literals are considered tokens by solc and it's still defined as scanner in the YAML and DSL v1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the question is (same for these three), each has a single quote/double quote variation:

  • HexStringLiteral
  • AsciiStringLiteral
  • UnicodeStringLiteral

Taking the first as an example, in DSL v0/v1, HexStringLiteral was defined as a token, and its definition is Fragment(SingleQuoteHexStringLiteral) || Fragment(DoubleQuoteHexStringLiteral) .. This leads to the CST having a single token named HexStringLiteral but can contain either 0x'FF' or 0x"FF" ..

In DSL v2, I changed that for the string literal to be defined as an Enum, and its definition to SingleQuoteVariant || DoubleQuoteVariant .. Both are stand alone tokens .. Since Enums don’t produce their own NonTerminalKind, we will still end up with one token in the CST, but instead of it being HexStringLiteral, it will be either SingleQuoteHexStringLiteral or DoubleQuoteHexStringLiteral ..

I believe while the change is minor, it makes it much easier to use/deal with, since any operation on the containing string value will have to detect the quote start/end character, but also deal with escape sequences that differ between the two variants based on the quote character. No need to “hide” this piece of information in the tree, only to have to analyze it again afterwards.

Please let me know if you have any concerns on this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's reverse that change for now, until after migration is done: #646

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants