Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't match keywords if they are preceded by a dot and add the 'sequence' keyword #186

Merged
merged 7 commits into from
Jan 10, 2025

Conversation

matteocoder
Copy link
Contributor

@matteocoder matteocoder commented Jan 8, 2025

I added a negative lookbehind to the regexes of most control keywords, so that they do not match when they are an object property or method. I added a few more relevant tests as well.

Also, I added the 'sequence' keyword, and 'process' is now considered a block keyword.

This PR fixes a couple more tests in the syntax test, and should fix #105.

Keywords (with the exception of 'where' and 'foreach') should not be considered as such if they are preceded by a '.'.
The tests relative to keywords used as members or methods now pass.
'process' is now considered a block keyword.
Also make the relative syntax test pass.
Now the test reflects the change of the 'process' keyword from uncategorised to block.
The relevant test in the function syntax test now passes.
@michaelblyons
Copy link
Collaborator

michaelblyons commented Jan 8, 2025

Is this compatible with ST's "fast" regex engine? (As opposed to the "slow" Oniguruma one?) You can test with the regex compatibility build variant.

@matteocoder
Copy link
Contributor Author

matteocoder commented Jan 8, 2025

@michaelblyons I just checked and negative lookbehind is compatible just with Oniguruma. Is it still okay, or is there a better regex construct that would achieve the same result but allow compatibility with the new engine?

@michaelblyons
Copy link
Collaborator

michaelblyons commented Jan 8, 2025

You need to make sure that the possibly-keyword is already consumed by a match on the ..

Is there somewhere that is not correctly pushing the members context, or perhaps an inclusion in members that should not be there?

@matteocoder
Copy link
Contributor Author

@michaelblyons Perhaps it's the fact that the members context is lower on the syntax file, and so the syntax engine sees the keyword matching first and disregards the members context? I'm a newbie at this, so I apologise if I'm saying something wrong.

@michaelblyons
Copy link
Collaborator

michaelblyons commented Jan 8, 2025

No, patterns are matched in the order in which they appear in the current set of matches in the tip.of the context stack.

foo:
  # matches first
  - include: bar
  # matches after, even though context is defined first
  - include: any

any:
 - match: \w+
   scope: any

bar:
 - match: \bbar\b
   scope: bar

For a friendly intro to TextMate-like engines, see this post I wrote for a VS Code question. (There are probably ST guides, too, but I don't know links to those off the top of my head.)

@matteocoder
Copy link
Contributor Author

@michaelblyons Thank you! the post was really informative. Looking at the syntax test, there's really nothing to push the context:
image

Maybe I could create a typename scope which matches the argument of "-typename", which will then push the members context?

@michaelblyons
Copy link
Collaborator

I think that's more specific than you want to be, here. At most, I think we'd want to identify a command use, and then push into a context for its arguments. Then pop when we find a {terminator, pipe, unescaped newline, etc.} and identify --prefixed items versus other items.

This is not necessarily an easy thing to do. I intentionally did not attempt it when I fixed scopes in 4.0. You are welcome to try, but it may be difficult. The Bash syntax shipped with ST does it this way.

@matteocoder
Copy link
Contributor Author

matteocoder commented Jan 8, 2025

diff --git a/PowerShell.sublime-syntax b/PowerShell.sublime-syntax
index 8d1941b..fab5aad 100644
--- a/PowerShell.sublime-syntax
+++ b/PowerShell.sublime-syntax
@@ -75,6 +75,9 @@ contexts:
         - include: main
     - match: \b[\w.-]+\.(?i:exe|com|cmd|bat)\b
       scope: variable.function.powershell
+    - match: \b[\w<>_]+(?=\.)
+      scope: storage.type.powershell
+      push: members
     # Exceptions
     - match: \b(?i:throw){{kebab_break}}
       scope: keyword.control.exception.raise.powershell

Indeed, I'm finding this quite tricky. How about the solution outlined above? It consumes the dot before the keywords are matched, and pushes the members context correctly, while also being compatible with the new engine. The existing tests pass too. The only problem I see with this solution is that file names are incorrectly recognised as storage.type. It's frustrating because to PowerShell foo.baz could be a type name or a file name.

If you're unsatisfied with this solution, I could make another PR with only the changes you're comfortable with integrating into the codebase.

@michaelblyons
Copy link
Collaborator

New Bash just scopes arguments like that as strings since a syntax engine can't really know what they are. And for the purposes of normal use, they're unquoted strings.

Alternatively, you could just remove the scope and consume without applying a scope.

@matteocoder
Copy link
Contributor Author

Wonderful! I thought scoping was always necessary. I removed the scope setting and the lookbehind isn't necessary anymore, so I removed that as well.

diff --git a/PowerShell.sublime-syntax b/PowerShell.sublime-syntax
index 8d1941b..6645f88 100644
--- a/PowerShell.sublime-syntax
+++ b/PowerShell.sublime-syntax
@@ -75,6 +75,8 @@ contexts:
         - include: main
     - match: \b[\w.-]+\.(?i:exe|com|cmd|bat)\b
       scope: variable.function.powershell
+    - match: \b[\w<>_]+\b(?=\.)
+      push: members
     # Exceptions
     - match: \b(?i:throw){{kebab_break}}
       scope: keyword.control.exception.raise.powershell

Do you think this change is good?

@matteocoder
Copy link
Contributor Author

@michaelblyons I apologise for double posting, but I decided to alter the regular expression a little by taking into account all the character classes allowed by the C# spec.

diff --git a/PowerShell.sublime-syntax b/PowerShell.sublime-syntax
index 8d1941b..9abc03c 100644
--- a/PowerShell.sublime-syntax
+++ b/PowerShell.sublime-syntax
@@ -23,6 +23,7 @@ variables:
   integer_suffix: (?i:ul?|lu?)
   bytes_unit: (?i:[kmgtp]b)
   kebab_break: (?![\w-])
+  csharp_allowed_chars: \p{Lu}\p{Ll}\p{Lm}\p{Nl}\p{Nd}\p{Pc}\p{Mn}\p{Mc}\p{Cf}
 
 contexts:
 
@@ -75,6 +76,10 @@ contexts:
         - include: main
     - match: \b[\w.-]+\.(?i:exe|com|cmd|bat)\b
       scope: variable.function.powershell
+    # Consume a string with a trailing dot
+    # to prevent members with particular names from being recognized as keywords.
+    - match: \b[{{csharp_allowed_chars}}]+\b(?=\.)
+      push: members
     # Exceptions
     - match: \b(?i:throw){{kebab_break}}
       scope: keyword.control.exception.raise.powershell

If you're okay with these changes, I can push another commit which incorporates your feedback.

By the way, is there something weird going on with the regex matching? I'm testing the pattern [\p{L}\p{N}]+, which in theory should match one or more characters, but Sublime Text is showing me that it is matching just a few:
image

@michaelblyons
Copy link
Collaborator

Looks like ST is trying to match p, n, l, {, and }, no?

@matteocoder
Copy link
Contributor Author

Ah interesting. I took that regex from the storage.type assertion, and I thought it would match all unicode letters and digits.

@michaelblyons
Copy link
Collaborator

ST has 3 different regex engines, iirc, in different places. Maybe Find uses one where character classes aren't valid in a character set?

This makes negative lookbehind assertions on keywords unnecessary.
@matteocoder
Copy link
Contributor Author

Thank you for you help! Indeed, it seems that find and replace use another library for regex matching.

I've pushed my changes containing your feedback. I'm looking forward to hearing your opinion.

@matteocoder matteocoder requested a review from deathaxe January 10, 2025 17:35
PowerShell.sublime-syntax Outdated Show resolved Hide resolved
@matteocoder
Copy link
Contributor Author

@deathaxe @michaelblyons Done!

@michaelblyons michaelblyons merged commit bea280e into SublimeText:master Jan 10, 2025
0 of 2 checks passed
@matteocoder matteocoder deleted the keyword-not-dot branch January 12, 2025 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add sequence keyword
3 participants