Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split StringTemplate from HTML #5192

Merged
merged 37 commits into from
Mar 19, 2021
Merged

Split StringTemplate from HTML #5192

merged 37 commits into from
Mar 19, 2021

Conversation

Nixinova
Copy link
Contributor

@Nixinova Nixinova commented Feb 6, 2021

#4789 (comment): StringTemplate was added in #1117 as a raw HTML variant, but per #4979 HTML variants should be split.

It seems to be HTML but with control statements and variables inside $dollars$.

Default to using default HTML grammar, which is not great as it doesn't match the dollar syntax at all, but couldn't find an actual one that works.
Found a grammar: dangmarm/stringtemplate-lang
UPDATE (Alhadis): Added grammar for StringTemplates that use either <…> delimiters or $…$ delimiters.

Color #3fb34f from logo.

Closes #4789

Was originally added in #1117 as a HTML variant, but per #4979 language variants should be split.

Default to using default HTML grammar.

Will add a heuristic to differentiate from Smalltalk if needed.
@Nixinova Nixinova requested a review from a team as a code owner February 6, 2021 05:37
@Nixinova
Copy link
Contributor Author

Nixinova commented Feb 7, 2021

Ah, found a grammar: dangmarm/stringtemplate-lang

Using dangmarm/stringtemplate-lang now
dangmarm/stringtemplate-lang
`source.stringtemplate`
@Nixinova
Copy link
Contributor Author

Nixinova commented Feb 7, 2021

Can't CTRL+F the logs but it's the same issue as #5194's failed run I assume.

@Nixinova
Copy link
Contributor Author

Nixinova commented Feb 8, 2021

Lightshow doesn't seem to work..

@lildude
Copy link
Member

lildude commented Feb 8, 2021

Lightshow doesn't seem to work..

Looks to be a problem with the grammar and not Lightshow. Picking another grammar source works.

@Nixinova
Copy link
Contributor Author

Nixinova commented Feb 9, 2021

Should I just revert the grammar changes then? Can't find a better one.

@lildude lildude requested a review from Alhadis February 9, 2021 09:54
lib/linguist/heuristics.yml Outdated Show resolved Hide resolved
lib/linguist/heuristics.yml Outdated Show resolved Hide resolved
@Alhadis
Copy link
Collaborator

Alhadis commented Feb 17, 2021

[StringTemplate] seems to be HTML but with control statements

From a quick glance at their format's documentation, it seems that StringTemplate files can generate any sort of output, similar to Moustache (even if it's most commonly used to generate HTML).

Ergo, a dedicated StringTemplate grammar would be more appropriate than Atom's HTML grammar, and it shouldn't take me too long to smash one together. I'll hold off on it until @lildude's finished deploying v7.13.0, just in case an update to Alhadis/language-etc screws with the procedure somehow. 😉

@lildude
Copy link
Member

lildude commented Feb 17, 2021

All done @Alhadis. Go wild.

@Alhadis
Copy link
Collaborator

Alhadis commented Feb 17, 2021

Go wild.

Roger. Will push an update to @Nixinova's branch that contains the required grammar update.

I'm wondering if there should be an undocumented script/bump-grammar utility to streamline this procedure… 🤔

@Nixinova
Copy link
Contributor Author

it seems that StringTemplate files can generate any sort of output

Should I improve the StringTemplate heuristic then?

@Alhadis
Copy link
Collaborator

Alhadis commented Feb 17, 2021

Should I improve the StringTemplate heuristic then?

What we have now should be sufficient. The delimiter characters are configurable, so <word> is no less valid than $word$, |word|, [word] or ^word^. StringTemplate's most common use appears to be preparing HTML, and the docs recommend $…$ for HTML output:

This documentation uses <...> to delimit expressions, but you can use any single start and stop character. For HTML, $..$ is a much better choice obviously.

The only thing I'd suggest adding is StringTemplate's <! comment !> syntax:

pattern: >-
    (?x-m)

    # Symmetrical delimiter: $…$ '…' #…# ~…~
    (.)! .+? !\1

    # Mirrored delimiters: <…> […] {…} “…” «…»
    |  <! .+? !>
    | \[! .+? !\]
    | \{! .+? !\}
Skippable sidenotes that you should read anyway

Now, I'd like to take a moment to explain a small but critical nuance about the behaviour of . in Ruby regexes. See, Ruby/Oniguruma's multi-line mode changes the semantics of . so it matches newline characters. This differs from other regex engines, which generally use a separate modifier to enable dotall mode (/s in Perl/PCRE and JavaScript). As a result, this has severe performance implications when processing large files with .+especially when it's used repeatedly in an expression.

Now, while possessive matching (.++, (?>.+)) helps, an even better solution is to double-check if . should match newlines in the first place. If not, then [^\r\n] or (?-m:.) can have substantial performance improvements. This is mostly relevant to Ruby/Oniguruma, because the default behaviour of . in other engines is not to match newline characters.

Examples of when . should match newlines

Good
<zalgo>.*?</zalgo>
Bad
async function .+\(.*\)
Slightly less bad
async function .+?\(.*?\)

Why only slightly? .+? only affects the performance of successful matches. A failed match means the rest of the input is scanned, which is as slow as if .+ were used.
Solution: use possessive quantifiers (.++) or atomic groups ((?>.+)) to fail faster.

Okay, enough of my rambling bullshit. ^D

@Nixinova
Copy link
Contributor Author

Is what I added sufficient?

Alhadis added a commit to Alhadis/language-etc that referenced this pull request Mar 7, 2021
@Alhadis
Copy link
Collaborator

Alhadis commented Mar 12, 2021

@Nixinova Sorry I took so long to write that StringTemplate grammar. The language turned out to be more complex than I assumed it was.

I've pushed an update to your branch to use the new grammar, so, no actions are required on your part. 👍

@Nixinova Nixinova requested a review from lildude March 17, 2021 00:58
@lildude lildude merged commit fa41e7e into github-linguist:master Mar 19, 2021
@Nixinova Nixinova deleted the StringTemplate branch March 19, 2021 18:51
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Smalltalk files are misidentified as HTML
3 participants