Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh(gcode): rewrote language for moden gcode support #4040

Merged
merged 6 commits into from
Oct 31, 2024

Conversation

barthy-koeln
Copy link
Contributor

@barthy-koeln barthy-koeln commented Apr 14, 2024

Complete rework of the gcode language to allow for extended uses cases.
Not all scope rules will apply to all implementations of g-code, but many applications of gcode have added a lot on top of the original spec.

This language implementation aims to be more extensive but still flexible.

My research has used the following documentations:

And countless code examples extracted from GitHub's search:

Changes

  • More keywords
  • Stricter numbers matching (c-style numbers are too broad)
  • Differentiation between functions (G-Codes, M-Codes, …), Axes (ABC, UVW, XYZ), Parameters (P, Q, R, S, …)
  • Semicolon comments
  • Quoted strings
  • Word commands

Question about code

I have used this pattern to re-use complex regexes:

const NUMBER = /[+-]?((\.\d+)|(\d+)(\.\d*)?)/;
const match = new RegExp(`(?<![A-Z])[FHIJKPQRS]\\s*${NUMBER.source}`)

I would like to use this for certain duplicated parts (e.g. (?<![A-Z])) as well. Is this acceptable?
The performance impact is surely negligible since this only happens during initialisation.
It greatly increases readability and IDE support in my opinion.

Screenshots

Name Before After
Default default_before default_after
Extended extended_before extended_after

Checklist

  • Added markup tests
  • Updated the changelog at CHANGES.md

@barthy-koeln barthy-koeln changed the title Feat/gcode rework feat: gcode rework Apr 14, 2024
@barthy-koeln barthy-koeln force-pushed the feat/gcode_rework branch 2 times, most recently from e416841 to aaeb9f5 Compare April 14, 2024 12:58
@barthy-koeln barthy-koeln changed the title feat: gcode rework enh(gcode): rewrote language for moden gcode support Apr 14, 2024
@barthy-koeln barthy-koeln marked this pull request as ready for review April 14, 2024 13:00
src/languages/gcode.js Outdated Show resolved Hide resolved
variants: [
// G General functions: G0, G5.1, G5.2, …
// M Misc functions: M0, M55.6, M199, …
{ match: /(?<![A-Z])[GM]\s*\d+(\.\d+)?/ },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use look-behinds until version 12 (breaking change)... so we'll need another way, or this PR will have to wait till then... no ETA on version 12 yet.

src/languages/gcode.js Outdated Show resolved Hide resolved
src/languages/gcode.js Outdated Show resolved Hide resolved
src/languages/gcode.js Outdated Show resolved Hide resolved
src/languages/gcode.js Outdated Show resolved Hide resolved
src/languages/gcode.js Outdated Show resolved Hide resolved
const NUMBER = hljs.inherit(hljs.C_NUMBER_MODE, { begin: '([-+]?((\\.\\d+)|(\\d+)(\\.\\d*)?))|' + hljs.C_NUMBER_RE });


const LETTER_BOUNDARY_RE = /(?<![A-Z])/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const LETTER_BOUNDARY_RE = /(?<![A-Z])/;
// TODO: post v12 lets use look-behind for more accuracy, until then \b should suffice
// const LETTER_BOUNDARY_RE = /(?<![A-Z])/;
const LETTER_BOUNDARY_RE = /\b/;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was still thinking about that — I unfortunately don't think that \b is good enough. It completely breaks the spaceless gcode snippet I included in the markup test, because \b includes [0-9], meaning a sequence like G1A2 does not match G1 and A2 separately like it should.

It's pretty common for gcode to be generated by software and not written by humans, and to reduce file size and transmission time to the machines, they strip all whitespace.

I wanted to look into \b and an on:begin filter using response.ignoreMatch() if a digit is found before the first letter.

Would that be fine or discouraged?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd have to do that per mode or course... we don't use that much because it's SO expensive... but I think if we were willing to remove GCODE from the list of auto-detectable languages with disableAutodetect that on:begin could then be used... since we'd then be certain we were at least doing all that work in the service of highlighting actual GCODE.

And lets still leave a comment somewhere that the better solution is look-behind - for the future when we can use that.

@barthy-koeln
Copy link
Contributor Author

@joshgoebel thank you for your patience.

My latest commit added disableAutodetect: true to the language, as well as the aforementioned on:begin filter.
I added the filter as a variant to full matches with \b.

This means that readable gcode with sane spacing will rarely, if ever, run into the callback filter.


I've tested both implementations with the existing markup test, as well as some 100lines of spaceless gcode.
The results are always within 5% of each other, always in favor of the v12 language with lookbehinds.

Running Benchmark & Results

Shell commands to download v12 and install benny

wget https://raw.githubusercontent.com/barthy-koeln/highlight.js/2c55db96e8a0523fdbdd6d4069c7007f75d5288b/src/languages/gcode.js -O src/languages/gcode-v12.js
npm run build gcode gcode-v12
cd build
npm install benny

# create benchmark script here e.g. bench.mjs

node bench.mjs

Benchmark script

import b from 'benny'
import fs from 'fs'
import hljs from 'highlight.js'

const code = fs.readFileSync(import.meta.dirname + '/../test/markup/gcode/extended.txt').toString('utf-8')

b.suite(
  'gcode bench',

  b.add('v11', () => {
    hljs.highlight(code, { language: 'gcode' })
  }),

  b.add('v12', () => {
    hljs.highlight(code, { language: 'gcode-v12' })
  }),

  b.cycle(),
  b.complete(),
  b.save({ file: 'gcode', version: '1.0.0' }),
  b.save({ file: 'gcode', format: 'chart.html' })
)

Results

gcode-bench

Comment on lines 74 to 83
const charBeforeMatch = matchdata.input[matchdata.index - 1];
if (charBeforeMatch >= '0' && charBeforeMatch <= '9') {
return;
}

response.ignoreMatch();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the regex is ![A-Z] why wouldn't the conditional match? ie:

Suggested change
const charBeforeMatch = matchdata.input[matchdata.index - 1];
if (charBeforeMatch >= '0' && charBeforeMatch <= '9') {
return;
}
response.ignoreMatch();
const charBeforeMatch = matchdata.input[matchdata.index - 1];
if (charBeforeMatch >= 'A' && charBeforeMatch <= 'Z') {
response.ignoreMatch();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to deal with lowercase a-z also?

Copy link
Contributor Author

@barthy-koeln barthy-koeln Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the begin with on:begin variant was the only matching logic, we'd have to check for ![a-zA-Z].

But since we have a variant with \b we can be sure that any instance that follows a word boundary ![a-zA-Z0-9_] has been found already. So we only need to additionally find those that follow 0-9 and ignore everything else.

Doing that means we don't have to account for upper/lowercase.

I just noticed the underscore _ isn't handled by that callback, but it should be. I will push another commit (but will wait for feedback on the thing above).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I've committed an additional check for underscores, to align the filter with what \b and ![A-Z] mean.
I've added two lines to the markup test and verified in my v1 branch feat/gcode-v12 that the result is the same.

@barthy-koeln
Copy link
Contributor Author

barthy-koeln commented May 20, 2024

Hey @joshgoebel, this is just a careful ping to see if I can or should do anything else here?
I need this fix for a freelance project, but can use my fork for the initial release.

Edit: I seriously understand everyone's time is precious and this is free open-source work, so I'll accept any answer including "not now, will revisit later" :D

Copy link
Member

@joshgoebel joshgoebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this took so very long!

Copy link

Build Size Report

Changes to minified artifacts in /build, after gzip compression.

5 files changed

Total change +289 B

View Changes
file base pr diff
es/core.min.js 8.18 KB 8.18 KB -1 B
es/highlight.min.js 8.18 KB 8.18 KB -1 B
es/languages/gcode.min.js 642 B 788 B +146 B
highlight.min.js 8.22 KB 8.22 KB -1 B
languages/gcode.min.js 647 B 793 B +146 B

@joshgoebel joshgoebel merged commit 7893353 into highlightjs:main Oct 31, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants