👩‍💻 Add html transform that combines related html nodes #575

fwkoch · 2023-09-01T15:45:08Z

This is an attempt to address issues around persisting inline HTML in notebooks, raised here: jupyter-book/jupyterlab-myst#64

The parser splits inline HTML into one node per tag. This transform reconstructs a single html node from those individual nodes and the markdown content between them (rendered using myst-to-html). This means you can even include markdown inside inline HTML, e.g. something like <i>[](#my-link)</i> will resolve to an HTML ref.
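A minimal sketch of the stitching idea, for illustration only. It is not the code in packages/myst-transforms/src/html.ts: the `Node` shape, `renderToHtml` (a tiny stand-in for myst-to-html), and `combineInlineHtml` are all hypothetical names, and the tag matching is deliberately simple (regex-based, same-tag nesting only).

```typescript
// Hypothetical, simplified mdast-like node; not the real myst-spec types.
type Node = { type: string; value?: string; children?: Node[] };

const OPEN_TAG = /^<([a-zA-Z][\w-]*)(\s[^>]*)?>$/;
const CLOSE_TAG = /^<\/([a-zA-Z][\w-]*)\s*>$/;

function escapeText(text: string): string {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Stand-in for myst-to-html (assumption): handles only a few node types.
function renderToHtml(node: Node): string {
  const inner = () => (node.children ?? []).map(renderToHtml).join('');
  switch (node.type) {
    case 'html':
      return node.value ?? '';
    case 'text':
      return escapeText(node.value ?? '');
    case 'emphasis':
      return `<em>${inner()}</em>`;
    default:
      return inner();
  }
}

// Tag name if the node is an html node holding a single opening tag.
function openTagName(node: Node): string | undefined {
  if (node.type !== 'html' || !node.value) return undefined;
  const value = node.value.trim();
  if (value.endsWith('/>')) return undefined; // self-closing, nothing to pair
  return OPEN_TAG.exec(value)?.[1]?.toLowerCase();
}

// Tag name if the node is an html node holding a single closing tag.
function closeTagName(node: Node): string | undefined {
  if (node.type !== 'html' || !node.value) return undefined;
  return CLOSE_TAG.exec(node.value.trim())?.[1]?.toLowerCase();
}

// Index of the sibling that closes the tag opened at `start`, or -1.
function findClose(siblings: Node[], start: number, name: string): number {
  let depth = 0;
  for (let i = start; i < siblings.length; i++) {
    if (openTagName(siblings[i]) === name) depth += 1;
    else if (closeTagName(siblings[i]) === name) {
      depth -= 1;
      if (depth === 0) return i;
    }
  }
  return -1;
}

// Replace each open-tag ... close-tag run of siblings with one html node.
function combineInlineHtml(parent: Node): Node {
  const children = parent.children;
  if (!children) return parent;
  const combined: Node[] = [];
  let i = 0;
  while (i < children.length) {
    const name = openTagName(children[i]);
    const end = name ? findClose(children, i, name) : -1;
    if (end === -1) {
      // No matching close among the siblings: keep the node, recurse into it.
      combined.push(combineInlineHtml(children[i]));
      i += 1;
    } else {
      const value = children.slice(i, end + 1).map(renderToHtml).join('');
      combined.push({ type: 'html', value });
      i = end + 1;
    }
  }
  return { ...parent, children: combined };
}

const paragraph: Node = {
  type: 'paragraph',
  children: [
    { type: 'text', value: 'Press ' },
    { type: 'html', value: '<button class="btn">' },
    { type: 'emphasis', children: [{ type: 'text', value: 'here' }] },
    { type: 'html', value: '</button>' },
    { type: 'text', value: ' to continue.' },
  ],
};

const result = combineInlineHtml(paragraph);
// The middle child is now one html node:
// '<button class="btn"><em>here</em></button>'
console.log(result.children?.[1]);
```

An opening tag with no matching close among its siblings (for example a lone `<br>`) is left untouched in this sketch, so unusual input degrades to the parser's original output rather than being dropped.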

Non-inline HTML is not broken up by the parser, so it persists untransformed. We could potentially process that HTML the same way as the inline HTML, which would allow markdown inside, for example, an HTML table, but that is slightly out of scope here.

fwkoch · 2023-09-01T23:18:01Z

packages/myst-transforms/src/html.ts

+      },
+    },
+  );
+  // This would be good to sanitize, but the best solution requires jsdom, increasing build size by 50%...


@stevejpurves dompurify requires jsdom, which bumps our mystmd bundle size from ~12 MB to ~17 MB. That does not feel great. We will need it on the other side, where this HTML is consumed (jupyterlab-myst or myst-to-react).

Another option is sanitize-html. This is a simpler library, but we need to explicitly decide which html tags/attrs to allow, and I don't think we have enough info for that yet. (For example, the defaults do not allow button, which is needed for this use case: jupyter-book/jupyterlab-myst#64)

fwkoch · 2023-09-02T00:12:33Z

@agoose77 - this is the transform that Rowan mentioned in his comment here: jupyter-book/jupyterlab-myst#118 (comment) - it stitches together the split-up inline html (including any markdown content between the open and close tag) into a single html node. It works nicely at least with the example html from this issue: jupyter-book/jupyterlab-myst#64 as well as a few test cases. I'm sure there will be plenty of complicated html use cases which will require tweaking this, but hopefully it is an acceptable start.

fwkoch mentioned this pull request Sep 2, 2023

⚛️ Render HTML directly jupyter-book/jupyterlab-myst#118

Merged

1 task

fwkoch added 5 commits September 5, 2023 17:37

📦 Bump package versions

035ea0c

👩‍💻 Move types from myst-transforms to myst-common

2fa2042

👩‍💻 Handle html nodes in myst-to-html

daf7123

👩‍💻 Add html transform that combines related html nodes

01b8b68

🔧 Move html handler out of default myst-to-html

da7692b

fwkoch force-pushed the feat/html-revive-transform branch from fd140b5 to da7692b Compare September 5, 2023 23:37

fwkoch merged commit 7752cb7 into main Sep 5, 2023

fwkoch deleted the feat/html-revive-transform branch September 5, 2023 23:41

rowanc1 mentioned this pull request Sep 19, 2023

Inline HTML tags are parsed separately and produce semantically incorrect AST node #418

Closed
