👩‍💻 Add html transform that combines related html nodes #575

Merged: 5 commits into main from feat/html-revive-transform, Sep 5, 2023


fwkoch (Collaborator) commented Sep 1, 2023

This is an attempt to address issues around persisting inline HTML in notebooks, raised here: jupyter-book/jupyterlab-myst#64

The parser splits inline html into one node per tag; this transform reconstructs a single html node from those individual nodes plus the markdown content between them (rendered using myst-to-html). This means you can even include markdown inside inline html, e.g. something like <i>[](#my-link)</i> will resolve to an html ref.
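To make the idea concrete, here is a minimal sketch of the stitching step, not the code in this PR. It assumes mdast-style sibling nodes; `MdNode`, `combineInlineHtml`, and `renderInline` are hypothetical names, and `renderInline` is only a tiny stand-in for myst-to-html (it does not resolve references such as `[](#my-link)`).

```typescript
// Hypothetical minimal node shape, loosely modelled on mdast.
type MdNode = { type: string; value?: string; children?: MdNode[] };

const OPEN_TAG = /^<([a-zA-Z][\w-]*)(\s[^>]*)?>$/;
const CLOSE_TAG = /^<\/([a-zA-Z][\w-]*)\s*>$/;

function escapeText(text: string): string {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Stand-in for myst-to-html (assumption): renders only a few inline node types.
function renderInline(nodes: MdNode[]): string {
  return nodes
    .map((node) => {
      switch (node.type) {
        case 'html':
          return node.value ?? ''; // nested tags pass through unchanged
        case 'text':
          return escapeText(node.value ?? '');
        case 'emphasis':
          return `<em>${renderInline(node.children ?? [])}</em>`;
        case 'strong':
          return `<strong>${renderInline(node.children ?? [])}</strong>`;
        default:
          return renderInline(node.children ?? []);
      }
    })
    .join('');
}

// Tag name if the node is a lone opening tag (self-closing tags are ignored).
function openTagName(node: MdNode): string | null {
  if (node.type !== 'html' || !node.value || node.value.endsWith('/>')) return null;
  const match = node.value.match(OPEN_TAG);
  return match ? match[1].toLowerCase() : null;
}

// Tag name if the node is a lone closing tag.
function closeTagName(node: MdNode): string | null {
  if (node.type !== 'html' || !node.value) return null;
  const match = node.value.match(CLOSE_TAG);
  return match ? match[1].toLowerCase() : null;
}

// Index of the sibling that closes the tag opened at openIndex, or -1.
function findClosingIndex(siblings: MdNode[], openIndex: number, tag: string): number {
  let depth = 0;
  for (let i = openIndex; i < siblings.length; i++) {
    if (openTagName(siblings[i]) === tag) depth++;
    if (closeTagName(siblings[i]) === tag && --depth === 0) return i;
  }
  return -1;
}

// Merge each open tag, the content up to its close tag, and the close tag
// into one html node. Unmatched tags are left untouched.
function combineInlineHtml(siblings: MdNode[]): MdNode[] {
  const result: MdNode[] = [];
  let i = 0;
  while (i < siblings.length) {
    const node = siblings[i];
    const tag = openTagName(node);
    const closeIndex = tag ? findClosingIndex(siblings, i, tag) : -1;
    if (closeIndex === -1) {
      result.push(node);
      i++;
      continue;
    }
    const openValue = node.value ?? '';
    const closeValue = siblings[closeIndex].value ?? '';
    const inner = renderInline(siblings.slice(i + 1, closeIndex));
    result.push({ type: 'html', value: openValue + inner + closeValue });
    i = closeIndex + 1;
  }
  return result;
}
```

In this sketch, the siblings `<i>`, a text node, an emphasis node, and `</i>` collapse into one html node, while a lone `<br>` with no closing tag stays as it was.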

Non-inline html is not broken up by the parser, so it will persist untransformed. Potentially we could process that html the same way we process the inline html allowing markdown inside, for example, an html table - but that is slightly out of scope here.

},
},
);
// This would be good to sanitize, but the best solution requires jsdom, increasing build size by 50%...
fwkoch (Collaborator, Author) commented on this line:

@stevejpurves dompurify requires jsdom, which bumps our mystmd bundle size from ~12 MB to ~17 MB. Feels not great. We will still need sanitization on the other side, where this html is consumed (jupyterlab-myst or myst-to-react).

Another option is sanitize-html. This is a simpler library, but we need to explicitly decide which html tags/attrs to allow, and I don't think we have enough info for that yet. (For example, the defaults do not allow button, which is needed for this use case: jupyter-book/jupyterlab-myst#64)
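For reference, a rough sketch of what the sanitize-html route could look like if we extended the library defaults to allow button. The attribute list for button and the `sanitizeInlineHtml` name are assumptions for illustration only; nothing here has been decided.

```typescript
import sanitizeHtml from 'sanitize-html';

// Start from the library defaults and add only what the use case needs.
const options: sanitizeHtml.IOptions = {
  allowedTags: sanitizeHtml.defaults.allowedTags.concat(['button']),
  allowedAttributes: {
    ...sanitizeHtml.defaults.allowedAttributes,
    // Assumption: which button attributes to keep is still an open question.
    button: ['type', 'class'],
  },
};

// Hypothetical helper name, not part of mystmd.
function sanitizeInlineHtml(dirty: string): string {
  return sanitizeHtml(dirty, options);
}
```

The cost of this approach is the one noted above: every tag or attribute a notebook author relies on has to be added to the allowlist explicitly.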

fwkoch (Collaborator, Author) commented Sep 2, 2023

@agoose77 - this is the transform that Rowan mentioned in his comment here: jupyter-book/jupyterlab-myst#118 (comment) - it stitches together the split-up inline html (including any markdown content between the open and close tag) into a single html node. It works nicely at least with the example html from this issue: jupyter-book/jupyterlab-myst#64 as well as a few test cases. I'm sure there will be plenty of complicated html use cases which will require tweaking this, but hopefully it is an acceptable start.

fwkoch force-pushed the feat/html-revive-transform branch from fd140b5 to da7692b on September 5, 2023 23:37
fwkoch merged commit 7752cb7 into main on Sep 5, 2023
fwkoch deleted the feat/html-revive-transform branch on September 5, 2023 23:41