Inline HTML tags are parsed separately and produce semantically incorrect AST node #418

GYHHAHA · 2023-05-28T06:47:19Z

Describe the bug

abc <sup>[1]</sup> now produces 4 separate AST nodes: [{"type":"text","value":"abc "},{"type":"html","value":"<sup>"},{"type":"text","value":"[1]"},{"type":"html","value":"</sup>"}], which breaks the sup semantics. It should be aligned with the markdown behavior. Contrast current and desired:

Reproduce the bug

/

List your environment

No response


welcome · 2023-06-11T22:57:02Z

Thanks for opening your first issue here! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.

If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).

Welcome to the EBP community! 🎉

rowanc1 · 2023-06-11T23:23:24Z

Thanks @GYHHAHA, I have spent some time thinking about this over the weekend, and I am not quite sure what the best path is. Some of my notes are below. I'm not sure where to go next and would appreciate input from you or @fwkoch! This is also a high-priority feature request in jupyterlab-myst (cc @agoose77, @nthiery).

Right now our parser is markdown-it, and that is what gives us individual HTML tags while still parsing the content between them. I think there could be a transform at the directive/role parsing stage that recognizes the html_inline tokens, parses the HTML, and then does the appropriate nesting in a map (i.e., while an HTML tag is open, add children to it, then close it when you find the matching tag).
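The nesting step described above can be sketched as a single stack-based pass over the flat node list. This is a minimal sketch under stated assumptions, not the mystmd implementation: the node shapes, the intermediate htmlElement node type, and the nestInlineHtml name are all hypothetical, and only bare tags without attributes are recognized.

```typescript
// Hypothetical mdast-like node shapes; not the actual mystmd types.
interface MdText { type: "text"; value: string }
interface MdHtml { type: "html"; value: string }
// Hypothetical intermediate node for a paired open/close tag.
interface MdElement { type: "htmlElement"; tag: string; children: MdNode[] }
type MdNode = MdText | MdHtml | MdElement;

// Only bare tags such as "<sup>" and "</sup>" are recognized;
// attributes and self-closing tags are out of scope for this sketch.
const OPEN_TAG = /^<([a-zA-Z][a-zA-Z0-9-]*)>$/;
const CLOSE_TAG = /^<\/([a-zA-Z][a-zA-Z0-9-]*)>$/;

function nestInlineHtml(nodes: MdNode[]): MdNode[] {
  const root: MdNode[] = [];
  // One frame per still-open tag, collecting the nodes seen since it opened.
  const stack: { tag: string; opener: MdHtml; children: MdNode[] }[] = [];
  const current = (): MdNode[] =>
    stack.length > 0 ? stack[stack.length - 1].children : root;

  for (const node of nodes) {
    if (node.type === "html") {
      const open = OPEN_TAG.exec(node.value);
      if (open) {
        stack.push({ tag: open[1].toLowerCase(), opener: node, children: [] });
        continue;
      }
      const close = CLOSE_TAG.exec(node.value);
      const top = stack[stack.length - 1];
      if (close && top && top.tag === close[1].toLowerCase()) {
        stack.pop();
        current().push({ type: "htmlElement", tag: top.tag, children: top.children });
        continue;
      }
    }
    // Text, unmatched closing tags, and other HTML go to the innermost open tag.
    current().push(node);
  }

  // Any tag that was never closed is flattened back into its parent unchanged.
  while (stack.length > 0) {
    const frame = stack.pop()!;
    current().push(frame.opener, ...frame.children);
  }
  return root;
}
```

Fed the four nodes from the bug report, this yields a text node followed by one htmlElement (tag sup) whose only child is the "[1]" text node. A closing tag that does not match the innermost open tag is kept as a raw html node rather than force-closing outer tags; this is one of the edge cases where a real transform would need a deliberate policy.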

I also have some questions about what we want to render inside the HTML tags. For example, for this value:
[a](link)
Should the link be parsed as markdown, or treated as HTML? Also, markdown-it actually parses this to a single html_block if the line starts directly with the <, which is a different problem.

I am somewhat concerned that this is going to be a bit of a mess and lead to all sorts of edge cases. Testing it out, GitHub isn't consistent either, and it isn't consistent with Jupyter. markdown-it is built for rendering directly to HTML and is explicitly designed without an AST, which makes these things hard. This isn't really a problem if you are rendering to HTML and then sanitizing the string (which is essentially what JupyterLab does); however, in React we can't do that, because we need to represent the tree/AST first. I think unified is a much better choice for us long term, but getting there is still a fair bit of work.

There is also convertHtmlToMdast, which actually converts to mdast; that is better when we know the HTML tag (like sup in this case). I thought we were running this code, but it seems to have been removed (cc @fwkoch, maybe intentional?), and regardless it would have the same problems that you highlighted!
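Once open and close tags are paired, mapping known tags such as sup to semantic nodes is a small tree walk. This is a minimal sketch, not convertHtmlToMdast itself: it assumes an already-nested, hypothetical htmlElement node, and it assumes myst-spec-style superscript and subscript node types; toSemanticNodes and KNOWN_TAGS are hypothetical names.

```typescript
// Hypothetical node shapes for illustration; the intermediate "htmlElement"
// node is assumed to come from an earlier pass that pairs open/close tags.
interface MdText { type: "text"; value: string }
interface MdHtml { type: "html"; value: string }
interface MdElement { type: "htmlElement"; tag: string; children: MdNode[] }
// Semantic nodes; type names assumed to follow myst-spec.
interface MdSuperscript { type: "superscript"; children: MdNode[] }
interface MdSubscript { type: "subscript"; children: MdNode[] }
type MdNode = MdText | MdHtml | MdElement | MdSuperscript | MdSubscript;

// Tags with a known semantic equivalent; a Map avoids prototype keys like "constructor".
const KNOWN_TAGS = new Map<string, "superscript" | "subscript">([
  ["sup", "superscript"],
  ["sub", "subscript"],
]);

function toSemanticNodes(node: MdNode): MdNode {
  // Leaves carry no children and are returned unchanged.
  if (node.type === "text" || node.type === "html") return node;
  // Convert children first so nested tags are handled bottom-up.
  const children = node.children.map(toSemanticNodes);
  if (node.type === "htmlElement") {
    const mapped = KNOWN_TAGS.get(node.tag.toLowerCase());
    if (mapped === "superscript") return { type: "superscript", children };
    if (mapped === "subscript") return { type: "subscript", children };
  }
  // Unknown tags (and already-semantic nodes) keep their type.
  return { ...node, children };
}
```

Keeping unknown tags as htmlElement nodes, rather than guessing a mapping, leaves them for a later HTML-rendering or sanitizing step; the open question above about whether content inside the tags is markdown or HTML is not resolved by this sketch.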

There are probably some small improvements we could put in, but I don't think they are going to address this in a standardized way.

GYHHAHA · 2023-06-12T03:05:47Z

Thanks for your response. Totally agree. This is not an easy one to address in a systematic way for now.

rowanc1 · 2023-09-19T18:38:17Z

There have been some improvements that @fwkoch made here:

👩‍💻 Add html transform that combines related html nodes #575

GYHHAHA · 2023-09-21T02:42:42Z

@rowanc1 Thanks for the update!


rowanc1 · 2023-10-16T22:37:45Z

I have taken a pass at this in #680. Between the two transforms, it should now be correct and should satisfy this issue. We will likely still iterate a bit on this, but hopefully the PR will close this one!

GYHHAHA added the bug Something isn't working label May 28, 2023

rowanc1 transferred this issue from jupyter-book/myst-theme Jun 11, 2023

rowanc1 mentioned this issue Aug 2, 2023

Audit of Features missing in JupyterBook #189 (open, 72 tasks)

rowanc1 added a commit that referenced this issue Oct 16, 2023

Add superscript and subscript to known HTML

f6b10b3

See #418

rowanc1 mentioned this issue Oct 16, 2023

👩‍💻 Improve HTML processing #680 (merged)

rowanc1 closed this as completed in 8bd4ee2 Oct 17, 2023
