Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline HTML tags are parsed separately and produce semantically incorrect AST node #418

Closed
GYHHAHA opened this issue May 28, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@GYHHAHA
Copy link
Contributor

GYHHAHA commented May 28, 2023

Describe the bug

abc <sup>[1]</sup> now produces 4 separate AST nodes: [{"type":"text","value":"abc "},{"type":"html","value":"<sup>"},{"type":"text","value":"[1]"},{"type":"html","value":"</sup>"}] which breaks the sup semantics. It should be aligned to the markdown behavior. Contrast current and desired:

28 05 2023_02 45 45_REC

28 05 2023_02 46 23_REC

Reproduce the bug

/

List your environment

No response

@GYHHAHA GYHHAHA added the bug Something isn't working label May 28, 2023
@welcome
Copy link

welcome bot commented Jun 11, 2023

Thanks for opening your first issue here! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.

If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).

Welcome to the EBP community! 🎉

@rowanc1 rowanc1 transferred this issue from jupyter-book/myst-theme Jun 11, 2023
@rowanc1
Copy link
Member

rowanc1 commented Jun 11, 2023

Thanks @GYHHAHA I have spent some time thinking about this over the weekend. I am not quite sure what the best path is. Some of my notes are below, not overly sure where to go next and would appreciate input from you or @fwkoch! This is also a high-priority feature request in jupyterlab-myst (cc @agoose77, @nthiery).

Right now our parser is markdown-it and that is what is giving us individual html tags, while still parsing the HTML in between. I think there could be a transform at the directive/role parsing stage that recognizes the inline_html tokens, parses the html, and then does the appropriate nesting in a map (i.e. while open html tag, add children, then close when you find the matching tag).

I have some other questions about what we want to render inside of the html tags as well. For example, the value for this:
<sup>[a](link)</sup>
Should that have the link be parsed markdown, or is that html? Also, in markdown-it this actually parses to a single html_block if you start it directly with the <, which is a different problem.

I am somewhat concerned that this is going to be a bit of a mess and lead to all sorts of edge cases. Testing out github, they also aren't consistent and again it isn't consistent with Jupyter either. markdown-it is for directly rendering HTML and is explicitly designed without an AST, which makes these things hard. This isn't really a problem if you are rendering to html, and then sanitizing the string (which is what jupyter lab essential does); however, in react, we can't do that as we actually need to represent the tree/AST first. I think that unified is a much better choice for us long term, but getting there is still a fair bit of work.

There is also convertHtmlToMdast which actually converts to mdast, which is better if we know the html tag (like sup) in this case. I thought we were running this code, but that seems to have been removed (cc @fwkoch, maybe intentional?), and regardless would have the same problems that you highlighted!

There are probably some small improvements that we could put in, but I don't think they aren't going to address it in a standardized way.

@GYHHAHA
Copy link
Contributor Author

GYHHAHA commented Jun 12, 2023

Thanks for your response. Totally agree. This is not an easy one to address with a systematic way for now.

@rowanc1
Copy link
Member

rowanc1 commented Sep 19, 2023

There has been some improvements that @fwkoch made here:

@GYHHAHA
Copy link
Contributor Author

GYHHAHA commented Sep 21, 2023

@rowanc1 Thanks for the update!

rowanc1 added a commit that referenced this issue Oct 16, 2023
@rowanc1
Copy link
Member

rowanc1 commented Oct 16, 2023

I have taken a pass at this in #680, it actually will be correct between the two transforms now, and should satisfy this issue. We will likely still iterate a bit on this, but hopefully the PR will close this one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants