-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
retain authored HTML for empty elements #95
Comments
Note that “retain” is actually the wrong behaviour. You should normalise to having no closing trailing solidus. The microformats parsing specification for |
BS4 recently added an "html5" formatter that apparently does this: I propose switching output to that, since we do not make any other promise about the HTML output AFAIK? |
Beautifulsoup 4.6.2 introduced the ability to control the slashes in void elements through formatters. This adds a formatter that does this, but otherwise only does minimal encoding (as the previous, default, formatter did)
@snarfed While investigating another issue I discovered today that granary appears to rely on mf2py to produce somewhat XHTML-compatible output, which this would break. We could expose the HTML formatter in the API to allow granary to force the old behavior? EDIT: alternatively, some variation of #97 that allows granary to force the serialization downstream, which might give it more options to generate proper XML. |
thanks for the heads up! i'm not entirely sure how this would affect granary yet, but i can always update it. let me know when you have an mf2py PR you want me to test! |
#136 is said PR. Granary produces Atom feeds with the content declared to be XHTML and unless I missed something taken straight from mf2 parsing when turning an mf2 html feed into Atom. Not closing void elements isn't allowed in XHTML as far as I know, and would make those feeds invalid? |
thanks! yeah, i got that part, i just don't remember the exact transformation steps in granary. no matter, i'll try it and see. |
@sknebel you're right. thanks for thinking of granary! it does pass the HTML content pretty much straight through to Atom. ideally yes, i'd love a flag or exposed HTML formatter in mf2py so i can control this in granary. i tested just now though, and if i change the Atom to |
If you use html5lib you can reserialize with options turned on - I think use_trailing_solidus is the one to autoclose null elements; I don't know that it can guarantee full XHTML compliance though, which is what you need to inline them in Atom. |
Currently mf2py due to using BeautifulSoup closes empty HTML tags. e.g.
<br>
gets converted to<br/>
and<hr>
gets converted into<hr/>
. This makes thee-content[html]
different from the authored one.This does not seem to be an issue in actual use but will be for any tests. So I am documenting this here.
Details
html5lib by default does not do this see: https://github.com/html5lib/html5lib-python/blob/5e6b61b4630165dd4765fff41d0f855534d5e2fe/html5lib/serializer.py#L114
The relevant lines in BeautifulSoup which explicitly do this are https://github.com/waylan/beautifulsoup/blob/480367ce8c8a4d1ada3012a95f0b5c2cce4cf497/bs4/element.py#L1106-L1107 (Note that this is not the canonial source for BS4)
The text was updated successfully, but these errors were encountered: