Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

markdown - spec - current implementation of _ is problematic #23

Open
Simon-Laux opened this issue Jan 27, 2023 · 8 comments
Open

markdown - spec - current implementation of _ is problematic #23

Simon-Laux opened this issue Jan 27, 2023 · 8 comments
Assignees
Labels
bug Something isn't working spec issue that involves the specification

Comments

@Simon-Laux
Copy link
Member

Simon-Laux commented Jan 27, 2023

undercore _ is used often in text. especially for programmers having things like ANDROID_NDK_ROOT in a message is not a rare occurrence.

currently ANDROID_NDK_ROOT parses into
Bildschirmfoto 2023-01-27 um 01 29 03

[
    {
        "t": "Text",
        "c": "ANDROID"
    },
    {
        "t": "Italics",
        "c": [
            {
                "t": "Text",
                "c": "NDK"
            }
        ]
    },
    {
        "t": "Text",
        "c": "ROOT"
    }
]

which is not desired in most cases.

Proposed solution

remove underscore variants from bold and italics, so its just *italics* and **bold**

@r10s
Copy link
Member

r10s commented Feb 6, 2023

as we are currently not handling markdown styles at all, it may be okay to support a subset only - but it may still be regarded as a bug and ppl may be wondering.

are there other notable markdown implementations that skip underscore processing? is this maybe a usual thing meanwhile?

i am not too much for/in markdown at all, however, removing such basic thing as _italic_ seems questionable: underscores are esp. nice as italics/underlining are typographically interchangeable - and by _italic_ one has kind of wysiwyg in plaintext. i see that use of underscore in plaintext/markdown very often, it feels natural. many markdown toolbars also use that variant for italics.

and: hasn't sth. as width*height*deep (not to speak about C dereferencing :) the same issue? usually programmers know they have to backtick their stuff, cmp. eg. our changelogs :)

maybe also the parser can maybe do better, eg. allowing _ on word boundaries only (which seems to be a general difference to the asterix - also here on github, inside words asterix is changing style - where underscore is not)

@r10s
Copy link
Member

r10s commented Feb 6, 2023

in general, if we do not want to rely on parsing rules as there are too many cornercases (markdown is only apparently simple, reminds me very much to yaml :) we could also leave the formatting characters in the resulting text.

i have seen this here and there, and think this not the worse compromise, also makes adaption easier and a bad rule does not break things badly:

imagine passwords sent over delta chat with markdown support: this may be close to impossible, at least unreliable, if we introduce markdown

@Simon-Laux
Copy link
Member Author

but it may still be regarded as a bug and ppl may be wondering.

we define what we want to support in https://github.com/deltachat/message-parser/blob/master/spec.md

maybe also the parser can maybe do better, eg. allowing _ on word boundaries only (which seems to be a general difference to the asterix - also here on github, inside words asterix is changing style - where underscore is not)

sure that's an option, but also more complicated than just removing underscore or saying that only on wordboundries is allowed. a side benefit of only on word boundaries is that it could make the parsing slightly faster.

in general, if we do not want to rely on parsing rules as there are too many cornercases (markdown is only apparently simple, reminds me very much to yaml :) we could also leave the formatting characters in the resulting text.

also a valid angle, but does not look as good.
Maybe rich formatting really needs a special editor that auto escapes stuff you paste into it, but that's complicated and even element doesn't get it right (I had problems with it).

imagine passwords sent over delta chat with markdown support: this may be close to impossible, at least unreliable, if we introduce markdown

I think element auto-escapes stuff you paste into it, another option would be to allow users to toggle markdown rendering in their messages... but right there are more open questions to the whole rich message topic, also wether we want to send messages as html or not.

@Simon-Laux
Copy link
Member Author

another example I came across: #deltachat_desktop becomes #deltachatdesktop, with desktop being displayed in italics font.

@Simon-Laux
Copy link
Member Author

another example I came across: #deltachat_desktop becomes #deltachatdesktop, with desktop being displayed in italics font.

the text emote -_- can also falsely trigger underline. again I think we should just remove it from the spec.

@Simon-Laux
Copy link
Member Author

Simon-Laux commented Oct 31, 2023

just tested WhatsApp, it has _ for italics and * for bold, but is more clever about it:

_hi_ - triggers italics
_hi_hi - does not trigger italics
_hi_hi_ - triggers italics
_hi _hi_ - triggers italics
_hi _hi _ - does not trigger italics

so italics only works when started and ended outside of a word. Which could also be a solution for us here.

@Simon-Laux Simon-Laux added the bug Something isn't working label Oct 31, 2023
@Simon-Laux Simon-Laux changed the title markdown - spec - remove _ for italics and bold markdown - spec - _ is problematic Oct 31, 2023
@Simon-Laux Simon-Laux changed the title markdown - spec - _ is problematic markdown - spec - current implementation of _ is problematic Oct 31, 2023
@adbenitez
Copy link
Member

another bug in the implementation of _ is:

_[test](https://example)_

which should be displayed as:

test

is actually displayed as:

[test](https://example)_

(the [ is italic while the rest of the link is not parsed)

@Simon-Laux Simon-Laux added the spec issue that involves the specification label Apr 29, 2024
@link2xt
Copy link
Contributor

link2xt commented May 23, 2024

This is also how pandoc handles it:

pandoc -f markdown -t html
_foo_bar_baz_
<p><em>foo_bar_baz</em></p>
pandoc -f markdown -t html
foo_bar_baz
<p>foo_bar_baz</p>

CommonMark has a very precise definition:
https://spec.commonmark.org/0.31.2/#emphasis-and-strong-emphasis

@farooqkz farooqkz self-assigned this Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working spec issue that involves the specification
Projects
None yet
Development

No branches or pull requests

5 participants