UTF-8 encoder/decoder title is misleading #53

Open
zvodd opened this issue Mar 24, 2020 · 6 comments

@zvodd

zvodd commented Mar 24, 2020

This is problematic.
People are posting on Stack Overflow, generally confused about what UTF-8 is,
in part because of the complete lack of clarification on the page https://mothereff.in/utf-8

@mathiasbynens
Owner

Could you post a link to an example Stack Overflow post where this confusion occurs?

Would it help to link to https://encoding.spec.whatwg.org/#utf-8?

@zvodd
Author

zvodd commented Mar 25, 2020

Yes, sorry, I had intended to elaborate earlier.
I think it would be helpful to have a small disclaimer explaining that the app/page encodes JavaScript strings (which are UTF-8) into escaped hexadecimal byte strings, and vice versa.

"Disclaimer:
JavaScript, which the encoder is made with, internally represents all strings in UTF-8 format. This page converts UTF-8 strings to escaped hexadecimal values (byte literals), e.g. "\x41\x42\x43", and vice versa. It also decodes escaped Unicode literals, e.g. "\u0041\u0042\u0043", into plain strings.
Consider viewing this link for more information about UTF-8 and Unicode."
...or something like that.

Otherwise, I think new players will get the notion that hexadecimal escape sequences are the definition of UTF-8.

https://stackoverflow.com/search?q=https%3A%2F%2Fmothereff.in%2Futf-8
https://stackoverflow.com/questions/60839049/is-there-any-function-to-get-utf-8-encoding-output-text-in-python

@mathiasbynens
Owner

mathiasbynens commented Mar 25, 2020

> JavaScript, which the encoder is made with, internally represents all strings in UTF-8 format.

This is false. See https://mathiasbynens.be/notes/javascript-encoding.
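
For context, the ECMAScript specification defines string values as sequences of 16-bit code units (UTF-16), not UTF-8 bytes. The following is a minimal Python sketch of the difference in unit counts. The helper name `code_unit_counts` is made up for illustration, and Python is used here only to count units, not to model a JavaScript engine.

```python
def code_unit_counts(text: str) -> dict:
    """Count how many units each representation needs for `text`."""
    return {
        "code_points": len(text),
        # UTF-16 uses 2 bytes per code unit; this count corresponds to
        # what JavaScript reports as String.prototype.length.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


print(code_unit_counts("A"))           # 1 code point, 1 UTF-16 unit, 1 UTF-8 byte
print(code_unit_counts("\u00e9"))      # é: 1 code point, 1 UTF-16 unit, 2 UTF-8 bytes
print(code_unit_counts("\U0001F4A9"))  # 💩: 1 code point, 2 UTF-16 units, 4 UTF-8 bytes
```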

> This page converts UTF-8 strings to escaped hexadecimal values (byte literals), e.g. "\x41\x42\x43", and vice versa.

This is also incorrect. The page takes Unicode strings as input and converts them to their UTF-8-encoded byte sequence, in the format \x41\x42\x43.
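
The conversion described here (Unicode string in, UTF-8 byte sequence out, printed as `\xNN` escapes) can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual code. The function name `utf8_escape` and the choice of uppercase hex digits are assumptions.

```python
def utf8_escape(text: str) -> str:
    """Encode `text` as UTF-8 and render each byte as a \\xNN escape."""
    return "".join(f"\\x{byte:02X}" for byte in text.encode("utf-8"))


print(utf8_escape("ABC"))     # \x41\x42\x43
print(utf8_escape("\u00e9"))  # é becomes \xC3\xA9 (two bytes in UTF-8)
print(utf8_escape("\u20ac"))  # € becomes \xE2\x82\xAC (three bytes in UTF-8)
```

Note that the UTF-8 part is the `encode("utf-8")` call. The `\xNN` rendering is a separate formatting step applied to the resulting bytes.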

> It also decodes escaped Unicode literals, e.g. "\u0041\u0042\u0043", into plain strings.

The tool does no such thing. Are you thinking of https://mothereff.in/js-escapes?

@zvodd
Author

zvodd commented Mar 25, 2020

My mistake on the internal representation.

On the second point, fair and accurate, as they are Unicode strings. 👍
My point about an explanation being included in the page still stands.

On the third, I was just reporting the observed behavior of the script. So, not sure about that?
Thought it was an intended feature.
https://i.imgur.com/4VALCkr.png

@mathiasbynens
Owner

@zvodd What's the URL you're seeing that on, and in which browser are you seeing this? https://mothereff.in/utf-8#ABC produces the expected output of \x41\x42\x43 for me.

@zvodd
Author

zvodd commented Mar 29, 2020

The user input is in the bottom field, so the input is \u0041\u0042\u0043. On close inspection, the decoder seems to have a quirk: it accepts string parts that begin with "\u" followed by two "0" characters and a two-digit hex number, and interprets them as if it were reading the "\x" escape for that two-digit hex number.
So that part of my description was also inaccurate.
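
The observed behavior can be modeled with a small Python sketch. This is a guess at how such a quirk could arise, not the tool's real implementation. The names `HEX_ESCAPE`, `LENIENT_ESCAPE`, and `utf8_unescape` are hypothetical, and text outside the recognized escapes is simply ignored here.

```python
import re

# Strict pattern: only \xNN escapes are recognized.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")
# Hypothetical lenient pattern: \u00NN is accepted and treated like \xNN.
LENIENT_ESCAPE = re.compile(r"\\(?:x|u00)([0-9A-Fa-f]{2})")


def utf8_unescape(escaped: str, lenient: bool = False) -> str:
    """Collect the escaped byte values and decode them as UTF-8."""
    pattern = LENIENT_ESCAPE if lenient else HEX_ESCAPE
    raw = bytes(int(digits, 16) for digits in pattern.findall(escaped))
    return raw.decode("utf-8")


print(utf8_unescape(r"\x41\x42\x43"))                      # ABC
print(utf8_unescape(r"\u0041\u0042\u0043", lenient=True))  # ABC
print(utf8_unescape(r"\u00C3\u00A9", lenient=True))        # é
```

The last call shows why this reading differs from real Unicode escapes: as code points, `\u00C3\u00A9` would be the two characters "Ã©", whereas reading the pair as the bytes C3 A9 and decoding them as UTF-8 gives the single character "é".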

Let me try to be clear: I am not here to tell you that I know how your encoder works better than you do. I am just saying that I think the description of UTF-8 encoding on the page should let users know that the "\x00" format of the output is a hexadecimal escape string format and is not exclusively related to UTF-8.
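
To illustrate the distinction, here is a short Python sketch in which the same `\xNN` notation is applied to bytes produced by three different encodings. The helper name `hex_escape` is made up for this example.

```python
def hex_escape(data: bytes) -> str:
    """Render arbitrary bytes as \\xNN escapes, whatever encoding produced them."""
    return "".join(f"\\x{byte:02X}" for byte in data)


text = "\u00e9"  # é
for encoding in ("utf-8", "latin-1", "utf-16-le"):
    print(encoding, hex_escape(text.encode(encoding)))
# utf-8 \xC3\xA9
# latin-1 \xE9
# utf-16-le \xE9\x00
```

The escape notation is identical in all three cases. Only the byte values differ, and those are what the chosen encoding determines.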

My initial attempt at the explanation was an example with a number of inaccuracies, for which I apologize.
