Add ASCII-compatible charset checking method. #87

mscdex · 2015-01-11T21:24:22Z

It can be useful to check for ASCII-compatible charsets when decoding data that is known to contain only bytes within the ASCII range. Such a check avoids running the data through a decoder unnecessarily.

ashtuchkin · 2015-01-12T14:43:42Z

Interesting, although I'm sure it can be done without an additional generated file.
Could you describe in more detail what you mean by an "ASCII-compatible" charset, and how it would save you decoding? In my view, utf8 is not ASCII-compatible at all :)

mscdex · 2015-01-12T14:52:45Z

ASCII-compatible means bytes 0x00-0x7F in an encoding are all ASCII characters/bytes and not some other characters. Many character sets are compatible in this way, but some are not.

Checking whether a destination encoding is ASCII-compatible is useful if you are already traversing binary data and you can check whether each byte is <= 0x7F. If there are no bytes above 0x7F and the encoding is ASCII-compatible, you don't have to run the entire set of data through a decoder which will end up giving you back the same string anyway.

The reason I use a pre-generated file for this is that it's significantly faster to use an object (with fast properties) than to do a lookup on the fly.
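A minimal sketch of the fast path described above, not the code from this pull request. The names `ASCII_COMPATIBLE`, `isAllAscii`, and `decodeFast` are hypothetical, the table holds only three illustrative entries, and the fallback decoder is passed in by the caller:

```javascript
// Hypothetical lookup object: names and flags are illustrative only,
// not the pre-generated file from this pull request.
const ASCII_COMPATIBLE = {
  utf8: true,
  latin1: true,
  utf16le: false, // two bytes per code unit, so "A" is 0x41 0x00
};

// True when every byte is in the ASCII range 0x00-0x7F.
function isAllAscii(buf) {
  for (let i = 0; i < buf.length; i++) {
    if (buf[i] > 0x7f) return false;
  }
  return true;
}

// Decode buf, skipping the real decoder when the fast path applies.
// `decode` is a caller-supplied fallback, for example
// (buf, enc) => iconv.decode(buf, enc).
function decodeFast(buf, encoding, decode) {
  if (ASCII_COMPATIBLE[encoding] === true && isAllAscii(buf)) {
    // Each byte maps one-to-one to an ASCII character.
    return buf.toString('latin1');
  }
  return decode(buf, encoding);
}
```

In practice the byte check would happen during a traversal the caller is already doing, so the only extra cost is the property lookup on the object.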

ashtuchkin · 2015-01-12T15:13:26Z

Got it. Makes sense.
I'm thinking of reusing the current caching architecture, but not making it part of the public API yet (that would require tests, docs, etc.). What do you think about iconv.getCodec(encodingName).asciiCompatible? The codec is cached, but its data must be loaded once.
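A rough sketch of how a cached, per-codec flag along these lines could behave. This is a standalone stand-in, not iconv-lite's internals: `codecCache` and this `getCodec` are hypothetical, and `Buffer#toString` plays the role of the real decoder, so only Node's built-in encoding names work here:

```javascript
// Hypothetical stand-in for a codec cache; not iconv-lite's actual internals.
const codecCache = Object.create(null);

// Probe buffer holding every byte in the ASCII range, 0x00-0x7F,
// and the string those bytes represent in plain ASCII.
const ASCII_BYTES = Buffer.from(Array.from({ length: 128 }, (_, i) => i));
const ASCII_TEXT = String.fromCharCode(...ASCII_BYTES);

function getCodec(encodingName) {
  let codec = codecCache[encodingName];
  if (codec === undefined) {
    // Built once per encoding; later calls reuse the cached object.
    const decode = (buf) => buf.toString(encodingName);
    codec = {
      decode,
      // Compatible when decoding the probe yields plain ASCII text.
      asciiCompatible: decode(ASCII_BYTES) === ASCII_TEXT,
    };
    codecCache[encodingName] = codec;
  }
  return codec;
}
```

The flag is computed on first use and then read as a plain property, which matches the "cached, but must load once" trade-off mentioned above.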

Commit 6536abd: Add ASCII-compatible charset checking method.

ashtuchkin force-pushed the master branch from 978c58b to 5148f43 Compare June 8, 2020 08:19

ashtuchkin force-pushed the master branch 4 times, most recently from 84ee650 to 9aa082f Compare July 16, 2020 08:07

ashtuchkin force-pushed the master branch from 5d99a92 to ed88711 Compare May 23, 2021 22:34
