Add ASCII-compatible charset checking method. #87

Open: wants to merge 1 commit into base: master

Conversation

@mscdex (Contributor) commented Jan 11, 2015

It can be useful to check for ASCII-compatible charsets when decoding data that is known to contain only bytes within the ASCII range. Such a check avoids an unnecessary decoding pass.

@ashtuchkin (Owner)

Interesting. Although I'm sure it can be done without an additional generated file.
Could you describe in more detail what you mean by an "ASCII-compatible" charset? How would that save you decoding? In my view, utf8 is not ASCII-compatible at all.

@mscdex (Contributor, Author) commented Jan 12, 2015

ASCII-compatible means bytes 0x00-0x7F in an encoding are all ASCII characters/bytes and not some other characters. Many character sets are compatible in this way, but some are not.

Checking whether a destination encoding is ASCII-compatible is useful if you are already traversing binary data and you can check whether each byte is <= 0x7F. If there are no bytes above 0x7F and the encoding is ASCII-compatible, you don't have to run the entire set of data through a decoder which will end up giving you back the same string anyway.
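
The shortcut described above could be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `isAsciiOnly`, `decodeWithShortcut`, and the `fullDecode` callback are hypothetical names, and the caller is assumed to already know whether the target encoding is ASCII-compatible.

```javascript
'use strict';

// Hypothetical helper: true when every byte in the buffer is <= 0x7F.
function isAsciiOnly(buf) {
  for (let i = 0; i < buf.length; i++) {
    if (buf[i] > 0x7f) return false;
  }
  return true;
}

// Hypothetical wrapper: skip the full decoder when the data is pure ASCII
// and the target encoding is known to be ASCII-compatible.
// `fullDecode` stands in for whatever decoder the caller would normally use.
function decodeWithShortcut(buf, asciiCompatible, fullDecode) {
  if (asciiCompatible && isAsciiOnly(buf)) {
    // Each byte maps directly to the code point of the same value.
    return buf.toString('latin1');
  }
  return fullDecode(buf);
}
```

In the scenario described in the comment, the byte check would not be a separate loop; it would ride along with a pass over the data that is already happening, so the only extra cost is one comparison per byte.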

The reason I use a pre-generated file for this is that it's significantly faster to use an object (with fast properties) than to do a lookup on the fly.
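
A pre-generated lookup of the kind described here might look like the sketch below. The object name, the helper, and the handful of entries are illustrative assumptions; the file generated in this pull request is not shown in the thread, and a real table would cover every supported encoding.

```javascript
'use strict';

// Illustrative table only; the entries are examples, not the PR's generated data.
// A plain object literal is used so lookups are ordinary property reads.
const ASCII_COMPATIBLE = {
  utf8: true,
  iso88591: true,
  windows1252: true,
  utf16le: false,
};

// Hypothetical helper: normalize the name, then do a plain property lookup.
// The strict `=== true` check keeps inherited properties such as
// "constructor" from being reported as compatible.
function isAsciiCompatible(encoding) {
  const key = String(encoding).toLowerCase().replace(/[^0-9a-z]/g, '');
  return ASCII_COMPATIBLE[key] === true;
}
```

Unknown names fall through to `false` here, which errs on the side of running the full decoder.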

@ashtuchkin (Owner)

Got it. Makes sense.
I'm thinking of reusing the current caching architecture, but not making it part of the public API yet (that would require tests, docs, etc.). What do you think about iconv.getCodec(encodingName).asciiCompatible? It's cached, but the codec's data must be loaded once.
