Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrub only after converting strings to UTF-8 #4418

Merged
merged 1 commit into from
Aug 25, 2016

Conversation

mpalmer
Copy link
Contributor

@mpalmer mpalmer commented Aug 25, 2016

Scrubbing an ASCII-8BIT string isn't ever going to remove anything, because there's no code point that isn't valid 8-bit ASCII. Since we'd really prefer it if everything were UTF-8 anyway, we'll just assume, for now, that whatever comes out of SimpleRSS is probably UTF-8, and just nuke anything that isn't a valid UTF-8 codepoint.

Of course, the real bug here is that SimpleRSS unilaterally converts everything to ASCII-8BIT. It's presumably far too much to ask that it detects the encoding of the source RSS feed and marks the parsed strings with the correct encoding...

Scrubbing an ASCII-8BIT string isn't ever going to remove anything, because
there's no code point that isn't valid 8-bit ASCII.  Since we'd really
prefer it if everything were UTF-8 anyway, we'll just assume, for now, that
whatever comes out of SimpleRSS is probably UTF-8, and just nuke anything
that isn't a valid UTF-8 codepoint.

Of course, the *real* bug here is that SimpleRSS [unilaterally converts
everything to
ASCII-8BIT](cardmagic/simple-rss#15).  It's
presumably *far* too much to ask that it detects the encoding of the source
RSS feed and marks the parsed strings with the correct encoding...
@discoursebot
Copy link

Thanks for contributing this pull request! Could you please sign our CLA so we can review it? http://www.discourse.org/cla

@mpalmer mpalmer merged commit d72730f into discourse:master Aug 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants