From 394cd43d773cc4b32d090a1b4ff36e529c272fc4 Mon Sep 17 00:00:00 2001
From: Matt Palmer
Date: Thu, 25 Aug 2016 16:00:20 +1000
Subject: [PATCH] Scrub only after converting strings to UTF-8

Scrubbing an ASCII-8BIT string isn't ever going to remove anything,
because there's no code point that isn't valid 8-bit ASCII. Since we'd
really prefer it if everything were UTF-8 anyway, we'll just assume, for
now, that whatever comes out of SimpleRSS is probably UTF-8, and just
nuke anything that isn't a valid UTF-8 codepoint.

Of course, the *real* bug here is that SimpleRSS [unilaterally converts
everything to ASCII-8BIT](https://github.com/cardmagic/simple-rss/issues/15).
It's presumably *far* too much to ask that it detects the encoding of the
source RSS feed and marks the parsed strings with the correct encoding...
---
 app/jobs/scheduled/poll_feed.rb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/app/jobs/scheduled/poll_feed.rb b/app/jobs/scheduled/poll_feed.rb
index d5ab5b6bc1011..8cfed44571412 100644
--- a/app/jobs/scheduled/poll_feed.rb
+++ b/app/jobs/scheduled/poll_feed.rb
@@ -86,11 +86,11 @@ def url
       end
 
       def content
-        @article_rss_item.content.try(:scrub) || @article_rss_item.description.try(:scrub)
+        @article_rss_item.content.try(:force_encoding, "UTF-8").try(:scrub) || @article_rss_item.description.try(:force_encoding, "UTF-8").try(:scrub)
       end
 
       def title
-        @article_rss_item.title.scrub
+        @article_rss_item.title.force_encoding("UTF-8").scrub
       end
 
       def user
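
The behaviour described in the commit message can be reproduced with plain Ruby strings. The following is a minimal sketch, not part of the patch: it does not use SimpleRSS, and the sample bytes in `raw` are made up for illustration. It assumes, as the commit message does, that the bytes are meant to be UTF-8. Note that `scrub` with no argument substitutes U+FFFD for each invalid sequence in a UTF-8 string, while `scrub("")` drops the invalid bytes entirely.

```ruby
# Hypothetical feed bytes: valid UTF-8 for "café", one stray 0xFF byte,
# all tagged as ASCII-8BIT (binary), as the commit message says SimpleRSS does.
raw = "caf\xC3\xA9 \xFF ok".b

# Every byte is valid in ASCII-8BIT, so scrub finds nothing to fix.
binary_scrubbed = raw.dup.scrub("")
puts raw.valid_encoding?        # => true
puts binary_scrubbed == raw     # => true

# Re-tag the same bytes as UTF-8; no bytes change, only the encoding label.
utf8 = raw.dup.force_encoding("UTF-8")
puts utf8.valid_encoding?       # => false, 0xFF can never appear in UTF-8

# Now scrub has something to act on.
replaced = utf8.scrub           # invalid byte becomes U+FFFD
removed  = utf8.scrub("")       # invalid byte is dropped

puts replaced.valid_encoding?   # => true
puts removed.valid_encoding?    # => true
```

Because `force_encoding` only relabels the string, a feed that is actually in another encoding (for example ISO-8859-1) would have its non-ASCII characters treated as invalid and replaced rather than converted.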