Scrub only after converting strings to UTF-8
Scrubbing an ASCII-8BIT string is never going to remove anything, because
ASCII-8BIT treats every possible byte as valid. Since we'd really prefer
everything to be UTF-8 anyway, we'll assume, for now, that whatever comes out
of SimpleRSS is probably UTF-8, and nuke anything that isn't a valid UTF-8
byte sequence.
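
To see the difference this relies on, here's a minimal Ruby sketch. The sample bytes are made up for illustration: the same bytes are scrubbed once as ASCII-8BIT and once after being relabelled as UTF-8.

```ruby
# Illustrative input: "caf" followed by a lone 0xE9 byte (Latin-1 "é"), which
# is not valid UTF-8. `.b` tags it ASCII-8BIT, like SimpleRSS's output.
raw = "caf\xE9".b

# ASCII-8BIT accepts every byte, so there is nothing for scrub to replace.
binary_scrubbed = raw.scrub
p raw.valid_encoding?        # ASCII-8BIT strings are always valid
p binary_scrubbed.bytes      # the 0xE9 byte is still there

# Relabel a copy as UTF-8 (force_encoding mutates its receiver), then scrub:
# the invalid byte is replaced with U+FFFD.
utf8_scrubbed = raw.dup.force_encoding("UTF-8").scrub
p utf8_scrubbed.bytes
```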

Of course, the *real* bug here is that SimpleRSS [unilaterally converts
everything to ASCII-8BIT](cardmagic/simple-rss#15). It's presumably *far* too
much to ask that it detect the encoding of the source RSS feed and mark the
parsed strings with the correct encoding...
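
For what it's worth, the behaviour being wished for here could look something like the following minimal Ruby sketch. It assumes the raw feed body is available as a byte string and that the feed declares its encoding in the XML prolog. `declared_encoding` and `to_utf8` are hypothetical helpers, not part of SimpleRSS.

```ruby
# Hypothetical helper: find the encoding named in the XML declaration, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>. XML defaults to UTF-8 when
# no declaration is present, so that is also the fallback here.
def declared_encoding(raw_feed)
  head = raw_feed.byteslice(0, 200).to_s.b
  name = head[/\A\s*<\?xml[^>]*encoding\s*=\s*["']([A-Za-z0-9._-]+)["']/, 1]
  return Encoding::UTF_8 if name.nil?

  enc = Encoding.find(name)
  # Only accept encodings whose bytes can be relabelled and scrubbed directly.
  (enc.ascii_compatible? && !enc.dummy?) ? enc : Encoding::UTF_8
rescue ArgumentError # Encoding.find raises this for unknown names
  Encoding::UTF_8
end

# Hypothetical helper: label a parsed field with the feed's encoding, drop
# invalid bytes, then transcode to UTF-8.
def to_utf8(field, encoding)
  return nil if field.nil?

  field.dup.force_encoding(encoding).scrub.encode(Encoding::UTF_8, undef: :replace)
end
```

Transcoding, rather than just relabelling, is what lets the "é" in an ISO-8859-1 feed survive instead of being scrubbed into U+FFFD.
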
mpalmer committed Aug 25, 2016
1 parent 846a08d commit 394cd43
Showing 1 changed file with 2 additions and 2 deletions.

app/jobs/scheduled/poll_feed.rb

```diff
@@ -86,11 +86,11 @@ def url
   end
 
   def content
-    @article_rss_item.content.try(:scrub) || @article_rss_item.description.try(:scrub)
+    @article_rss_item.content.try(:force_encoding, "UTF-8").try(:scrub) || @article_rss_item.description.try(:force_encoding, "UTF-8").try(:scrub)
   end
 
   def title
-    @article_rss_item.title.scrub
+    @article_rss_item.title.force_encoding("UTF-8").scrub
   end
 
   def user
```
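
One side effect worth knowing about: `force_encoding` relabels its receiver in place, so the chain above also changes the encoding of the string held by the SimpleRSS item, and it would raise on a frozen string. Below is a minimal sketch of a copy-based variant in plain Ruby, without ActiveSupport's `try`. The names `utf8_scrub`, `item_content`, and `RssItem` are hypothetical and used only for illustration.

```ruby
# Hypothetical helper: the diff's force_encoding + scrub chain, written without
# ActiveSupport's `try`. Working on a copy means the caller's string keeps its
# original encoding, and frozen strings don't raise.
def utf8_scrub(str)
  return nil if str.nil?

  str.dup.force_encoding(Encoding::UTF_8).scrub
end

# Hypothetical stand-in for a SimpleRSS item, for illustration only.
RssItem = Struct.new(:content, :description, :title)

# Same fallback as the diff's `content` method: use content, else description.
def item_content(item)
  utf8_scrub(item.content) || utf8_scrub(item.description)
end
```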
