Ran into an issue where Pismo's default reader returns the wrong section of an HTML document for the `body` and `html_body` fields. The cluster reader, however, returns the correct content. This page might be a good addition to the test corpus for the default reader.
The default reader seems to pull content from `<div id="navbar">` rather than `<div id="content">`.
```
>> doc = Pismo::Document.new("http://www.universalhub.com/2013/touchy-tabloid-tries-wreck-globe-story")
>> doc.body
=> "* The T\n* Casinos\n* News by neighborhood\n* Crime\n* Fires\n* Boston Store\n* Photos\n* Boston English\n* Restrooms\n* Blogs"
>> doc = Pismo::Document.new("http://www.universalhub.com/2013/touchy-tabloid-tries-wreck-globe-story", :reader => :cluster)
>> doc.body
=> "Sour grapes at the Herald? With bonus gratuitous quote from some lawyer making accusations with no apparent facts behind them:\nIf he was a reporter on deadline and he's distracted and making phone calls and texting, then that's something that adds to his fault. You're not supposed to be distracted in a cab, you're supposed to focus fully on your job,\" said Douglas Sheff, a Boston personal injury lawyer and president-elect of the Massachusetts Bar Association.\nDoes the esquire have any proof the reporter was on deadline and making phone calls and texting right before the crash? If so, he and the Herald failed to produce it."
```
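If this page goes into the test corpus, the default reader's bad output above has an easy-to-detect shape: every line is a short `* ` bullet item, like a navigation menu. The snippet below is a minimal sketch of such a check. `looks_like_navigation?` and its `max_words_per_line` threshold are hypothetical helpers for illustration, not part of Pismo, and the heuristic is only a rough signal, not a real content detector.

```ruby
# Hypothetical helper (not part of Pismo): flags extracted text that looks like
# a navigation menu (a list of short "* " bullet items) rather than article prose.
def looks_like_navigation?(text, max_words_per_line: 4)
  lines = text.to_s.split("\n").map(&:strip).reject(&:empty?)
  return false if lines.empty?

  # Every line must be a bullet item with only a few words after the "*".
  lines.all? do |line|
    line.start_with?("* ") && line.split.size - 1 <= max_words_per_line
  end
end

# Outputs copied from the irb session above (cluster output truncated to its first sentence).
default_body = "* The T\n* Casinos\n* News by neighborhood\n* Crime\n* Fires\n" \
               "* Boston Store\n* Photos\n* Boston English\n* Restrooms\n* Blogs"
cluster_body = "Sour grapes at the Herald? With bonus gratuitous quote from some lawyer " \
               "making accusations with no apparent facts behind them:"

puts looks_like_navigation?(default_body)  # => true
puts looks_like_navigation?(cluster_body)  # => false
```

In a Minitest-based corpus test, something like `refute looks_like_navigation?(doc.body)` could then guard against the default reader regressing to the navbar on this page.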
(Originally reported in feedbin/support#35)