Default reader gets wrong content #25

Open
joeshaw opened this issue Apr 3, 2013 · 0 comments

joeshaw commented Apr 3, 2013

Ran into an issue with Pismo's default reader returning the wrong section of an HTML document for its body and html_body fields. The same page extracts correctly with the cluster reader. This page might be a good addition to the default reader's test corpus.

http://www.universalhub.com/2013/touchy-tabloid-tries-wreck-globe-story

The default reader seems to pull content from <div id="navbar"> rather than <div id="content">.
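One possible (unconfirmed) explanation is that the reader scores candidate blocks by amount of text without penalizing links, so a long navigation menu can win. The Ruby sketch below shows a generic link-density heuristic that would reject a navbar-style block. It is an assumption for illustration, not Pismo's actual default-reader algorithm; the `Candidate` and `pick_content` names are hypothetical, and it assumes each block's link text has already been extracted from the HTML.

```ruby
# Hypothetical sketch, NOT Pismo's algorithm: each candidate block is modeled
# with its id, its full text, and the portion of that text inside <a> links
# (assumed to be pre-extracted from the HTML).
Candidate = Struct.new(:id, :text, :link_text) do
  # Fraction of the block's characters that belong to links.
  def link_density
    return 1.0 if text.empty?
    link_text.length.to_f / text.length
  end

  # Reward long text, but scale it down by how link-heavy the block is.
  # A pure navigation menu (all links) scores 0.
  def score
    text.length * (1.0 - link_density)
  end
end

# Pick the highest-scoring block as the article body.
def pick_content(candidates)
  candidates.max_by(&:score)
end

navbar  = Candidate.new("navbar",
                        "The T Casinos News by neighborhood Crime Fires",
                        "The T Casinos News by neighborhood Crime Fires")
content = Candidate.new("content",
                        "Sour grapes at the Herald? With bonus gratuitous quote from some lawyer",
                        "")
puts pick_content([navbar, content]).id  # prints "content"
```

The design choice here is to discount rather than discard link-heavy blocks, so an article body that contains a few inline links still scores well.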

>> doc = Pismo::Document.new("http://www.universalhub.com/2013/touchy-tabloid-tries-wreck-globe-story")
>> doc.body
=> "* The T\n* Casinos\n* News by neighborhood\n* Crime\n* Fires\n* Boston Store\n* Photos\n* Boston English\n* Restrooms\n* Blogs"

>> doc = Pismo::Document.new("http://www.universalhub.com/2013/touchy-tabloid-tries-wreck-globe-story", :reader => :cluster)
>> doc.body
=> "Sour grapes at the Herald? With bonus gratuitous quote from some lawyer making accusations with no apparent facts behind them:\nIf he was a reporter on deadline and he's distracted and making phone calls and texting, then that's something that adds to his fault. You're not supposed to be distracted in a cab, you're supposed to focus fully on your job,\" said Douglas Sheff, a Boston personal injury lawyer and president-elect of the Massachusetts Bar Association.\nDoes the esquire have any proof the reporter was on deadline and making phone calls and texting right before the crash? If so, he and the Herald failed to produce it."
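If this page is added to the test corpus, a check along these lines could catch the regression without pinning the exact extracted text. The `looks_like_navigation?` helper is hypothetical (not part of Pismo) and assumes navigation menus come through as short `* ` bullet lines, as in the default reader's output above; the 30-character and 80% thresholds are arbitrary choices.

```ruby
# Hypothetical regression helper (not part of Pismo): returns true when the
# extracted body is mostly short "* " bullet lines, i.e. a navigation menu.
def looks_like_navigation?(body, max_item_length: 30, threshold: 0.8)
  lines = body.lines.map(&:strip).reject(&:empty?)
  return false if lines.empty?

  # Count lines that look like menu items: a bullet followed by a short label.
  bullets = lines.count { |l| l.start_with?("* ") && l.length <= max_item_length }
  bullets.to_f / lines.length >= threshold
end

# Outputs from the session above (the cluster body is an excerpt).
default_body = "* The T\n* Casinos\n* News by neighborhood\n* Crime\n* Fires\n" \
               "* Boston Store\n* Photos\n* Boston English\n* Restrooms\n* Blogs"
cluster_body = "Sour grapes at the Herald? With bonus gratuitous quote from some " \
               "lawyer making accusations with no apparent facts behind them:"

p looks_like_navigation?(default_body)  # prints true
p looks_like_navigation?(cluster_body)  # prints false
```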

(Originally reported in feedbin/support#35)
