Parser should ignore anchor tags without an href attribute.
youngbrioche committed Feb 9, 2011
1 parent 30b2c95, commit a0a6de1
Showing 2 changed files with 2 additions and 2 deletions.
lib/rawler/crawler.rb: 2 changes (1 addition, 1 deletion)
@@ -16,7 +16,7 @@ def links
response = Rawler::Request.get(url)

doc = Nokogiri::HTML(response.body)
-doc.css('a').map { |a| a['href'] }.map { |url| absolute_url(url) }.select { |url| valid_url?(url) }
+doc.css('a').map { |a| a['href'] }.select { |url| !url.nil? }.map { |url| absolute_url(url) }.select { |url| valid_url?(url) }
rescue Errno::ECONNREFUSED
write("Couldn't connect to #{url}")
[]
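The added `.select { |url| !url.nil? }` step matters because Nokogiri's `Node#[]` returns `nil` for an attribute the element does not have, so an anchor such as `<a name="foo">` contributes `nil` to the list of hrefs, which would otherwise be handed straight to `absolute_url`. The following is a minimal, stdlib-only sketch of the filtered pipeline. It assumes that the hard-coded `hrefs` array stands in for what `doc.css('a').map { |a| a['href'] }` would return, and `absolute_url` and `valid_url?` are hypothetical stand-ins built on Ruby's `URI`, not Rawler's actual helpers. It uses `Array#compact`, which drops `nil` elements exactly like the commit's `select { |url| !url.nil? }`.

```ruby
require 'uri'

# Hypothetical base URL and helpers; Rawler's real absolute_url / valid_url?
# in lib/rawler/crawler.rb may behave differently.
BASE = 'http://example.com/path'

# Resolve a (possibly relative) href against the page URL.
def absolute_url(path)
  URI.join(BASE, path).to_s
end

# Accept only parseable http(s) URLs.
def valid_url?(url)
  %w[http https].include?(URI.parse(url).scheme)
rescue URI::InvalidURIError
  false
end

# Stand-in for the hrefs of <a href="/about">about</a><a name="foo">:
# the name-only anchor has no href, so its entry is nil.
hrefs = ['/about', nil]

# Drop nil hrefs first, then resolve and validate the rest.
links = hrefs.compact
             .map { |url| absolute_url(url) }
             .select { |url| valid_url?(url) }

puts links.inspect
```

Filtering before `map` keeps the helpers free of `nil` checks; the anchor without an `href` simply never reaches them.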
spec/lib/rawler/crawler_spec.rb: 2 changes (1 addition, 1 deletion)
@@ -85,7 +85,7 @@
let(:url) { 'http://example.com/path' }
let(:crawler) { Rawler::Crawler.new(url) }
let(:js_url) { "javascript:fn('nbjmup;jhfs.esf{fio/dpn');" }
-let(:content) { "<a href=\"#{js_url}\">foo</a>" }
+let(:content) { "<a href=\"#{js_url}\">foo</a><a name=\"foo\">" }

before(:each) do
register(url, content)
