I'm seeing title extraction errors (an `IndexError` raised in `clean_title`) on certain websites, even though those sites have titles.
  File "/usr/local/lib/python2.7/site-packages/ContentAnalysis-0.1.1-py2.7.egg/ContentAnalysis/document.py", line 53, in parse
    ginfo = g.extract(url=self.link)
  File "/usr/local/lib/python2.7/site-packages/goose/__init__.py", line 56, in extract
    return self.crawl(cc)
  File "/usr/local/lib/python2.7/site-packages/goose/__init__.py", line 66, in crawl
    article = crawler.crawl(crawl_candiate)
  File "/usr/local/lib/python2.7/site-packages/goose/crawler.py", line 154, in crawl
    self.article.title = self.title_extractor.extract()
  File "/usr/local/lib/python2.7/site-packages/goose/extractors/title.py", line 99, in extract
    return self.get_title()
  File "/usr/local/lib/python2.7/site-packages/goose/extractors/title.py", line 78, in get_title
    return self.clean_title(title)
  File "/usr/local/lib/python2.7/site-packages/goose/extractors/title.py", line 56, in clean_title
    if title_words[0] in TITLE_SPLITTERS:
IndexError: list index out of range
You can replicate by running goose extract on a site like http://daydreamingfoodie.com/
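The traceback shows `title_words[0]` being read from an empty list. One plausible cause, which the traceback alone does not confirm, is that the title is empty or whitespace-only by the time it is split into words. Below is a minimal sketch of a guard for that case. It is not the library's code: `clean_title_safe`, its `site_name` parameter, and the contents of `TITLE_SPLITTERS` here are assumptions made for illustration.

```python
# Assumed separator set for illustration; the real TITLE_SPLITTERS
# in goose may contain different values.
TITLE_SPLITTERS = ["|", "-", ":"]


def clean_title_safe(title, site_name=""):
    """Hypothetical stand-in for the failing step in clean_title."""
    # Assumption: the site name is stripped from the title first,
    # which can leave nothing behind.
    if site_name:
        title = title.replace(site_name, "").strip()

    title_words = title.split()

    # Guard: with no words left, title_words[0] would raise IndexError.
    if not title_words:
        return ""

    # Drop a leading separator, e.g. "| My article".
    if title_words[0] in TITLE_SPLITTERS:
        title_words.pop(0)

    # Drop a trailing separator, re-checking for emptiness after the pop.
    if title_words and title_words[-1] in TITLE_SPLITTERS:
        title_words.pop()

    return " ".join(title_words)
```

With this guard, a title that reduces to nothing comes back as an empty string instead of raising, so the rest of the extraction can continue.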