This web scraping project is a tool to find the most popular/important stories (those with 100+ points) on the first two pages of the hacker news site.

kdhenderson/hackernews_scraper

hackernews_scraper

This web scraping project finds the most popular stories (those with 100 or more points) on the first two pages of Hacker News.

To run the code in your browser, open the project on Binder, then:

  • Click on the 'New' dropdown button and select 'Terminal'
  • Run the program from the command line like this: $ python3 scrape.py

This is what the program does step-by-step:

  • Retrieves the first two pages of Hacker News using the requests library's get method.
  • Parses the HTML with Beautiful Soup.
  • Uses the select method with the appropriate CSS selectors to build two lists: the article links and the subtext elements (which contain the vote counts).
  • Enumerates the links (to generate an index) and loops through them, reading each article's title with getText and its URL with get.
  • Uses the index to find the subtext that matches each link, keeps only stories that have a vote count, and converts that count to an integer. Articles with 100 or more votes are added to a new list of stories to read, each as a dictionary holding the title, link and votes.
  • Sorts the stories in descending order by the votes dictionary key using a lambda function.
  • Uses the pretty print module to print the article list in a more readable format.
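
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of scrape.py: the page URLs, the CSS selectors (.titleline > a, .subtext, .score) and the function names popular_stories, sort_by_votes and main are all assumptions made for the example, and the selectors may need adjusting if the Hacker News markup differs.

```python
import pprint

import requests
from bs4 import BeautifulSoup

# Assumed values, not taken from scrape.py: the URLs and CSS selectors
# are a guess at the Hacker News markup and may need adjusting.
PAGE_URLS = [
    "https://news.ycombinator.com/news",
    "https://news.ycombinator.com/news?p=2",
]
LINK_SELECTOR = ".titleline > a"
SUBTEXT_SELECTOR = ".subtext"
SCORE_SELECTOR = ".score"


def popular_stories(html, min_votes=100):
    """Return the stories in one page of HTML with at least min_votes points."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select(LINK_SELECTOR)
    subtext = soup.select(SUBTEXT_SELECTOR)
    stories = []
    for idx, item in enumerate(links):
        if idx >= len(subtext):
            break  # no matching subtext row for this link
        title = item.getText()
        href = item.get("href", None)
        # Some rows (for example job postings) have no score element.
        score = subtext[idx].select(SCORE_SELECTOR)
        if score:
            # Score text looks like "123 points"; keep the leading number.
            votes = int(score[0].getText().split()[0])
            if votes >= min_votes:
                stories.append({"title": title, "link": href, "votes": votes})
    return stories


def sort_by_votes(stories):
    """Sort story dictionaries by vote count, highest first."""
    return sorted(stories, key=lambda story: story["votes"], reverse=True)


def main():
    """Fetch both pages, collect the popular stories and pretty-print them."""
    stories = []
    for url in PAGE_URLS:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        stories.extend(popular_stories(response.text))
    pprint.pprint(sort_by_votes(stories))
```

Calling main() performs the two network requests and prints the result. Keeping the parsing in popular_stories, separate from the fetching, means it can be tried on a saved HTML file without touching the network.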

Dependencies:

  • Install the requirements using pip install -r requirements.txt.
  • Make sure you use Python 3.
  • You may want to use a virtual environment for this.

Usage:

  • Run the program from the command line: python3 scrape.py
