Goodreads.com hosts a comprehensive list of top-rated books, as voted on by the general Goodreads community.
We will use Python, BeautifulSoup, and Requests to scrape the first 5 pages and build a list of the top 500 books, along with some interesting information about each.
Outline of the Project:
- Explore and scrape information from 1 page
- Download a single page from goodreads.com and store it
- Parse the stored page with BeautifulSoup and extract the required data
- Create a dictionary to store the book information
- Write separate functions that each scrape one piece of information from the BeautifulSoup document and add it to the dictionary
- Repeat this for any number of pages by appending new items to the dictionary
- Store this in a pandas DataFrame
- Save the DataFrame to a CSV file
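
The outline above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the final implementation: the CSS selectors (`a.bookTitle`, `a.authorName`, `span.minirating`), the one-book-per-table-row layout, the `page` query parameter, and all function names are hypothetical and should be checked against the actual page markup during the exploration step.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def download_page(list_url, page_number):
    """Download one page of the list and return its HTML as text.

    Assumption: the list is paginated with a `page` query parameter.
    """
    response = requests.get(list_url, params={"page": page_number}, timeout=30)
    response.raise_for_status()
    return response.text


def _text_or_none(row, selector):
    """Return the stripped text of the first match inside `row`, or None."""
    tag = row.select_one(selector)
    return tag.get_text(strip=True) if tag else None


# One small function per piece of information.
# The selectors are assumptions about the page markup.
def get_title(row):
    return _text_or_none(row, "a.bookTitle")


def get_author(row):
    return _text_or_none(row, "a.authorName")


def get_rating(row):
    return _text_or_none(row, "span.minirating")


def new_book_dict():
    """Create the empty dictionary of lists that collects book information."""
    return {"title": [], "author": [], "rating": []}


def scrape_page(html, books):
    """Parse one page of HTML and append every book found to `books`."""
    doc = BeautifulSoup(html, "html.parser")
    # Assumption: each book sits in its own table row.
    for row in doc.select("tr"):
        title = get_title(row)
        if title is None:
            continue  # not a book row
        books["title"].append(title)
        books["author"].append(get_author(row))
        books["rating"].append(get_rating(row))
    return books


def scrape_list(list_url, n_pages=5):
    """Scrape `n_pages` pages and return the results as a pandas DataFrame."""
    books = new_book_dict()
    for page_number in range(1, n_pages + 1):
        scrape_page(download_page(list_url, page_number), books)
    return pd.DataFrame(books)
```

With a real list URL in hand, `scrape_list(list_url).to_csv("books.csv", index=False)` would cover the last two steps of the outline; the file name `books.csv` is only an example. Parsing row by row keeps the columns aligned when a book is missing a field, since the missing value is stored as `None` rather than shifting the list.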