
This repository contains a Python web scraper for extracting book data from the Books to Scrape website. The scraper gathers information such as titles, prices, availability, ratings, and thumbnail images, saves the data in a CSV file, and downloads the thumbnails locally. It is a good project for practicing web scraping with BeautifulSoup and pandas.


kawsarlog/books-to-scrape-web-scraper


Books to Scrape Web Scraper

Built with: Python, BeautifulSoup, pandas, requests

This project contains a web scraper that extracts data from the website Books to Scrape. The scraper gathers information about books, including titles, prices, availability, ratings, and thumbnails, and saves the data in a CSV file. Thumbnails are also downloaded and saved locally.

Features

  • Scrapes book details including title, price, availability, rating, and thumbnail URL.
  • Downloads and saves thumbnail images locally.
  • Saves extracted data to a CSV file in a structured format.
  • Processes the first 10 pages of the website.
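The extraction step for one listing page can be sketched as follows. This is a minimal example, not the repository's scrape_books.py. It assumes the markup Books to Scrape currently serves (each book inside `<article class="product_pod">`, the rating encoded as a CSS class such as `star-rating Three`, thumbnail `src` values relative to the page URL); the function `parse_books` and the `RATINGS` mapping are hypothetical names.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical mapping: the site encodes star ratings as CSS class words.
RATINGS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_books(html, page_url):
    """Return one dict per book found on a listing page (assumed markup)."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for pod in soup.select("article.product_pod"):
        link = pod.select_one("h3 a")
        # The visible link text is truncated; the full title sits in the title attribute.
        title = link.get("title", link.get_text(strip=True))
        price = pod.select_one("p.price_color").get_text(strip=True)
        availability = pod.select_one("p.availability").get_text(strip=True)
        # bs4 returns class as a list, e.g. ["star-rating", "Three"].
        rating_classes = pod.select_one("p.star-rating")["class"]
        rating = next((RATINGS[c] for c in rating_classes if c in RATINGS), None)
        # Resolve the relative thumbnail path against the page it came from.
        thumbnail_url = urljoin(page_url, pod.select_one("img")["src"])
        books.append({
            "title": title,
            "price": price,
            "availability": availability,
            "rating": rating,
            "thumbnail_url": thumbnail_url,
        })
    return books
```

Passing it the HTML of a listing page returns a list of dicts whose keys map directly onto CSV columns.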

Requirements

  • Python 3.8+
  • BeautifulSoup 4.9.3+
  • pandas 1.2.0+
  • requests 2.25.1+

Installation

  1. Clone the repository:

    git clone https://github.com/kawsarlog/books-to-scrape-web-scraper.git
    cd books-to-scrape-web-scraper
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt
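After installing, you can check that the environment meets the minimum versions listed under Requirements. This is a minimal standard-library sketch, not part of the repository: `numeric_parts`, `meets_minimum`, and `check_requirements` are hypothetical names, and the comparison only looks at leading numeric components, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimums from the Requirements section, keyed by PyPI distribution name.
MINIMUMS = {"beautifulsoup4": "4.9.3", "pandas": "1.2.0", "requests": "2.25.1"}


def numeric_parts(ver):
    """'2.25.1' -> (2, 25, 1); '2.2.0rc1' -> (2, 2, 0). Stops at a piece with no leading digits."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = numeric_parts(installed), numeric_parts(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements():
    """Print one status line per package and return True if all are satisfied."""
    all_ok = True
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"{name}: missing (need >= {minimum})")
            all_ok = False
            continue
        ok = meets_minimum(installed, minimum)
        all_ok = all_ok and ok
        print(f"{name} {installed}: {'ok' if ok else 'too old, need >= ' + minimum}")
    return all_ok
```

Calling `check_requirements()` inside the activated virtual environment prints a status line per package; `importlib.metadata` is available from Python 3.8, matching the stated minimum.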

Usage

  1. Run the scraper script:

    python scrape_books.py
  2. The script will extract data from the first 10 pages of the website, save the data to a CSV file located in the data_sheet directory, and download thumbnails to the images directory.
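The page loop can be sketched like this. It assumes the catalogue pagination used by Books to Scrape (`https://books.toscrape.com/catalogue/page-N.html`); `page_urls` and `fetch_pages` are hypothetical names, and the one-second delay is an arbitrary politeness choice rather than something the repository specifies.

```python
import time

import requests

# Assumed URL pattern for the paginated catalogue.
PAGE_URL = "https://books.toscrape.com/catalogue/page-{}.html"


def page_urls(num_pages=10):
    """Build listing-page URLs for pages 1..num_pages."""
    return [PAGE_URL.format(n) for n in range(1, num_pages + 1)]


def fetch_pages(urls, delay=1.0, timeout=10):
    """Yield (url, html) for each page, pausing between requests."""
    with requests.Session() as session:
        for url in urls:
            response = session.get(url, timeout=timeout)
            # Fail loudly on 404/500 instead of parsing an error page.
            response.raise_for_status()
            # Force UTF-8: without a charset header, requests may fall back to
            # ISO-8859-1 and turn '£' into 'Â£'.
            response.encoding = "utf-8"
            yield url, response.text
            time.sleep(delay)
```

A run would iterate `for url, html in fetch_pages(page_urls(10)):` and hand each page's HTML to the parsing step. Using a `Session` reuses the underlying connection across the ten requests.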

Output

  • data_sheet/books_data.csv: Contains the scraped book details.
  • images/: Contains the downloaded thumbnail images.
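Writing these outputs can be sketched as below. The repository lists pandas for the CSV step; this sketch uses the standard `csv` module instead to keep it dependency-light. The helper names (`ensure_dirs`, `thumbnail_path`, `download_thumbnail`, `save_csv`) and the column list are hypothetical, and naming each image after the last segment of its URL is an assumption that could collide if two thumbnails share a file name.

```python
import csv
import os
from urllib.parse import urlparse

import requests

DATA_DIR = "data_sheet"
IMAGE_DIR = "images"
# Hypothetical column order for books_data.csv.
FIELDS = ["title", "price", "availability", "rating", "thumbnail_url"]


def ensure_dirs(root="."):
    """Create data_sheet/ and images/ under root if they do not exist yet."""
    for name in (DATA_DIR, IMAGE_DIR):
        os.makedirs(os.path.join(root, name), exist_ok=True)


def thumbnail_path(url, root="."):
    """Map a thumbnail URL to a local path, e.g. .../2c/da/abc.jpg -> images/abc.jpg."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(root, IMAGE_DIR, filename)


def download_thumbnail(url, root=".", timeout=10):
    """Fetch one thumbnail and write its bytes into the images directory."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    path = thumbnail_path(url, root)
    with open(path, "wb") as f:
        f.write(response.content)
    return path


def save_csv(rows, root="."):
    """Write the scraped rows to data_sheet/books_data.csv with a header line."""
    path = os.path.join(root, DATA_DIR, "books_data.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

A run would call `ensure_dirs()` once, `download_thumbnail(book["thumbnail_url"])` for each book, and `save_csv(rows)` after the last page.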

Video

For a detailed tutorial on how to use this script, watch the video "Books to Scrape 📚".

Directory Structure

To help organize your project, here's a suggested directory structure:

books-to-scrape-web-scraper/
├── data_sheet/
│   └── books_data.csv
├── images/
│   └── (thumbnails)
├── scrape_books.py
├── requirements.txt
└── README.md

Workflow

flowchart TD
    A([Start]) --> B[Initialize base URLs and create directories]
    B --> C{Loop through pages 1 to 10}
    C -->|next page| D[Request page content]
    D --> E[Parse HTML content]
    E --> F[Extract book details]
    F --> G[Save book thumbnail]
    G --> H[Append details to the list]
    H --> C
    C -->|all pages done| I[Save data to CSV file]
    I --> J([End])
    style A fill:#f96,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#ff9,stroke:#333,stroke-width:2px
    style D fill:#bbf,stroke:#333,stroke-width:2px
    style E fill:#ff9,stroke:#333,stroke-width:2px
    style F fill:#bbf,stroke:#333,stroke-width:2px
    style G fill:#ff9,stroke:#333,stroke-width:2px
    style H fill:#bbf,stroke:#333,stroke-width:2px
    style I fill:#f96,stroke:#333,stroke-width:2px
    style J fill:#f96,stroke:#333,stroke-width:2px
