Books to Scrape Web Scraper

This project contains a web scraper that extracts data from the website Books to Scrape. The scraper gathers information about books, including titles, prices, availability, ratings, and thumbnails, and saves the data in a CSV file. Thumbnails are also downloaded and saved locally.

Features

Scrapes book details including title, price, availability, rating, and thumbnail URL.
Downloads and saves thumbnail images locally.
Saves extracted data to a CSV file in a structured format.
Processes the first 10 pages of the website.
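
As a rough illustration of the 10-page limit, the sketch below builds the list of catalogue URLs to visit. The `BASE_URL` pattern and the `page_urls` helper are assumptions made for this example, based on the site's public pagination scheme; `scrape_books.py` may build its URLs differently.

```python
# Assumption: catalogue pages follow the site's public pagination pattern.
BASE_URL = "https://books.toscrape.com/catalogue/page-{}.html"


def page_urls(first=1, last=10):
    """Return the catalogue URLs for pages first..last (inclusive)."""
    return [BASE_URL.format(n) for n in range(first, last + 1)]


urls = page_urls()
print(len(urls))   # number of pages that would be requested
print(urls[0])     # first catalogue page
```

Keeping the page range as parameters makes it easy to scrape fewer pages while testing.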

Requirements

Python 3.8+
BeautifulSoup 4.9.3+
pandas 1.2.0+
requests 2.25.1+

Installation

Clone the repository:

```
git clone https://github.com/your-username/books-to-scrape-web-scraper.git
cd books-to-scrape-web-scraper
```

Create a virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows, use venv\Scripts\activate
```

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Run the scraper script:
```
python scrape_books.py
```
The script will extract data from the first 10 pages of the website, save the data to a CSV file located in the data_sheet directory, and download thumbnails to the images directory.
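
The extraction step can be sketched roughly as follows. This is a minimal example, not the code in `scrape_books.py`: the `parse_book` helper is hypothetical, and `SAMPLE_HTML` is a hand-written fragment that imitates the site's `product_pod` markup (with a made-up thumbnail path) so the example runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hand-written sample that imitates one book entry on a catalogue page.
SAMPLE_HTML = """
<article class="product_pod">
  <div class="image_container">
    <img src="../media/cache/example.jpg" alt="A Light in the Attic" class="thumbnail">
  </div>
  <p class="star-rating Three"></p>
  <h3><a href="a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
  <div class="product_price">
    <p class="price_color">£51.77</p>
    <p class="instock availability">In stock</p>
  </div>
</article>
"""


def parse_book(article):
    """Extract the fields listed in this README from one product_pod element."""
    # The rating is encoded as a second CSS class, e.g. "star-rating Three".
    rating_classes = article.find("p", class_="star-rating")["class"]
    rating = next(c for c in rating_classes if c != "star-rating")
    return {
        "title": article.h3.a["title"],  # full title lives in the title attribute
        "price": article.find("p", class_="price_color").get_text(strip=True),
        "availability": article.find("p", class_="instock").get_text(strip=True),
        "rating": rating,
        "thumbnail": article.find("img")["src"],
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
books = [parse_book(a) for a in soup.find_all("article", class_="product_pod")]
print(books[0])
```

A list of dictionaries like `books` can be passed directly to `pandas.DataFrame` before writing the CSV file.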

Output

data_sheet/books_data.csv: Contains the scraped book details.
images/: Contains the downloaded thumbnail images.
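
Thumbnail `src` values on the site are relative, so they need to be resolved against the page URL before downloading. The sketch below shows one possible way to derive the absolute URL and a local path under `images/`; the `thumbnail_target` helper and the example image path are assumptions for illustration, and the actual script may name files differently.

```python
from pathlib import Path
from urllib.parse import urljoin, urlparse


def thumbnail_target(page_url, img_src, images_dir="images"):
    """Return (absolute image URL, local path) for a thumbnail found on page_url."""
    absolute_url = urljoin(page_url, img_src)              # resolve "../" segments
    filename = Path(urlparse(absolute_url).path).name      # keep the original file name
    return absolute_url, Path(images_dir) / filename


url, path = thumbnail_target(
    "https://books.toscrape.com/catalogue/page-1.html",
    "../media/cache/example.jpg",  # hypothetical relative src
)
print(url)
print(path)
```

The returned URL could then be fetched with `requests.get` and the response content written to the returned path.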

Video

For a detailed tutorial on how to use this script, please refer to the Books to Scrape 📚 video.

Directory Structure

To help organize your project, here's a suggested directory structure:

```
books-to-scrape-web-scraper/
├── data_sheet/
│   └── books_data.csv
├── images/
│   └── (thumbnails)
├── scrape_books.py
├── requirements.txt
└── README.md
```

```mermaid
flowchart TD
    A([Start]) --> B[Initialize base URLs and create directories]
    B --> C{Loop through pages 1 to 10}
    C --> D[Request page content]
    D --> E[Parse HTML content]
    E --> F[Extract book details]
    F --> G[Save book thumbnail]
    G --> H[Append details to the list]
    H --> I[Save data to CSV file]
    I --> J([End])
    style A fill:#f96,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#ff9,stroke:#333,stroke-width:2px
    style D fill:#bbf,stroke:#333,stroke-width:2px
    style E fill:#ff9,stroke:#333,stroke-width:2px
    style F fill:#bbf,stroke:#333,stroke-width:2px
    style G fill:#ff9,stroke:#333,stroke-width:2px
    style H fill:#bbf,stroke:#333,stroke-width:2px
    style I fill:#f96,stroke:#333,stroke-width:2px
    style J fill:#f96,stroke:#333,stroke-width:2px
```
