EcomScraper

EcomScraper is a Python project for scraping product data from e-commerce platforms such as Shopify, Wix, WooCommerce, and others. The project is organized into modules for the individual platform scrapers, database management, utilities, and configuration.

Project Structure

EcomScraper/
├── custom_scrappers/
│   ├── bobbi_brown_scrapper.py
│   └── scraper.py
├── db/
│   ├── database.py
│   └── images/
├── shopify_scraper/
│   └── scraper.py
├── utils/
│   └── utils.py
├── wix_scrapper/
│   └── scraper.py
├── woocommerce_scrapper/
│   └── scraper.py
├── .env
├── .env.sample
├── .gitignore
├── config.py
├── database.sqlite3
├── main.py
└── requirements.txt

Folders and Files

1. custom_scrappers/

This folder contains custom scrapers for specific websites.

  • bobbi_brown_scrapper.py: A scraper tailored for the Bobbi Brown e-commerce site.
  • scraper.py: General scraper script for custom websites.

2. db/

Database management module for handling and storing scraped data.

  • database.py: Database connection and query functions.
  • images/: Directory to store downloaded product images.
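
A minimal sketch of what database.py could contain, assuming a plain sqlite3 setup with a single products table (the table name and columns here are illustrative, not taken from the source):

    import sqlite3

    from config import DB_PATH  # resolved from the SQLITE_DB_PATH environment variable


    def initialize_db():
        # Hypothetical schema; the real column set lives in db/database.py.
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute(
                """CREATE TABLE IF NOT EXISTS products (
                       id INTEGER PRIMARY KEY AUTOINCREMENT,
                       website TEXT,
                       title TEXT,
                       price TEXT,
                       image_path TEXT
                   )"""
            )


    def save_product(website, title, price, image_path=None):
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute(
                "INSERT INTO products (website, title, price, image_path) "
                "VALUES (?, ?, ?, ?)",
                (website, title, price, image_path),
            )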

3. shopify_scraper/

Contains scripts specific to scraping Shopify-based websites.

  • scraper.py: Script for scraping Shopify products, categories, etc.
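
Many Shopify storefronts expose a public, paginated JSON feed at /products.json, so a Shopify scraper can often be as simple as the hedged sketch below (the function name is illustrative; the actual scraper.py may differ):

    import requests


    def scrape_shopify(base_url):
        # Page through the public product feed until an empty page comes back.
        products, page = [], 1
        while True:
            resp = requests.get(
                f"{base_url.rstrip('/')}/products.json",
                params={"limit": 250, "page": page},
                timeout=30,
            )
            resp.raise_for_status()
            batch = resp.json().get("products", [])
            if not batch:
                break
            products.extend(batch)
            page += 1
        return products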

4. utils/

Helper functions used across the project.

  • utils.py: General utility functions to support scraping tasks.
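
The README does not spell out what the utilities are, but given the db/images/ directory, an image-download helper is a plausible example. A purely illustrative sketch:

    import os

    import requests


    def download_image(url, dest_dir="db/images"):
        # Save a product image locally; return its path, or None on failure.
        os.makedirs(dest_dir, exist_ok=True)
        filename = os.path.join(dest_dir, url.split("/")[-1].split("?")[0])
        try:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
        except requests.RequestException:
            return None
        with open(filename, "wb") as f:
            f.write(resp.content)
        return filename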

5. wix_scrapper/

Module designed for scraping data from Wix-based websites.

  • scraper.py: Wix scraper for extracting products and related data.
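
Wix stores do not share a uniform public product feed the way Shopify does, so the Wix scraper presumably parses rendered HTML. A generic, hedged sketch using requests and BeautifulSoup; the CSS selectors are placeholders, not real Wix markup:

    import requests
    from bs4 import BeautifulSoup


    def scrape_wix(url):
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        products = []
        # ".product-item", ".product-name", ".product-price" are placeholder
        # selectors; real Wix themes vary and need site-specific inspection.
        for item in soup.select(".product-item"):
            name = item.select_one(".product-name")
            price = item.select_one(".product-price")
            products.append({
                "title": name.get_text(strip=True) if name else None,
                "price": price.get_text(strip=True) if price else None,
            })
        return products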

6. woocommerce_scrapper/

Contains scripts to scrape data from WooCommerce-based websites.

  • scraper.py: WooCommerce scraper script.
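
Many WooCommerce sites expose the unauthenticated Store API under /wp-json/wc/store/v1/, which makes product listing straightforward. A hedged sketch (the actual scraper may parse HTML or use the authenticated REST API instead):

    import requests


    def scrape_woocommerce(base_url):
        # Page through the public Store API product listing.
        products, page = [], 1
        while True:
            resp = requests.get(
                f"{base_url.rstrip('/')}/wp-json/wc/store/v1/products",
                params={"per_page": 100, "page": page},
                timeout=30,
            )
            if resp.status_code != 200:
                break
            batch = resp.json()
            if not batch:
                break
            products.extend(batch)
            page += 1
        return products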

7. main.py

Main entry point of the project that initiates the scraping process. It includes the following key components:

  • initialize_db(): Initializes the database for storing scraped data.
  • process_website(): Determines the platform type (Shopify, WooCommerce, Wix, or custom) and calls the appropriate scraper function.
  • Uses concurrent.futures for parallel processing of multiple websites.
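
Putting the pieces together, main.py plausibly resembles the sketch below. The detection heuristics, import paths, and function names are assumptions inferred from the description above, not the actual implementation:

    import concurrent.futures

    import requests

    from config import WEBSITES
    from db.database import initialize_db
    from custom_scrappers.scraper import scrape_custom
    from shopify_scraper.scraper import scrape_shopify
    from wix_scrapper.scraper import scrape_wix
    from woocommerce_scrapper.scraper import scrape_woocommerce


    def process_website(url):
        # Crude platform detection via fingerprints in the landing page HTML.
        html = requests.get(url, timeout=30).text
        if "cdn.shopify.com" in html:
            return scrape_shopify(url)
        if "wp-content" in html:  # WordPress, assumed here to mean WooCommerce
            return scrape_woocommerce(url)
        if "wixstatic.com" in html:
            return scrape_wix(url)
        return scrape_custom(url)


    if __name__ == "__main__":
        initialize_db()
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            for url, result in zip(WEBSITES, executor.map(process_website, WEBSITES)):
                print(f"{url}: scraped {len(result)} products")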

8. config.py

Configuration file that defines the list of websites to scrape and the database path.

  • WEBSITES: List of URLs to be scraped. Add or remove URLs in this list as required.
  • DB_PATH: Path to the SQLite database. The path is fetched from the environment variable SQLITE_DB_PATH.
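
Based on this description, config.py is likely only a few lines. The URLs below are placeholders, and the python-dotenv usage is an assumption:

    import os

    from dotenv import load_dotenv

    load_dotenv()  # read SQLITE_DB_PATH from .env

    # Placeholder URLs; replace with the stores you actually want to scrape.
    WEBSITES = [
        "https://example-shopify-store.com",
        "https://example-woocommerce-store.com",
    ]

    DB_PATH = os.getenv("SQLITE_DB_PATH", "database.sqlite3")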

9. .env & .env.sample

Environment configuration files:

  • .env: Contains sensitive environment variables (e.g., SQLITE_DB_PATH for database path).
  • .env.sample: Sample environment file to set up necessary variables.
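
Given the variables described above, a working .env needs at least one line (the value shown is illustrative):

    SQLITE_DB_PATH=./database.sqlite3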

10. requirements.txt

Lists the Python packages required to run the project.

Setup Instructions

  1. Clone the repository:
    git clone https://github.com/rasheed-aidetic/EcomScraper.git
  2. Navigate to the project directory:
    cd EcomScraper
  3. Install the dependencies:
    pip install -r requirements.txt
  4. Set up environment variables:
    • Rename .env.sample to .env.
    • In the .env file, define SQLITE_DB_PATH with the path to your SQLite database.
  5. Update config.py:
    • Add URLs of e-commerce websites to the WEBSITES list.
  6. Run the main script to start scraping:
    python main.py

Usage

To initiate scraping, simply run main.py. The script will check each website’s platform and use the appropriate scraper function. The scraping progress and any errors will be displayed in the console output.

License

This project is licensed under the MIT License.
