EcomScraper is a Python project for scraping product data from various e-commerce platforms, including Shopify, Wix, and WooCommerce. The project is organized into modules for the scrapers, database management, utilities, and configuration.
## Project Structure

```
EcomScraper/
├── custom_scrappers/
│   ├── bobbi_brown_scrapper.py
│   └── scraper.py
├── db/
│   ├── database.py
│   └── images/
├── shopify_scraper/
│   └── scraper.py
├── utils/
│   └── utils.py
├── wix_scrapper/
│   └── scraper.py
├── woocommerce_scrapper/
│   └── scraper.py
├── .env
├── .env.sample
├── .gitignore
├── config.py
├── database.sqlite3
├── main.py
└── requirements.txt
```
### `custom_scrappers/`

This folder contains custom scrapers for specific websites.

- `bobbi_brown_scrapper.py`: A scraper tailored for the Bobbi Brown e-commerce site.
- `scraper.py`: General scraper script for custom websites.
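As an illustration of the kind of parsing a custom scraper performs, here is a minimal stdlib-only extractor. The project's actual scrapers may use third-party libraries such as `requests` or BeautifulSoup; this class and function are hypothetical, not taken from `scraper.py`:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the page <title> text, as a stand-in for real product extraction."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Parse an HTML document and return its <title> contents."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()
```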
### `db/`

Database management module for handling and storing scraped data.

- `database.py`: Database connection and query functions.
- `images/`: Directory for downloaded product images.
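A minimal sketch of what the helpers in `database.py` might look like; the table name and columns here are assumptions, not the project's actual schema:

```python
import sqlite3


def initialize_db(db_path: str) -> sqlite3.Connection:
    """Create the products table if it does not exist and return a connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               url TEXT,
               title TEXT,
               price TEXT
           )"""
    )
    conn.commit()
    return conn


def save_product(conn: sqlite3.Connection, url: str, title: str, price: str) -> None:
    """Insert one scraped product row."""
    conn.execute(
        "INSERT INTO products (url, title, price) VALUES (?, ?, ?)",
        (url, title, price),
    )
    conn.commit()
```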
### `shopify_scraper/`

Contains scripts specific to scraping Shopify-based websites.

- `scraper.py`: Scrapes Shopify products, categories, and related data.
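Many Shopify storefronts expose a public `products.json` endpoint that scrapers typically paginate through. A minimal sketch of that approach (the function names are illustrative, not taken from the project's `scraper.py`):

```python
import json
from urllib.request import urlopen


def products_json_url(store_url: str, page: int = 1, limit: int = 250) -> str:
    """Build the URL for Shopify's public products.json endpoint."""
    return f"{store_url.rstrip('/')}/products.json?limit={limit}&page={page}"


def fetch_products(store_url: str, page: int = 1) -> list[dict]:
    """Fetch one page of products from a Shopify store (performs a network call)."""
    with urlopen(products_json_url(store_url, page)) as resp:
        return json.load(resp)["products"]
```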
### `utils/`

Utility functions used across the project for various helper tasks.

- `utils.py`: General utility functions to support scraping tasks.
### `wix_scrapper/`

Module designed for scraping data from Wix-based websites.

- `scraper.py`: Wix scraper for extracting products and related data.
### `woocommerce_scrapper/`

Contains scripts to scrape data from WooCommerce-based websites.

- `scraper.py`: WooCommerce scraper script.
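WooCommerce sites that enable the Store API expose a public products endpoint under `/wp-json/wc/store/v1/`. A hedged sketch of how such a scraper might build its request URLs; the function name is illustrative, and the project's actual scraper may instead parse the rendered HTML:

```python
def store_api_url(site_url: str, page: int = 1, per_page: int = 100) -> str:
    """Build a URL for WooCommerce's public Store API products endpoint.

    Assumes the Store API is enabled on the target site.
    """
    base = site_url.rstrip("/")
    return f"{base}/wp-json/wc/store/v1/products?per_page={per_page}&page={page}"
```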
### `main.py`

Main entry point of the project that initiates the scraping process. It includes the following key components:

- `initialize_db()`: Initializes the database for storing scraped data.
- `process_website()`: Determines the platform type (Shopify, WooCommerce, Wix, or custom) and calls the appropriate scraper function.
- Uses `concurrent.futures` for parallel processing of multiple websites.
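The dispatch-and-parallelize flow described above could be sketched like this. The function bodies are assumptions, not the project's actual code, and `detect_platform` is a deliberately crude URL-based heuristic (the real `process_website()` presumably inspects the page itself):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def detect_platform(url: str) -> str:
    """Rough platform guess from the URL alone (placeholder heuristic)."""
    if "myshopify.com" in url:
        return "shopify"
    if "wixsite.com" in url:
        return "wix"
    return "custom"


def process_website(url: str) -> str:
    """Dispatch to the scraper matching the detected platform."""
    platform = detect_platform(url)
    # In the real project this would call the scraper module for `platform`.
    return f"{url}: handled by {platform} scraper"


def run_all(urls: list[str], max_workers: int = 4) -> list[str]:
    """Scrape all configured websites in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_website, u) for u in urls]
        return [f.result() for f in as_completed(futures)]
```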
### `config.py`

Configuration file that defines the list of websites to scrape and the database path.

- `WEBSITES`: List of URLs to be scraped. Add or remove URLs in this list as required.
- `DB_PATH`: Path to the SQLite database, fetched from the `SQLITE_DB_PATH` environment variable.
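Based on the description above, `config.py` likely has roughly this shape; the example URLs and the local fallback path are illustrative, not taken from the project:

```python
# config.py -- hypothetical shape based on the description above
import os

# URLs to be scraped; add or remove entries as required.
WEBSITES = [
    "https://example-shop.myshopify.com",
    "https://example-store.com",
]

# SQLite path is read from the environment (fallback path is an assumption).
DB_PATH = os.getenv("SQLITE_DB_PATH", "database.sqlite3")
```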
### Environment files

Environment configuration files:

- `.env`: Contains sensitive environment variables (e.g., `SQLITE_DB_PATH` for the database path).
- `.env.sample`: Sample environment file to set up the necessary variables.
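A minimal `.env` might contain just the one variable; the path shown here is illustrative:

```shell
SQLITE_DB_PATH=./database.sqlite3
```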
### `requirements.txt`

File listing all the Python packages required to run the project.
## Installation

- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/EcomScraper.git
  ```
- Navigate to the project directory:
  ```bash
  cd EcomScraper
  ```
- Install the dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables:
  - Rename `.env.sample` to `.env`.
  - In the `.env` file, define `SQLITE_DB_PATH` with the path to your SQLite database.
- Update `config.py`:
  - Add URLs of e-commerce websites to the `WEBSITES` list.
- Run the main script to start scraping:
  ```bash
  python main.py
  ```
## Usage

To initiate scraping, run `main.py`. The script checks each website's platform and uses the appropriate scraper function. Scraping progress and any errors are displayed in the console output.
## License

This project is licensed under the MIT License.