
Initial Release: v1.0

@msmannan00 msmannan00 released this 05 Dec 11:26
· 49 commits to trusted-main since this release
ee3f10e

Initial Release: Dark Web Monitoring Webcrawler v1.0

We are excited to announce the initial release of the Dark Web Monitoring Webcrawler, a robust, scalable, and secure tool designed to monitor activities on the dark web. This repository provides a framework for collecting and analyzing dark web data with a focus on privacy and security.

Core Features

  1. Docker-Based Deployment: Quick and seamless setup using Docker Compose to orchestrate services.

  2. Advanced Search Functionality: Comprehensive search capabilities with options to filter and refine results.

  3. Data Visualization: Generates visual representations of crawled data for easier analysis.

  4. Customizable Search Parsers: Supports integration of custom parsers to enhance data extraction from specific websites.

  5. Integrated Machine Learning Models: Uses NLP and machine learning models for content categorization, search relevance, and detection of data patterns.


Prerequisites

Ensure the following tools are installed on your system:

  • Python
  • Docker
  • Docker Compose
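
A quick way to confirm the prerequisites before installing is to look for the executables on your PATH. The following is a minimal sketch, not part of the repository. The command names (python3, python, docker, docker-compose) are assumptions and may differ on your system.

```python
import shutil

# Assumed executable names for each prerequisite; adjust for your system.
REQUIRED_TOOLS = {
    "Python": ["python3", "python"],
    "Docker": ["docker"],
    "Docker Compose": ["docker-compose"],
}


def find_missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools for which no candidate executable is on PATH."""
    missing = []
    for name, candidates in required.items():
        # A tool counts as present if any one of its candidate commands resolves.
        if not any(which(cmd) for cmd in candidates):
            missing.append(name)
    return missing


missing = find_missing_tools()
print("Missing tools:", ", ".join(missing) if missing else "none")
```

If Docker Compose is installed as the `docker compose` plugin rather than the standalone `docker-compose` binary, this check will report it as missing even though it is available.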

Installation

Step 1: Clone the Repository

git clone https://github.com/yourusername/dark-web-monitoring-webcrawler.git
cd dark-web-monitoring-webcrawler

Step 2: Build and Start the Docker Services

docker-compose up --build

This command will build and start the following services:

  • API Service (api): The main webcrawler service.
  • MongoDB (mongo): Stores crawled data.
  • Redis (redis_server): Manages caching and task queuing.
  • Tor Containers (tor-extend-*): Improve anonymity by routing traffic through different Tor exit nodes.
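
Once the services are starting, it can help to wait until their ports accept connections before using the crawler. The sketch below is illustrative only. It assumes the services are published on the host, and the ports in the usage comment (27017 for MongoDB, 6379 for Redis) are those tools' defaults, not values confirmed by this repository's docker-compose.yml.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet: pause briefly, then retry until the deadline.
            time.sleep(interval)
    return False


# Example usage (assumed host and default ports, adjust to your setup):
#   wait_for_port("localhost", 27017)  # MongoDB
#   wait_for_port("localhost", 6379)   # Redis
```

An open port only shows that the container is listening. It does not guarantee that the service has finished initializing.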

Usage

Running the Webcrawler

Option 1: Direct Execution

  1. Copy the app/libs/nltk_data folder to the appropriate directory:
    • Windows: appdata directory.
    • Linux: Home directory.
  2. Navigate to the Genesis-Crawler/app/ directory.
  3. Start the crawler:
    python main_direct.py
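
Step 1 above can be scripted. The following is a minimal sketch under two assumptions: "appdata directory" means the folder named by the APPDATA environment variable, and the data should end up in a folder named nltk_data inside the target directory. The function names are hypothetical and not part of the project.

```python
import os
import shutil
import sys
from pathlib import Path


def nltk_data_target(platform=sys.platform, env=os.environ, home=None):
    """Return the directory where the nltk_data folder should be placed."""
    if platform.startswith("win"):
        # Assumption: the Windows target is %APPDATA%\nltk_data.
        return Path(env["APPDATA"]) / "nltk_data"
    # Linux (and other platforms): nltk_data inside the home directory.
    return (home or Path.home()) / "nltk_data"


def install_nltk_data(source, target):
    """Copy the bundled nltk_data folder to target, merging into existing files."""
    # dirs_exist_ok requires Python 3.8 or newer.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


# Example usage, run from the repository root:
#   install_nltk_data(Path("app/libs/nltk_data"), nltk_data_target())
```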

Option 2: Using Docker

  1. Use Docker Compose to build and start the webcrawler:
    docker-compose up --build

Configuring Tor Instances

Each Tor container is configured to run as a separate instance, routing traffic through different Tor exit nodes. This increases anonymity and reduces the chances of IP bans.
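
To illustrate how a client could spread requests across several Tor instances, here is a minimal round-robin sketch. It does not describe how the crawler itself selects proxies. The host names and SOCKS port (9050, Tor's default) are assumptions, so use the values from your docker-compose.yml.

```python
from itertools import cycle

# Assumed proxy endpoints: one SOCKS address per Tor container.
# The socks5h scheme asks the proxy to resolve DNS, which .onion hosts require.
TOR_PROXIES = [
    "socks5h://tor-extend-1:9050",
    "socks5h://tor-extend-2:9050",
    "socks5h://tor-extend-3:9050",
]


def proxy_rotation(proxy_urls):
    """Yield requests-style proxy mappings, cycling through proxy_urls forever."""
    for url in cycle(proxy_urls):
        yield {"http": url, "https": url}


rotation = proxy_rotation(TOR_PROXIES)
first = next(rotation)   # mapping for tor-extend-1
second = next(rotation)  # mapping for tor-extend-2
```

Each mapping has the shape expected by the `proxies` argument of the `requests` library, which needs SOCKS support installed (for example via `requests[socks]`).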


Scaling

You can scale the number of Tor instances by modifying the docker-compose.yml file and adding more tor-extend-* services as needed.
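
When adding many instances, the repeated service entries can be generated instead of written by hand. The sketch below prints entries to paste under `services:` in docker-compose.yml. The template is hypothetical: the numeric naming scheme, build context, and Dockerfile path are assumptions, so replace the template with a copy of an existing tor-extend-* service from your file.

```python
# Hypothetical template: replace it with the real definition of an existing
# tor-extend-* service, keeping {n} wherever the instance number appears.
SERVICE_TEMPLATE = """\
  tor-extend-{n}:
    build:
      context: .
      dockerfile: dockerFiles/tor_docker
    container_name: tor-extend-{n}
    restart: always
"""


def render_tor_services(start, count, template=SERVICE_TEMPLATE):
    """Return YAML text for `count` services numbered from `start` upward."""
    return "".join(template.format(n=n) for n in range(start, start + count))


# Print two additional entries, numbered 3 and 4.
print(render_tor_services(start=3, count=2))
```

After pasting the generated entries, rebuild with `docker-compose up --build` so the new containers are created.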


Project Structure

  • api/: Contains the webcrawler source code.
  • data/db/: Directory where MongoDB stores data.
  • dockerFiles/: Dockerfiles for building custom images.

Contribution

We welcome contributions to improve the Dark Web Monitoring Webcrawler. To contribute:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature-branch
  3. Commit your changes:
    git commit -m "Add a new feature"
  4. Push your changes and open a pull request.

License

This project is licensed under the MIT License, making it free and open for further development.


Disclaimer

The Dark Web Monitoring Webcrawler is intended for research and educational purposes only. Users are responsible for ensuring compliance with local laws and regulations.


GitHub Repository: Dark Web Monitoring Webcrawler