A framework for passive and active data collection.
Passive Producers:
- HTTP Interception & Fingerprinting
- Interaction Generators
- WebSocket Collection
Active Producers:
- Active API Scraping Modules (AKA Data Expansion)
- Web Scraping
- ZMap (or similar)
- ZGrab (or similar)
Consumers:
- ZeroMQ/RabbitMQ Integration (Separate consumer from producer threads; see the sketch after this list)
- Solr Integration
  - Document Linking
- BigQuery Integration
  - Document Linking (Not sure if possible, ~5s/query minimum. Maybe with non-free allocated capacity)
- Cassandra Integration
  - Document Linking
- Data Transformation (Common event structure)
- AI Analysis
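Most of these consumer items hinge on one pattern: producers normalise whatever they collect into a common event structure and push it onto a queue, while separate consumer threads drain that queue into a storage backend. The sketch below shows that split with ZeroMQ (pyzmq); the endpoint address, event fields, and print-as-storage stand-in are illustrative assumptions rather than this project's actual wiring, and RabbitMQ (e.g. via pika) would slot into the same place with only the transport changing.

```python
# Minimal producer/consumer split over ZeroMQ. Everything here is a sketch:
# the endpoint, the event fields, and the print-as-storage stand-in are
# placeholders, not this project's real configuration.
import json
import threading
import time

import zmq


def producer(endpoint: str = "tcp://127.0.0.1:5555") -> None:
    """Push one intercepted record, wrapped in a common event envelope."""
    sock = zmq.Context.instance().socket(zmq.PUSH)
    sock.bind(endpoint)
    event = {
        "source": "http_interception",  # which producer emitted this
        "collected_at": time.time(),    # unix timestamp
        "target": "api.example.com",    # hypothetical hostname the data came from
        "payload": {"raw": "..."},      # producer-specific body
    }
    sock.send_json(event)


def consumer(endpoint: str = "tcp://127.0.0.1:5555") -> None:
    """Pull events off the queue and hand them to a storage backend."""
    sock = zmq.Context.instance().socket(zmq.PULL)
    sock.connect(endpoint)
    while True:
        event = sock.recv_json()
        # Replace the print with Solr/BigQuery/Cassandra writes.
        print(json.dumps(event, indent=2))


if __name__ == "__main__":
    # Consumer runs on its own thread, decoupled from the producing side.
    threading.Thread(target=consumer, daemon=True).start()
    producer()
    time.sleep(1)  # give the consumer a moment to drain the queue
```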
This project is still a work in progress and does not have a simple setup procedure.
This tool depends on HTTP Toolkit to set up the HTTP(S) interception; it hooks into the GraphQL-over-WebSocket interface that HTTP Toolkit exposes.
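For orientation, that hook is conceptually just a GraphQL subscription carried over a WebSocket. The sketch below shows the general shape using the websockets library; the URL, subprotocol, and subscription query are placeholders, so check your HTTP Toolkit server for the actual port and schema.

```python
# Rough shape of a GraphQL-over-WebSocket subscription client. The endpoint,
# subprotocol, and query are assumptions; consult your HTTP Toolkit install
# for the real values.
import asyncio
import json

import websockets  # pip install websockets


async def listen(url: str = "ws://localhost:45457/") -> None:  # placeholder port
    # "graphql-ws" is the classic subscriptions-transport-ws subprotocol;
    # use "graphql-transport-ws" if the server speaks the newer protocol.
    async with websockets.connect(url, subprotocols=["graphql-ws"]) as ws:
        await ws.send(json.dumps({"type": "connection_init", "payload": {}}))
        await ws.send(json.dumps({
            "id": "1",
            "type": "start",  # "subscribe" in the newer protocol
            "payload": {"query": "subscription { ... }"},  # hypothetical query
        }))
        async for raw in ws:
            print(json.loads(raw))  # intercepted-traffic events arrive here


if __name__ == "__main__":
    asyncio.run(listen())
```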
To get started:
- Install and run HTTP Toolkit
- Point a device with the HTTP Toolkit certificate installed towards the interception server
- Edit the `fingerprints/` modules (a sketch follows this list):
  - Update the `fingerprints` dictionary, where keys are the types of data and the values are the keys the program should look for
  - Update `MINIMUM_FINGERPRINTS` if needed; this is the minimum number of keys that must be found to mark the input data as a hit
- Edit `main.py` (see the example after this list):
  - Add your `SOLR_HOST` (see this if you are unfamiliar) and `TARGET_HOSTNAME` (the hostname of the API you would like to collect data from)
- Start `main.py`
- If needed, print out `foundItem`s to update your database schema
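For reference, a module under `fingerprints/` might look roughly like the following. The shape (a `fingerprints` dictionary plus `MINIMUM_FINGERPRINTS`) follows the description above; the type names, keys, and the `matches()` helper are made-up examples.

```python
# Example fingerprint module. The data types and keys are invented; only the
# structure (fingerprints dict + MINIMUM_FINGERPRINTS threshold) reflects the
# setup steps above.
fingerprints = {
    "user_profile": ["user_id", "username", "email"],
    "listing": ["listing_id", "price", "seller"],
}

# Minimum number of keys that must be found to mark the input data as a hit.
MINIMUM_FINGERPRINTS = 2


def matches(data: dict) -> bool:
    """Return True if any fingerprint type reaches the minimum key count."""
    return any(
        sum(key in data for key in keys) >= MINIMUM_FINGERPRINTS
        for keys in fingerprints.values()
    )
```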
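The `main.py` settings amount to a couple of constants; the values below are placeholders for your own Solr core and target API.

```python
# Placeholder values - swap in your own Solr endpoint and target API.
SOLR_HOST = "http://localhost:8983/solr/my_core"  # Solr core to index into
TARGET_HOSTNAME = "api.example.com"               # hostname of the API to collect data from
```

While tuning the database schema, temporarily printing each `foundItem` (as in the last step above) shows which fields actually arrive.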
The contents of this repository are a personal project and in no way reflect work done for my employer.