Phishing website detection using Graph Neural Networks (GNNs).
git clone https://github.com/TristanBilot/phishGNN.git
cd phishGNN
./install_dataset.sh
pip install -r requirements.txt
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.11.0+cpu.html # for cpu
During training, the files located in data/training/processed are used by default. The raw dataset consists of URLs, each mapped to around 30 features. One of these features is a list of references (href, form, iframe) to other pages, and each referenced page has its own features and its own list of references.
python phishGNN/training.py
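To make the recursive structure of the raw dataset more concrete, here is a minimal sketch of how URL records with reference lists could be turned into a node list and an edge list. The record layout and the field names (`is_phishing`, `refs`, `kind`) and the `build_graph` helper are assumptions made for illustration; they are not the actual phishGNN schema or code.

```python
from collections import deque

# Hypothetical raw records: each URL maps to some features and its outgoing
# references. Field names are illustrative, not the dataset's real schema.
pages = {
    "http://a.example": {
        "is_phishing": True,
        "refs": [
            {"url": "http://b.example", "kind": "href"},
            {"url": "http://c.example", "kind": "form"},
        ],
    },
    "http://b.example": {
        "is_phishing": False,
        "refs": [{"url": "http://c.example", "kind": "iframe"}],
    },
    "http://c.example": {"is_phishing": False, "refs": []},
}


def build_graph(root, pages):
    """Walk the references breadth-first from `root` and return (nodes, edges).

    nodes: URLs in order of discovery; a node's id is its position in the list.
    edges: (source_id, target_id, reference_kind) tuples.
    """
    index = {root: 0}
    edges = []
    queue = deque([root])
    while queue:
        url = queue.popleft()
        for ref in pages.get(url, {}).get("refs", []):
            target = ref["url"]
            if target not in index:  # first time this page is seen
                index[target] = len(index)
                queue.append(target)
            edges.append((index[url], index[target], ref["kind"]))
    nodes = sorted(index, key=index.get)
    return nodes, edges


nodes, edges = build_graph("http://a.example", pages)
print(len(nodes), "nodes,", len(edges), "edges")  # 3 nodes, 3 edges
```

In this sketch each root URL yields one graph, so a page that is referenced from several places is stored once and simply gains additional incoming edges.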
During training, the embeddings produced right after the graph convolutional layers can also be generated. To do so, run the training with the following option:
python phishGNN/training.py --plot-embeddings
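As a rough intuition for what an embedding "after the graph convolutional layers" is, the sketch below performs one simplified message-passing step in plain Python: each node's vector is replaced by the mean of its own vector and its neighbours' vectors. This is an illustrative simplification, not the model used in phishGNN; a real graph convolutional layer also applies learned weights and a degree-based normalization. The function name `mean_aggregate` and the toy data are assumptions.

```python
def mean_aggregate(features, edges):
    """One simplified message-passing step.

    Each node averages its own feature vector with those of its neighbours.
    Edges are treated as undirected and every node gets a self-loop.
    """
    neighbours = {i: {i} for i in range(len(features))}  # self-loops
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    dim = len(features[0])
    out = []
    for i in range(len(features)):
        group = neighbours[i]
        out.append(
            [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        )
    return out


# Toy graph: a chain 0 - 1 - 2 with two-dimensional node features.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
edges = [(0, 1), (1, 2)]

embeddings = mean_aggregate(features, edges)
for node_id, vector in enumerate(embeddings):
    print(node_id, vector)
```

After one step, node 0 holds a blend of its own features and those of node 1, which is the sense in which an embedding summarizes a page together with the pages it links to.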
A visualization tool is provided to display the internal structure of web pages from the dataset as graphs, along with characteristics such as the number of nodes and edges and whether the page is phishing or benign.
To visualize this data, first follow the instructions in the installation part, then open the file visualization/visualization.html.