kcsmta/FCD_ae_cnn_multitask

Faulty Code Detection

The official implementation of the paper Semi-supervised Multi-task Learning Using Convolutional AutoEncoder for Faulty Code Detection with Limited Data.

Introduction

We propose a semi-supervised multi-task learning framework to overcome the limited-data problem in Faulty Code Detection. The framework has two parts: 1) a convolutional autoencoder that performs the auxiliary task of learning latent representations of programs, and 2) a sequence-based convolution for the main task of faulty code prediction. The proposed model is shown in the figure below.

[Figure: the proposed model]

See the paper for more details.
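
As a rough illustration of how the two tasks could share one training objective, the sketch below adds a reconstruction loss (the auxiliary autoencoder task) to a cross-entropy loss (the main prediction task) that is counted only for labeled programs. The function names, the `recon_weight` parameter, the use of mean squared error, and the treatment of unlabeled samples are all assumptions made for illustration. They are not taken from this repository's code, so consult the paper for the actual objective.

```python
import math

def reconstruction_loss(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

def classification_loss(probs, label):
    """Cross-entropy for one sample; probs are predicted class probabilities."""
    return -math.log(max(probs[label], 1e-12))

def multitask_loss(batch, recon_weight=1.0):
    """Average a combined loss over a batch (hypothetical formulation).

    Each item is (original, reconstructed, probs, label). label is None for
    unlabeled programs, which then contribute only the reconstruction term.
    """
    total = 0.0
    for original, reconstructed, probs, label in batch:
        loss = recon_weight * reconstruction_loss(original, reconstructed)
        if label is not None:
            loss += classification_loss(probs, label)
        total += loss
    return total / len(batch)

# Toy batch: one labeled and one unlabeled sample (made-up numbers).
batch = [
    ([1.0, 0.0], [0.5, 0.0], [0.5, 0.5], 0),
    ([0.0, 1.0], [0.0, 1.0], [0.5, 0.5], None),
]
print(multitask_loss(batch))
```

In this toy batch the unlabeled sample is reconstructed perfectly and adds nothing to the loss, while the labeled sample contributes both terms.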

Installation

  • Python 3.5+ is required.

1. Clone the project

git clone https://github.com/kcsmta/FCD_ae_cnn_multitask.git

2. Run the following commands:

# Create a virtual environment
python3 -m venv ae-cnn-multitask-env
# Activate the virtual environment
source ae-cnn-multitask-env/bin/activate
# Install the required packages
pip install -r requirements.txt

Usage

Set the variable data_part in the file FCD_models/params.py to 1.0, 0.75, 0.5, or 0.25 to use 100%, 75%, 50%, or 25% of the labeled data in the training phase.
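
To make the effect of such a fraction concrete, here is a minimal sketch of how a setting like data_part could reduce the labeled training set. The helper `select_labeled_subset`, its `seed` argument, and the random sampling strategy are hypothetical; the repository may select the labeled portion differently.

```python
import random

def select_labeled_subset(samples, data_part, seed=0):
    """Return a random subset holding the given fraction of labeled samples."""
    if not 0.0 < data_part <= 1.0:
        raise ValueError("data_part must be in (0, 1]")
    keep = int(round(len(samples) * data_part))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(samples, keep)

# Toy labeled data: (program name, label) pairs, invented for this example.
samples = [("program_%d" % i, i % 2) for i in range(8)]
for part in (1.0, 0.75, 0.5, 0.25):
    print(part, len(select_labeled_subset(samples, part)))
```

With 8 toy samples, the four settings keep 8, 6, 4, and 2 labeled samples respectively.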

1. CNN: Convolutional neural network (as the classification branch without the autoencoder)

python run_train_cnn.py

2. CNN_trans: Convolutional neural network with transfer learning

python run_train_cnn_transfer.py

2'. AE: train the autoencoder to initialize the parameters of AE_CNN and AE_CNN_Mul

python run_train_ae.py

3. AE_CNN: We first train the autoencoder, then retain its weights for part 1 of the convolutions and train the CNN independently. This can be considered a form of self-transfer learning, because the weights are initialized using the same data but without the target label information.

python run_finetune_cnn.py

4. CNN_Mul: the autoencoder and CNN are trained simultaneously from scratch.

python run_train_cnn_multi_task.py

5. AE_CNN_Mul: The autoencoder is pretrained and then we continue training the whole model.

python run_finetune_cnn_multi_task.py

  • To visualize the latent embedding vectors:

cd visulize_embedding/
python run_visualize.py

  • To plot the ROC curves:

cd draw_roc/
python plot_roc_multiclasses.py
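
The training scripts listed under Usage can also be chained from a small driver. The sketch below is one possible way to do that, assuming it is run from the directory that contains the scripts. The helper names `build_commands` and `run_all` and the `dry_run` flag are hypothetical. The only ordering constraint taken from the list above is that run_train_ae.py comes before the two fine-tuning scripts that start from its weights.

```python
import subprocess
import sys

# Script names come from the Usage list. The autoencoder is trained before
# the two fine-tuning scripts that use it for parameter initialization.
TRAINING_SCRIPTS = [
    "run_train_cnn.py",
    "run_train_cnn_transfer.py",
    "run_train_ae.py",
    "run_finetune_cnn.py",
    "run_train_cnn_multi_task.py",
    "run_finetune_cnn_multi_task.py",
]

def build_commands(scripts=TRAINING_SCRIPTS, python=sys.executable):
    """Build one [interpreter, script] command per training script."""
    return [[python, script] for script in scripts]

def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence if a script fails.
            subprocess.run(command, check=True)

if __name__ == "__main__":
    run_all(build_commands(), dry_run=True)
```

With dry_run=True the driver only prints the commands, which is a cheap way to confirm the order before starting long training runs.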

Contact

Authors:
Anh Phan Viet: [email protected]
Khanh Duy Tung Nguyen: [email protected]
