physionet-12ecg-classification

To clone this repository, run:

git clone git@github.com:antonior92/physionet-12ecg-classification.git
# or: git clone https://github.com/antonior92/physionet-12ecg-classification.git

The requirements are described in requirements.txt.

Downloading the datasets from PhysioNet

The training data can be downloaded from these links. You can use the MD5 hash to verify the integrity of each tar.gz file:

CPSC2018 training set, 6,877 recordings: link; MD5-hash: 7b6b1f1ab1b4c59169c639d379575a87
China 12-Lead ECG Challenge Database (unused CPSC2018 data), 3,453 recordings: link; MD5-hash: 36b409ee2b46aa6f1d2bef99b8451925
St Petersburg INCART 12-lead Arrhythmia Database, 74 recordings: link; MD5-hash: 440ca079f137fb16259511bb6105f134
PTB Diagnostic ECG Database, 516 recordings: link; MD5-hash: 4035a2b496067c4331eecab74695bc67
PTB-XL electrocardiography Database, 21,837 recordings: link; MD5-hash: a893319c53f77d8e6a76ed3af38be99e
Georgia 12-Lead ECG Challenge Database, 10,344 recordings: link; MD5-hash: 594c8cbc02a0aec4c179d2f019b09a7a
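As a minimal sketch of the integrity check mentioned above, the following Python snippet computes the MD5 hash of each archive and compares it with the values in the list. The archive file names and their pairing with the hashes are assumptions, based on the order of the list and on the download commands given later in this README; adjust them if your files are named differently.

```python
import hashlib
from pathlib import Path

# Hashes copied from the list above. The file names are an assumption:
# they follow the naming used by the download commands in this README.
EXPECTED_MD5 = {
    "PhysioNetChallenge2020_Training_CPSC.tar.gz": "7b6b1f1ab1b4c59169c639d379575a87",
    "PhysioNetChallenge2020_Training_2.tar.gz": "36b409ee2b46aa6f1d2bef99b8451925",
    "PhysioNetChallenge2020_Training_StPetersburg.tar.gz": "440ca079f137fb16259511bb6105f134",
    "PhysioNetChallenge2020_Training_PTB.tar.gz": "4035a2b496067c4331eecab74695bc67",
    "PhysioNetChallenge2020_Training_PTB-XL.tar.gz": "a893319c53f77d8e6a76ed3af38be99e",
    "PhysioNetChallenge2020_Training_E.tar.gz": "594c8cbc02a0aec4c179d2f019b09a7a",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archives(folder="."):
    """Map each known archive present in `folder` to True (hash matches) or False."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(folder) / name
        if path.is_file():
            results[name] = md5sum(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_archives().items():
        print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```

Archives that are not present in the folder are skipped, so the check can be run after each individual download.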

The data loading procedure used in train.py and pretrain.py can work with nested directories, so we recommend downloading and extracting all the datasets into the same folder. Everything can be done from the command line as follows.

Create the new folder and move into it:

mkdir training_data  # Or whatever name...
cd ./training_data

Download the datasets (with appropriate names):

for DSET in Training_CPSC Training_2 Training_StPetersburg Training_PTB PTB-XL Training_E;
do
wget -O PhysioNetChallenge2020_$DSET.tar.gz \
https://cloudypipeline.com:9555/api/download/physionet2020training/PhysioNetChallenge2020_$DSET.tar.gz/
done;
mv PhysioNetChallenge2020_PTB-XL.tar.gz PhysioNetChallenge2020_Training_PTB-XL.tar.gz

Then extract the data:

for DSET in Training_CPSC Training_2 Training_StPetersburg Training_PTB Training_PTB-XL Training_E;
do
mkdir ./$DSET && tar -xf PhysioNetChallenge2020_$DSET.tar.gz -C ./$DSET --strip-components=1
done;
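For readers who prefer to extract the archives from Python, the sketch below mimics `tar -xf ... -C DEST --strip-components=1` using only the standard library. The function name `extract_stripped` is hypothetical and not part of this repository, and the sketch is meant for trusted archives only.

```python
import tarfile
from pathlib import Path, PurePosixPath


def extract_stripped(archive, dest, strip=1):
    """Extract `archive` into `dest`, dropping the first `strip` path components.

    Members whose path has `strip` or fewer components (for example the
    top-level folder itself) are skipped, as tar does with --strip-components.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        members = []
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            if len(parts) <= strip:
                continue
            # Rewrite the member name so it is extracted without the prefix.
            member.name = str(PurePosixPath(*parts[strip:]))
            members.append(member)
        tar.extractall(dest, members=members)


# Example (assumes the archive was downloaded as described above):
# extract_stripped("PhysioNetChallenge2020_Training_CPSC.tar.gz", "Training_CPSC")
```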

After extracting the data, the tar.gz files can be removed:

rm *.tar.gz

Check that the sizes of the downloaded and extracted files are correct by running:

$ du -hd 2 .

The expected output is:

784M    ./Training_StPetersburg
650M    ./Training_2
2.6G    ./Training_PTB-XL
1.3G    ./Training_CPSC
1.2G    ./Training_E
1.3G    ./Training_PTB
7.8G    .
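Disk usage is only a rough check. A complementary sketch is to count the recordings in each folder and compare them with the numbers in the dataset list. This assumes that each recording has exactly one WFDB header file (`*.hea`) and that the folders correspond to the datasets in the order they are listed; both are assumptions, not something this README states.

```python
from pathlib import Path

# Recording counts from the dataset list in this README.
# The folder-to-dataset pairing is assumed from the order of the list.
EXPECTED_RECORDINGS = {
    "Training_CPSC": 6877,
    "Training_2": 3453,
    "Training_StPetersburg": 74,
    "Training_PTB": 516,
    "Training_PTB-XL": 21837,
    "Training_E": 10344,
}


def count_headers(folder):
    """Count header files (*.hea) in `folder`, including nested directories."""
    return sum(1 for _ in Path(folder).rglob("*.hea"))


def check_counts(root="."):
    """Return {folder_name: (found, expected)} for every expected folder."""
    report = {}
    for name, expected in EXPECTED_RECORDINGS.items():
        folder = Path(root) / name
        found = count_headers(folder) if folder.is_dir() else 0
        report[name] = (found, expected)
    return report


if __name__ == "__main__":
    for name, (found, expected) in check_counts().items():
        status = "OK" if found == expected else "CHECK"
        print(f"{name}: found {found}, expected {expected} [{status}]")
```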

Training and evaluating

To train the model, use:

python train.py

By default, the script looks for the WFDB folder (containing the training dataset) in ./Training_WFDB. The option --input_folder PATH can be used to specify a different location. The GPU is not used by default, but it can be activated with the option --cuda. For a complete list of the options, call:

python train.py --help

Unless the output folder is explicitly specified using the option --folder, the script will create a new folder ./output_YYYY-MM-DD_HH_MM_SS_MMMMMM, where YYYY-MM-DD_HH_MM_SS_MMMMMM is the date and time at which the script was executed. All the script output is saved inside this folder. The internal structure of this folder is:

./output_YYYY-MM-DD_HH_MM_SS_MMMMMM
    config.json
    model.pth
    history.csv
    (final_model.pth)

where config.json contains the model hyperparameters and training configuration, model.pth contains the weights of the model that attained the best performance, history.csv contains the performance per epoch during training, and final_model.pth contains the weights of the model at the last epoch of training (not necessarily the one with the best validation performance).
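The output folder can be inspected with a few lines of Python. The sketch below is hedged: it assumes that the trailing MMMMMM in the folder name is the microsecond field and that history.csv starts with a header row. It makes no assumption about which keys or columns the files contain, and the helper names are hypothetical.

```python
import csv
import json
from datetime import datetime
from pathlib import Path

PREFIX = "output_"


def parse_run_time(folder_name):
    """Recover the execution time from a name like output_2020-08-01_12_30_45_123456.

    Assumes the last field is microseconds (strftime code %f).
    """
    if not folder_name.startswith(PREFIX):
        raise ValueError(f"unexpected folder name: {folder_name}")
    stamp = folder_name[len(PREFIX):]
    return datetime.strptime(stamp, "%Y-%m-%d_%H_%M_%S_%f")


def load_run(folder):
    """Return (config, history) read from config.json and history.csv in `folder`.

    `history` is a list of dicts, one per row, keyed by the CSV header.
    All values are kept as strings because the column types are not known here.
    """
    folder = Path(folder)
    with open(folder / "config.json") as f:
        config = json.load(f)
    with open(folder / "history.csv", newline="") as f:
        history = list(csv.DictReader(f))
    return config, history
```

For example, `config, history = load_run("./output_...")` followed by `print(history[0].keys())` lists the quantities logged per epoch.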

Unsupervised pre-training

It is possible to use an unsupervised pre-training stage. Given a partial ECG signal, the model is trained to predict unseen samples of the signal. To generate a pre-trained model, use:

python pretrain.py

By default, the script looks for the WFDB folder (containing the training dataset) in ./Training_WFDB. The option --input_folder PATH can be used to specify a different location. The GPU is not used by default, but it can be activated with the option --cuda. For a complete list of the options, call:

python pretrain.py --help

Unless the output folder is explicitly specified using the option --folder, the script will create a new folder ./output_YYYY-MM-DD_HH_MM_SS_MMMMMM, where YYYY-MM-DD_HH_MM_SS_MMMMMM is the date and time at which the script was executed. All the script output is saved inside this folder. The internal structure of this folder is:

./output_YYYY-MM-DD_HH_MM_SS_MMMMMM
    pretrain_config.json
    pretrain_model.pth
    pretrain_history.csv
    pretrain_train_ids.txt
    pretrain_valid_ids.txt
    (pretrain_final_model.pth)

where pretrain_config.json contains the model hyperparameters and training configuration, pretrain_model.pth contains the weights of the model that attained the best performance (in the unsupervised task), pretrain_history.csv contains the performance per epoch during training, and pretrain_final_model.pth contains the weights of the model at the last epoch of training (not necessarily the one with the best validation performance). In addition, pretrain_{train,valid}_ids.txt contain the ids used for training and validating the unsupervised model.

Pre-trained weights can be loaded during the supervised training stage (train.py) by calling:

python train.py --folder /PATH/TO/FOLDER/WITH/PRETRAINED/MODEL

that is, with an option pointing to any folder containing (at least) pretrain_config.json and pretrain_model.pth.
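Before launching a long training run, it can be useful to confirm that the folder really contains the two required files. The following is a small sketch of such a check; the helper name `missing_pretrain_files` is hypothetical and is not provided by this repository.

```python
from pathlib import Path

# Files that, according to this README, must be present in the folder.
REQUIRED_FILES = ("pretrain_config.json", "pretrain_model.pth")


def missing_pretrain_files(folder):
    """Return the required pre-training files that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_pretrained.py /PATH/TO/FOLDER (script name is illustrative)
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_pretrain_files(target)
    print("all required files found" if not missing else f"missing: {missing}")
```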

Load pretrained weights and configuration

The output of one successful run of unsupervised pre-training followed by supervised training is available and can be downloaded using:

mkdir ./mdl
# The URLs are quoted so that the shell does not interpret the "?" character.
wget "https://www.dropbox.com/s/1pledtjboriw1fz/config.json?dl=0" -O mdl/config.json
wget "https://www.dropbox.com/s/f940fomzmbxmbra/model.pth?dl=0" -O mdl/model.pth
wget "https://www.dropbox.com/s/46ombyq4ecgl7oa/pretrain_config.json?dl=0" -O mdl/pretrain_config.json
wget "https://www.dropbox.com/s/hv0mj3gwcm43u26/pretrain_model.pth?dl=0" -O mdl/pretrain_model.pth
wget "https://www.dropbox.com/s/f08q2wk2wdwehza/history.csv?dl=0" -O mdl/history.csv
wget "https://www.dropbox.com/s/3kjsfl8lyak6hau/pretrain_history.csv?dl=0" -O mdl/pretrain_history.csv

This should create the folder:

./mdl
    config.json
    pretrain_config.json
    model.pth
    pretrain_model.pth
    history.csv
    pretrain_history.csv

Look at run_12ECG_classifier.py to see how this model might be loaded.

Scripts from the challenge

There are five scripts that are provided by the challenge organizers:

Two of them are used by the challenge organizers to run our entry, namely driver.py and train_model.py, which respectively evaluate and train our model.
One is used to evaluate the performance of given model predictions: evaluate_12ECG_score.py. We made some small modifications to this script.
Finally, two of them serve as the interface to run and train our model, namely run_12ECG_classifier.py and train_12ECG_classifier.py, which implement the functions required by driver.py and train_model.py, respectively, to evaluate and train our model.

The script driver.py can be used to compute the model output in all entries of a given directory:

python driver.py input_directory output_directory

where input_directory is a directory containing the input data files and output_directory is a directory for the output classification files. This script should populate the output directory with files of the following form:

#Record ID
 AF, I-AVB, LBBB, Normal, RBBB, PAC,  PVC,  STD, STE
  1,     1,    0,      0,    0,   0,   0,     0,   0
0.9,   0.6,  0.2,   0.05,  0.2, 0.35, 0.35, 0.1, 0.1
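A file in this format can be read back with a short parser. The sketch below assumes exactly the four-line layout shown above (record ID, class labels, binary predictions, scores); the function name `parse_classification` is hypothetical and does not come from the challenge code.

```python
def parse_classification(text):
    """Parse one output classification file given as a string.

    Expected layout (assumed from the example in this README):
      line 1: '#' followed by the record ID
      line 2: comma-separated class labels
      line 3: comma-separated binary predictions (0 or 1)
      line 4: comma-separated scores
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 4:
        raise ValueError("expected at least four non-empty lines")
    record_id = lines[0].lstrip("#").strip()
    labels = [field.strip() for field in lines[1].split(",")]
    predictions = [int(field) for field in lines[2].split(",")]
    scores = [float(field) for field in lines[3].split(",")]
    if not (len(labels) == len(predictions) == len(scores)):
        raise ValueError("labels, predictions and scores differ in length")
    return {
        "record_id": record_id,
        "predictions": dict(zip(labels, predictions)),
        "scores": dict(zip(labels, scores)),
    }


# Usage with a file written by driver.py (the path is illustrative):
# with open("output_directory/some_record.csv") as f:
#     result = parse_classification(f.read())
```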

The PhysioNet/CinC 2020 webpage provides a training database with data files and a description of the contents and structure of these files.

The script train_model.py can be used to train the model on all entries of a given directory:

python train_model.py input_directory output_directory

Both scripts are available in: https://github.com/physionetchallenges/python-classifier-2020

The script evaluate_12ECG_score.py is available in: https://github.com/physionetchallenges/evaluation-2020. It can use the output from driver.py to assess the model performance according to different scores.

python evaluate_12ECG_score.py input_directory output_directory scores.csv

Running docker

Build the Docker image:

docker build -t physionet .

Run the container and mount the volumes:

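mkdir mdl
mkdir out
# Bind mounts need absolute host paths; a bare name would create a named volume instead.
docker run -it \
    -v "$(pwd)/training_data":/physionet/training_data \
    -v "$(pwd)/mdl":/physionet/mdl \
    -v "$(pwd)/training_data":/physionet/test_data \
    -v "$(pwd)/out":/physionet/out \
    gcr.io/deft-station-275520/quickstart-image:latest bash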
