JairMathAI/COVID

Project description

This repository contains the code used in the development of the master's thesis "Deep Learning for the Identification and Localization of COVID-19 Induced Damage in Chest X-Rays." The top-level folder structure is as follows:

  • Anotaciones 📂 : contains the various annotations used to train the deep learning models. For each dataset there is a folder with annotations exclusively for RetinaNet, since the annotations for Y.O.L.O are generated from the files in the Datos folder and the code in the Codigos_Datos folder, both present in this repository.
  • Codigos_Datos 📂 : contains three Jupyter Lab notebooks, each with the code and functions used to generate the annotations in the Anotaciones folder and to process the data for each dataset.
  • Datos 📂 : contains text files for each dataset used, summarizing the relevant information about each dataset in a structured manner.
  • Exploratorios 📂 : contains three Jupyter Lab notebooks, one for each dataset used, with the code and functions used to perform the reported exploratory analyses.
  • env.yml 📄 : Anaconda virtual-environment configuration containing all the libraries needed for exploratory analysis, visualization, data preprocessing, and label manipulation.
  • Trainings 📂 : the weights of the trained models described in the thesis, as well as the complete training information and graphics, are available in the following Google Drive folder.
  • Codigos_Modificados 📂 : contains two folders, one for each model:
    • Y.O.L.O 📂 : includes the code with modifications to the attention module, as well as the Jupyter Lab notebook used for training, which includes modifications for generating confusion matrices.
    • RetinaNet 📂 : includes the following code:
      • AumentoSIIM.ipynb 📄 : performs data augmentation for RetinaNet.
      • ConfuseMatrix.py 📄 : generates confusion matrices for RetinaNet.
      • trainMoidif.py 📄 : modified version of train.py from the RetinaNet model's GitHub, implementing patience, saving the best-performing model, generating total-loss and mAP graphs, and knowledge transfer.
      • modelModif.py 📄 : modified version of model.py from the RetinaNet model's GitHub, implementing a draft of the (unused) attention module, included in case one wants to explore or improve it.
      • visualize_single_image.py 📄 : modified version of visualize_single_image.py from the RetinaNet model's GitHub, corrected because the original failed when executed.

Project Replication

To replicate this project, we recommend following these steps:

First, we need to obtain the necessary image datasets for the project.

To work, we need virtual environments managed by Anaconda. For instructions on how to install Anaconda, please follow the official guide here.

At this stage, we need three different environments to work on the project. For two of them, we must follow the respective models' GitHub instructions to run them:

The last environment is required to replicate the exploratory analysis and manage the data. We need to download the env.yml file contained in this repository.

Then, to create the environment, run the following command in the Anaconda shell:

conda env create -f path/env.yml
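The environment's name is defined by the `name:` field inside env.yml; to use the environment afterward, activate it by that name (the name below is only a placeholder):

conda activate env_name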

Once we have the necessary environments, we can perform various tasks.

Replicate the exploratory analysis

We need to work with the Anaconda environment previously created using the env.yml file and start JupyterLab by running the following command within the environment:

jupyter lab

With JupyterLab running, we have access to a file browser opened in the web browser. Depending on the exploratory analysis we want to replicate, we need two main things:

First, download the Jupyter Notebook related to the desired analysis, available in the project repository here.

Second, for each analysis, the Notebook requires the relevant information for each dataset, which is available in the corresponding dataset folder in this repository here.

To run the respective notebook for each dataset, the following information files are needed:

  • Exploratorio_RSNA.ipynb requires the path to the Todo_info.csv file.
  • Exploratorio_Xray14.ipynb requires the paths to the Todas.csv and bboxes.csv files.
  • Exploratorio_SIIM.ipynb requires the paths to the original train_image_level.csv and train_study_level.csv files provided with the original dataset.

Important Note:

For the correct execution of the notebooks, the paths to the directories containing the images must be updated in each notebook, and the images must be in the correct format.
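As a quick sanity check before launching a notebook, the information files can be loaded directly; a minimal sketch assuming pandas and a hypothetical local path:

```python
import pandas as pd

# Hypothetical path; point it at your local copy of the info file.
info = pd.read_csv("Datos/RSNA/Todo_info.csv")

# Inspect the columns and a few rows before running the full notebook.
print(info.columns.tolist())
print(info.head())
```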

Preparing images and annotations

To run the exploratory analysis notebooks and train the models, we need to transform the images of each dataset so that they meet the required format and size, and to generate annotation files such as the ones provided.

You can use the code in the Codigos_Datos folder to adjust the images to the correct size and format. Alternatively, you can run your own script that resizes the images and saves them in PNG format, as in the sketch below.
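A minimal resizing sketch, assuming Pillow is installed and using hypothetical input and output directories (the 640 x 640 target matches the size the provided annotations assume; DICOM sources need a separate reader first):

```python
from pathlib import Path
from PIL import Image

SRC = Path("dataset/original")   # hypothetical input directory
DST = Path("dataset/png_640")    # hypothetical output directory
DST.mkdir(parents=True, exist_ok=True)

for img_path in SRC.iterdir():
    if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
        continue  # skip non-image files
    img = Image.open(img_path).convert("L")  # chest X-rays are single-channel
    img = img.resize((640, 640), Image.Resampling.BILINEAR)
    img.save(DST / (img_path.stem + ".png"))
```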

The Anotaciones folder contains the annotations needed to run RetinaNet directly, as explained in the respective GitHub project. For training with Y.O.L.O., a YAML file is required, as detailed in the corresponding GitHub documentation. The annotations for Y.O.L.O. can be derived from the provided RetinaNet annotations using the code available in the Codigos_Datos folder. Alternatively, we can start from the data summary information in the Datos folder and format the annotations as needed with a custom script (a sketch is given after the format descriptions below), taking into account the following points required for training Y.O.L.O:

The YAML file tells Y.O.L.O which dataset to work with, and it needs the following structure:

path: root/path/to/dataset/folder
train: images/train
val: images/validation
test: images/test

names:
  0: Typical Appearance
  1: Negative for Pneumonia
  2: Indeterminate Appearance
  3: Atypical Appearance

On the other hand, the dataset itself needs the following structure, which we build from the respective Anotaciones files:

📂 Dataset
├── 📂 images
│ ├── 📂 train
│ │ ├── 📄 tr_1.png
│ │ ├── ...
│ │ ├── 📄 tr_n.png
│ ├── 📂 test
│ │ ├── 📄 ts_1.png
│ │ ├── ...
│ │ ├── 📄 ts_n.png
│ ├── 📂 validation
│ │ ├── 📄 v_1.png
│ │ ├── ...
│ │ ├── 📄 v_n.png
├── 📂 labels
│ ├── 📂 train
│ │ ├── 📄 tr_1.txt
│ │ ├── ...
│ │ ├── 📄 tr_n.txt
│ ├── 📂 test
│ │ ├── 📄 ts_1.txt
│ │ ├── ...
│ │ ├── 📄 ts_n.txt
│ ├── 📂 validation
│ │ ├── 📄 v_1.txt
│ │ ├── ...
│ │ ├── 📄 v_n.txt

Each row (one per bounding box) in the .txt label file needs to follow this structure:

class(int)   x_c(float)   y_c(float)   w_n(float)    h_n(float)

where:

x_c: the x coordinate of the center of the bounding box, normalized with respect to the width of the image
y_c: the y coordinate of the center of the bounding box, normalized with respect to the height of the image
w_n: the width of the bounding box, normalized with respect to the width of the image
h_n: the height of the bounding box, normalized with respect to the height of the image

To obtain this format, it is important to recall that the Anotaciones files have the format needed for RetinaNet:

Each row (one per bounding box) in the .csv label file follows this structure:

image_path(str)   x_1(int)   y_1(int)    x_2(int)    y_2(int)   class(string)

where:

image_path: the path to the image corresponding to the annotation
x_1: the x coordinate of the upper-left corner of the bounding box
y_1: the y coordinate of the upper-left corner of the bounding box
x_2: the x coordinate of the lower-right corner of the bounding box
y_2: the y coordinate of the lower-right corner of the bounding box

Note that the Anotaciones files contain annotations based on an image size of 640 x 640. A minimal conversion sketch follows.
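As a conversion sketch (not the repository's own Codigos_Datos code), assuming the RetinaNet CSV has no header row, that the class names match the YAML names above, and using hypothetical file locations:

```python
import csv
from pathlib import Path

IMG_SIZE = 640  # the Anotaciones files assume 640 x 640 images
LABELS_DIR = Path("Dataset/labels/train")  # hypothetical output split
CLASS_IDS = {                              # must match the YAML names above
    "Typical Appearance": 0,
    "Negative for Pneumonia": 1,
    "Indeterminate Appearance": 2,
    "Atypical Appearance": 3,
}

LABELS_DIR.mkdir(parents=True, exist_ok=True)

# RetinaNet rows: image_path, x_1, y_1, x_2, y_2, class
with open("Anotaciones/train.csv", newline="") as f:  # hypothetical CSV path
    for image_path, x1, y1, x2, y2, cls in csv.reader(f):
        x1, y1, x2, y2 = map(float, (x1, y1, x2, y2))
        # Y.O.L.O wants the box center and size, normalized to [0, 1].
        x_c = (x1 + x2) / 2 / IMG_SIZE
        y_c = (y1 + y2) / 2 / IMG_SIZE
        w_n = (x2 - x1) / IMG_SIZE
        h_n = (y2 - y1) / IMG_SIZE
        out = LABELS_DIR / (Path(image_path).stem + ".txt")
        with open(out, "a") as g:  # append: one row per bounding box
            g.write(f"{CLASS_IDS[cls]} {x_c:.6f} {y_c:.6f} {w_n:.6f} {h_n:.6f}\n")
```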

Training Models

With the corresponding images and annotations in order, we can follow the instructions in the respective model repository to train the model:

Replication of results

To replicate the training results reported in the master's thesis and provided in the Google Drive folder:

We can utilize the EntrenaYOLO.ipynb notebook to train the model and generate the corresponding confusion matrices. Please note that some paths need to be updated to run the generation code. In the respective training Google Drive folder, a train.txt file is provided, containing the specific hyperparameters used for each training session. Additionally, there is a .pth file with the generated model weights, allowing us to access the models without training from scratch.
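A minimal sketch for inspecting one of the provided .pth weight files, assuming PyTorch and a hypothetical local path (loading the weights back into each model follows the respective repository's own loading code):

```python
import torch

# Hypothetical path to a weights file downloaded from the Google Drive folder.
state = torch.load("weights/model_final.pth", map_location="cpu")

# A .pth file may hold a full model object or a state_dict; inspect it first.
if isinstance(state, dict):
    print(f"{len(state)} entries, e.g.: {list(state)[:5]}")
else:
    print(type(state))
```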

To replicate the results with the modified architecture, we need to follow the instructions provided in the Codigos_Modificados folder related to Y.O.L.O.

We can use the respective annotation CSV file with the updated image paths provided in the Anotaciones folder. By following the instructions in the original GitHub repository, we can effectively implement the model. The Google Drive folder also contains the weights for each trained model, and the corresponding hyperparameters used during training are detailed in the master's thesis (a link will be included once the work is officially published).

If we want to reproduce the results by applying data augmentation, modifying the training process, or visualizing and generating confusion matrices, we can follow the instructions in the Codigos_Modificados folder related to RetinaNet.

Important Note

Most of the code was initially developed to run local experiments as part of the final report. However, efforts are currently underway to enhance its reusability and reduce its dependency on local configurations.
