Bring code to this repository for better reproducibility (#16)
* support n_neutral_labels

* Register all subjects to PAM50

* fix bugs related to subject subfolders

* ⚙️chore(reg2pam50) resample image and segmentation before processing slices + fix naming

* add script to get spine-generic data via git-annex

* copy -u instead of -n+check for warp_template2anat

* add parallel processing

* add seg_manual_fix_3d_slicer.py script

* Multiple updates

* Add color map for 3D Slicer

* update seg_manual_fix_3d_slicer script

- Refactored seg_manual_fix_3d_slicer.py script
- Added support for opening seg-manual file if it exists
- Renamed existing segmentation file to .bkp.nii.gz if it already exists
- Added functionality to create json sidecar with meta information
- Renamed existing json sidecar file to .bkp.json if it already exists
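
The backup-and-sidecar behaviour described above can be sketched as follows (file naming and the sidecar fields are assumptions for illustration, not the script's exact API):

```python
import json
from pathlib import Path

def backup_and_save(seg_path, new_seg_bytes, meta):
    """Write a new segmentation plus a JSON sidecar, backing up any
    existing files first (.bkp.nii.gz / .bkp.json), as described above."""
    seg_path = Path(seg_path)
    if seg_path.exists():
        # e.g. sub-01_seg-manual.nii.gz -> sub-01_seg-manual.bkp.nii.gz
        seg_path.rename(seg_path.with_name(
            seg_path.name.replace(".nii.gz", ".bkp.nii.gz")))
    sidecar = seg_path.with_name(seg_path.name.replace(".nii.gz", ".json"))
    if sidecar.exists():
        # e.g. sub-01_seg-manual.json -> sub-01_seg-manual.bkp.json
        sidecar.rename(sidecar.with_name(
            sidecar.name.replace(".json", ".bkp.json")))
    seg_path.write_bytes(new_seg_bytes)
    sidecar.write_text(json.dumps(meta, indent=2))
```

Renaming rather than overwriting means a manual correction session can never silently destroy the previous segmentation or its metadata.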

* modify load_files function to accept an optional seg_suffix parameter

* add remove_warp_outliers script

* add script to convert .mha to .nii.gz

* add script to map segmentation labels

* Add spider_labels_map.json

* add generate_seg_jpg_nnunet.py

* update default paths and folder names

* update git-annex installation command

* add make_nnunet_dataset.py

* change default values of data folder
and PAM50 segmentation file

* add mrspineseg_labels_map.json

* add nnunet_labels_map.json

* add support for customizing output suffix

* fix T12-L1 IVD from 207 to 42

* remove default compression level

* rename output directory

* skip when the output files already exist;
fix skip when no T12-L1 IVD is present
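
The skip-existing behaviour amounts to a guard like this (the function and flag names are illustrative):

```python
from pathlib import Path

def should_process(output_path, overwrite=False):
    """Return True when the output still needs to be generated, so a
    long batch run can be resumed after an interruption without
    redoing finished subjects."""
    return overwrite or not Path(output_path).exists()
```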

* Save combined JPG images from NIfTI image and seg

* Generalize and multithread map_labels script
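
The core of a label-mapping script like this is a value-for-value relabelling driven by a mapping (the JSON files added in this commit, e.g. `spider_labels_map.json`, presumably hold such mappings); a minimal sketch:

```python
import numpy as np

def map_labels(seg, mapping):
    """Relabel a segmentation array: values present in `mapping` are
    replaced, all other values are left untouched. Comparisons use the
    original array, so chained mappings like {1: 2, 2: 3} are safe."""
    out = seg.copy()
    for src, dst in mapping.items():
        out[seg == src] = dst
    return out
```

For example, the T12-L1 IVD fix mentioned in this log corresponds to the mapping `{207: 42}`. Parallelising over files can then be done with `concurrent.futures`, one task per NIfTI volume.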

* add fix_csf_label script

* Add script to generate augmented images + segs

* Add script to generate segmented labels from
initial segmentation

* Refactor generate_seg_jpg_nnunet.py script to
improve readability and maintainability

* Move script for generating
segmentation JPG images

* Add scripts for generating random disc labels for training data and for editing the segmentation background:
"generate_random_disc_labels.py"
"generate_random_disc_labels_channels.py"
"put_in_background.py"

* Remove some non-working strategies for training

* Add nnUNet testing and training scripts

* fix bug in RandomNoise calculation;
remove unused imports;
default `_0000` suffix for images

* set default number of generated images to 7;
change augmentation randomness
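
A random-noise augmentation of the kind referenced here can be sketched as additive Gaussian noise whose strength is itself drawn at random, so every generated sample gets a different noise level (parameter names are illustrative, not the script's actual API):

```python
import numpy as np

def add_random_noise(image, rng=None, max_sigma=0.1):
    """Add Gaussian noise with a randomly drawn standard deviation,
    scaled relative to the image intensity range."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = rng.uniform(0.0, max_sigma) * float(image.max() - image.min())
    return image + rng.normal(0.0, sigma, size=image.shape)
```

Drawing `sigma` per call (rather than fixing it) is what makes repeated runs over the same volume yield distinct augmented samples.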

* Refactor input/output folder definitions in script

* Set default image suffix to '_0000' in
segmentation script

* Remove outdated scripts

* Enhance mha2nii for Bulk Conversion with
Parallelism

* Repo reorganization

* Create python package totalsegmri

* changes to support python package

* Updated .gitignore to exclude .vscode directory

* fix label_map paths

* fix bug in image cropping

* Added tqdm to requirements

* Update README.md - add (private dataset) to whole-spine

Co-authored-by: Nathan Molinier <[email protected]>

* Update README for data path and script execution

* Update SPIDER dataset path in preparation script

* Enhance verbose logging with script names

* Enhance training script output verbosity

* convert crlf to lf

* Added argparse dependency to dirpath utility

* Balanced dataset by duplicating instances
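
Balancing by duplication can be sketched as oversampling each under-represented source until it matches the largest one (the `(source, item)` pair format is an assumption for this sketch):

```python
import math
from collections import Counter

def balance_by_duplication(samples):
    """Duplicate samples from under-represented sources until every
    source matches the count of the largest one. `samples` is a list
    of (source, item) pairs; duplicates reference the same items."""
    counts = Counter(src for src, _ in samples)
    target = max(counts.values())
    balanced = list(samples)
    for src, n in counts.items():
        pool = [s for s in samples if s[0] == src]
        # Repeat the pool enough times, then trim to exactly `target`.
        extra = (pool * (math.ceil(target / n) - 1))[: target - n]
        balanced.extend(extra)
    return balanced
```

For image datasets this would typically duplicate file references (or symlinks) rather than the images themselves, keeping disk usage flat.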

* Updated README and scripts for nnUNetv2
integration
- Separated dataset preparation and training instructions in the README
for clarity, specifying use of nnUNetv2 structure.
- Clarified the prerequisite of having a trained model before running
inferences in the README.
- Fixed the output directory variable assignment in inference script to
correctly use a second parameter.
- Enhanced the inference script to handle missing `_0000` suffixes and
to support new postprocessing steps.
- Added a new dataset preparation script (`prepare_nnunet_datasets.sh`)
to set up data structure for nnUNetv2.
- Removed dataset preparation steps from the training script
(`train_nnunet.sh`), focusing it solely on model training as per the new
separation of concerns.

The changes improve the accuracy, usability, and maintainability of
TotalSegMRI's implementation with nnUNetv2, facilitating better
segmentation results and a smoother experience for users following the
updated instructions.
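
nnU-Net expects single-channel input images to carry a channel suffix, i.e. `<case>_0000.nii.gz`. The suffix handling mentioned above might look roughly like this (the function name is illustrative):

```python
def ensure_channel_suffix(name, suffix="_0000"):
    """Append nnU-Net's channel suffix to a .nii.gz filename when it
    is missing, leaving already-suffixed or non-NIfTI names alone."""
    ext = ".nii.gz"
    if not name.endswith(ext):
        return name
    stem = name[: -len(ext)]
    if stem.endswith(suffix):
        return name
    return stem + suffix + ext
```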

* Update README.md

* Refactor argument flags in MRI utils scripts

Updated the argument flags across various MRI utility scripts to
standardize input directory flags as '-s' instead of the previous '-i'.
This change enhances the consistency of script interfaces, making it
easier to understand and use the tools for MRI data preparation and
processing. Adjusted scripts include those for generating sequential
labels, mapping labels, and fixing CSF labels. Compatibility with
existing conventions in subject subdirectory flags has been maintained
by switching from '-s' to '-u'.

* Added script for processing NIfTI segmentation files,
retaining only the largest connected component for each label
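
The largest-connected-component cleanup can be sketched as follows; this is a pure-NumPy BFS sketch (a real implementation might rely on `scipy.ndimage.label` instead), using axis-aligned connectivity (4-connectivity in 2D, 6-connectivity in 3D):

```python
from collections import deque
import numpy as np

def largest_component_per_label(seg):
    """For every non-zero label, keep only its largest connected
    component and zero out the smaller ones."""
    out = np.zeros_like(seg)
    for label in np.unique(seg):
        if label == 0:
            continue
        mask = seg == label
        visited = np.zeros(seg.shape, dtype=bool)
        best = []
        for start in zip(*np.nonzero(mask)):
            if visited[start]:
                continue
            # Breadth-first flood fill of one connected component.
            comp, queue = [], deque([start])
            visited[start] = True
            while queue:
                p = queue.popleft()
                comp.append(p)
                for axis in range(seg.ndim):
                    for step in (-1, 1):
                        q = list(p)
                        q[axis] += step
                        q = tuple(q)
                        if (0 <= q[axis] < seg.shape[axis]
                                and mask[q] and not visited[q]):
                            visited[q] = True
                            queue.append(q)
            if len(comp) > len(best):
                best = comp
        for p in best:
            out[p] = label
    return out
```

This removes small spurious islands that segmentation models sometimes produce away from the true structure.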

* Ignore non-critical warnings

* Update dataset prep and training workflow

Refactored the dataset preparation steps, now sourcing the new `get_spine_generic_datasets.sh` script to fetch specific datasets from the Spine Generic Project repository. This update clarifies the preparation of SPIDER datasets in the BIDS structure. Revised training and inference instructions in `README.md` to correspond with the new dataset structure and included clear directives for model training and running inference. Removed unnecessary code from the `prepare_spider_bids_datasets.sh` script, further streamlining the process.

These changes make the data setup more intuitive and maintainable, enabling easier replication of the research environment.

Related to issue #4567.

* Rename multi-subject and single-subject dataset zip files to include _PAM50_seg.

* Refactor label generation utilities

Removed redundant `pairs_dict` function from both generate_labels_sequential.py and generate_largest_labels.py as it was no longer used in the current codebase. Updated generate_largest_labels.py to enhance clarity: renamed the function generate_labels to generate_largest_labels and updated its references to match the new name, ensuring consistency with the module's purpose. Removed unused imports to streamline dependencies and maintain clean and efficient code.

* Remove big binaries from main repository

* Update README with mkdir flag enhancement

Ensure the creation of nested directories for the SPIDER dataset by adding the '-p' flag to the 'mkdir' command in the README instructions. This prevents potential errors when users attempt to create subdirectories in a non-existent path.

---------

Co-authored-by: Nathan Molinier <[email protected]>
Co-authored-by: Nathan Molinier <[email protected]>
3 people authored Feb 20, 2024
1 parent 662e8c9 commit 0300574
Showing 38 changed files with 3,078 additions and 1,131 deletions.
175 changes: 175 additions & 0 deletions .gitignore
@@ -0,0 +1,175 @@
# Vscode
.vscode
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets

# Local History for Visual Studio Code
.history/

# Built Visual Studio Code Extensions
*.vsix

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
5 changes: 0 additions & 5 deletions Dockerfile

This file was deleted.

162 changes: 99 additions & 63 deletions README.md
@@ -1,103 +1,139 @@
-# totalsegmentator-mri
-Code for the TotalSegmentator MRI project.
-
-## Steps to install
-
-1. Clone this repository
-```
-git clone https://github.com/neuropoly/totalsegmentator-mri.git
-```
-1. Clone SynthSeg repository
-```
-git clone https://github.com/BBillot/SynthSeg.git
-```
-1. Download [this google folder](https://drive.google.com/drive/folders/11F8q3jhZR0KfHhBpyKygXMo-alTDbp0U?usp=sharing) (the TotalSegmentator example image was downloaded from [here](https://zenodo.org/record/6802614)).
-1. Create Virtual Environment (Please make sure you're using python 3.8 !!!)
-```
-python -m venv venv
-```
-1. Add SynthSeg to Virtual Environment (If not using bash change '$(pwd)' to the working directory):
-```
-echo "$(pwd)/SynthSeg" > venv/lib/python3.8/site-packages/SynthSeg.pth
-```
-1. Activate Virtual Environment
-```
-source venv/bin/activate
-```
-1. Install requirements:
-```
-pip install -r SynthSeg/requirements_python3.8.txt
-```
-## To run scripts
-`resources/labels.json` - Contain mapping of each mask to unique number.
-`resources/classes.json` - Contain mapping of each mask to class of masks with similar statistics (total 15 classes).
-### Option 1 - Run script for all TotalSegmentator labels
-1. Combine all MPRAGE 'blob' masks for each subject into a single segmentation file:
-```
-python totalsegmentator-mri/scripts/combine_masks.py -d TotalSegmentatorMRI_SynthSeg/data/derivatives/manual_masks -o output/ALL_LAB/MP-RAGE_Masks_Combined -m totalsegmentator-mri/resources/labels.json
-```
-1. Calculate signal statistics (mean + std) for each mask (group masks into classes of similar statistics):
-```
-python totalsegmentator-mri/scripts/build_intensity_stats.py -d TotalSegmentatorMRI_SynthSeg/data -s output/ALL_LAB/MP-RAGE_Masks_Combined -o output/ALL_LAB/MP-RAGE_priors -m totalsegmentator-mri/resources/labels.json -c totalsegmentator-mri/resources/classes.json
-```
-1. Combine all TotalSegmentator masks for each subject into a single segmentation file:
-```
-python totalsegmentator-mri/scripts/combine_masks.py -d TotalSegmentatorMRI_SynthSeg/Totalsegmentator_dataset -o output/ALL_LAB/TotalSegmentator_Masks_Combined -m totalsegmentator-mri/resources/labels.json --subject-prefix s --subject-subdir segmentations --seg-suffix _ct_seg --output-bids 0
-```
-1. Create a synthetic image using TotalSegmentator segmentation and the calculated MPRAGE signal statistics:
-```
-python totalsegmentator-mri/scripts/generate_image.py -s output/ALL_LAB/TotalSegmentator_Masks_Combined -p output/ALL_LAB/MP-RAGE_priors -o output/ALL_LAB/MP-RAGE_Synthetic/test1 -n 2
-```
-### Option 2 - Run script with TotalSegmentator labels reduced to 15 labels
-To reduce the number of labels and group all vertebrae, we use `resources/classes.json` as the main masks mapping when combining masks with combine_masks. This way all masks of the same classes will be mapped to the same label.
-1. Combine all MPRAGE 'blob' masks for each subject into a single segmentation file:
-```
-python totalsegmentator-mri/scripts/combine_masks.py -d TotalSegmentatorMRI_SynthSeg/data/derivatives/manual_masks -o output/15_LAB/MP-RAGE_Masks_Combined -m totalsegmentator-mri/resources/classes.json
-```
-1. Calculate signal statistics (mean + std) for each mask:
-```
-python totalsegmentator-mri/scripts/build_intensity_stats.py -d TotalSegmentatorMRI_SynthSeg/data -s output/15_LAB/MP-RAGE_Masks_Combined -o output/15_LAB/MP-RAGE_priors -m totalsegmentator-mri/resources/classes.json
-```
-1. Combine all TotalSegmentator masks for each subject into a single segmentation file:
-```
-python totalsegmentator-mri/scripts/combine_masks.py -d TotalSegmentatorMRI_SynthSeg/Totalsegmentator_dataset -o output/15_LAB/TotalSegmentator_Masks_Combined -m totalsegmentator-mri/resources/classes.json --subject-prefix s --subject-subdir segmentations --seg-suffix _ct_seg --output-bids 0
-```
-1. Create a synthetic image using TotalSegmentator segmentation and the calculated MPRAGE signal statistics:
-```
-python totalsegmentator-mri/scripts/generate_image.py -s output/15_LAB/TotalSegmentator_Masks_Combined -p output/15_LAB/MP-RAGE_priors -o output/15_LAB/MP-RAGE_Synthetic/test1 -n 2
-```
-## Data organization
-As a starting point, a few MPRAGE data are under our private [google folder](https://drive.google.com/drive/folders/1CAkz4ZuxQjWza7GAXhXxTkKcyB9p3yME).
-We will follow the BIDS structure:
-```
-├── derivatives
-│   └── manual_masks
-│       └── sub-errsm37
-│           └── anat
-└── sub-errsm37
-    └── anat
-        ├── sub-errsm37_T1w.json
-        └── sub-errsm37_T1w.nii.gz
-```
+# TotalSegMRI
+
+Tool for automatic segmentation and labelling of all vertebrae and intervertebral discs (IVDs), spinal cord, and spinal canal. We follow [TotalSegmentator classes](https://github.com/wasserth/TotalSegmentator?tab=readme-ov-file#class-details) with an additional class for IVDs, spinal cord and spinal canal (see the list of classes [here](#list-of-class)). We used [nnUNet](https://github.com/MIC-DKFZ/nnUNet) as our backbone for model training and inference.
+
+- [Dependencies](#dependencies)
+- [Installation](#installation)
+- [First Model](#first-model)
+- [First Model - Train](#first-model---train)
+- [First Model - Inference](#first-model---inference)
+- [List of class](#list-of-class)
+
+![Thumbnail](https://github.com/neuropoly/totalsegmentator-mri/assets/36595323/ceca5bb7-f370-477a-8b21-9774853948c6)
+
+## Dependencies
+
+- [Spinal Cord Toolbox (SCT)](https://github.com/neuropoly/spinalcordtoolbox)
+
+## Installation
+
+1. Open Terminal in a directory you want to work on.
+1. Create and activate Virtual Environment (highly recommended):
+```
+python -m venv venv
+source venv/bin/activate
+```
+1. Install [PyTorch](https://pytorch.org/get-started/locally/) as described on their website.
+1. Clone and install this repository:
+```
+git clone https://github.com/neuropoly/totalsegmentator-mri.git
+python -m pip install -e totalsegmentator-mri
+```
+1. Install requirements:
+```
+python -m pip install -r totalsegmentator-mri/requirements.txt
+```
+
+## First Model
+
+A hybrid approach integrating nnU-Net with an iterative algorithm for segmenting vertebrae, IVDs, spinal cord, and spinal canal. To tackle the challenge of having many classes and class imbalance, we developed a two-step training process. A first model (model 1 - 206) was trained (single input channel: image) to identify 4 classes (IVDs, vertebrae, spinal cord and spinal canal) as well as specific IVDs (C2-C3, C7-T1 and L5-S1) representing key anatomical landmarks along the spine, so 7 classes in total (Figure 1A). The output segmentation was processed using an algorithm that distinguished odd and even IVDs based on the C2-C3, C7-T1 and L5-S1 IVD labels output by the model (Figure 1B). Then, a second nnU-Net model (model 2 - 210) was trained (two input channels: 1=image, 2=odd IVDs) to output 12 classes (Figure 1C). Finally, the output of model 2 was processed in order to assign an individual label value to each vertebra and IVD in the final segmentation mask (Figure 1D).
+
+![Figure 1](https://github.com/neuropoly/totalsegmentator-mri/assets/36595323/3958cbc6-a059-4ccf-b3b1-02dbc3a4a62d)
+
+**Figure 1**: Illustration of the hybrid method for automatic segmentation of the spine and spinal cord structures. T1w image (A) is used to train model 1, which outputs 7 classes (B). These output labels are processed to extract odd IVDs (C). The T1w and odd IVDs are used as two input channels to train model 2, which outputs 12 classes (D). These output labels are processed to extract individual IVDs and vertebrae (E).
+
+### First Model - Train
+
+1. Download the corresponding content from the [SPIDER dataset](https://doi.org/10.5281/zenodo.10159290) into 'data/raw/spider/images' and 'data/raw/spider/masks' (you can use `mkdir -p data/raw/spider` to create the folder first).
+1. Make sure `git` and `git-annex` are installed (you can install with `sudo apt-get install git-annex -y`).
+1. Extract [data-multi-subject_PAM50_seg.zip](https://drive.google.com/file/d/1Sq38xLHnVxhLr0s1j27ywbeshNUjo3IP) into 'data/bids/data-multi-subject'.
+1. Extract [data-single-subject_PAM50_seg.zip](https://drive.google.com/file/d/1YvuFHL8GDJ5SXlMLORWDjR5SNkDL6TUU) into 'data/bids/data-single-subject'.
+1. Extract [whole-spine.zip](https://drive.google.com/file/d/143i0ODmeqohpc4vu5Aa5lnv8LLEyOU0F) (private dataset) into 'data/bids/whole-spine'.
+1. Get the required datasets from the [Spine Generic Project](https://github.com/spine-generic/):
+```
+source totalsegmentator-mri/run/get_spine_generic_datasets.sh
+```
+1. Prepare SPIDER datasets in [BIDS](https://bids.neuroimaging.io/) structure:
+```
+source totalsegmentator-mri/run/prepare_spider_bids_datasets.sh
+```
+1. Prepare datasets in nnUNetv2 structure:
+```
+source totalsegmentator-mri/run/prepare_nnunet_datasets.sh
+```
+1. Train the model:
+```
+source totalsegmentator-mri/run/train_nnunet.sh
+```
+
+### First Model - Inference
+
+Run the model on a folder containing the images in .nii.gz format (make sure to train the model or extract the trained `nnUNet_results` into `data/nnUNet/nnUNet_results` before running):
+```
+source totalsegmentator-mri/run/inference_nnunet.sh INPUT_FOLDER OUTPUT_FOLDER
+```
## List of class
|Label|Name|
|:-----|:-----|
| 18 | vertebrae_L5 |
| 19 | vertebrae_L4 |
| 20 | vertebrae_L3 |
| 21 | vertebrae_L2 |
| 22 | vertebrae_L1 |
| 23 | vertebrae_T12 |
| 24 | vertebrae_T11 |
| 25 | vertebrae_T10 |
| 26 | vertebrae_T9 |
| 27 | vertebrae_T8 |
| 28 | vertebrae_T7 |
| 29 | vertebrae_T6 |
| 30 | vertebrae_T5 |
| 31 | vertebrae_T4 |
| 32 | vertebrae_T3 |
| 33 | vertebrae_T2 |
| 34 | vertebrae_T1 |
| 35 | vertebrae_C7 |
| 36 | vertebrae_C6 |
| 37 | vertebrae_C5 |
| 38 | vertebrae_C4 |
| 39 | vertebrae_C3 |
| 40 | vertebrae_C2 |
| 41 | vertebrae_C1 |
| 92 | sacrum |
| 200 | spinal_cord |
| 201 | spinal_canal |
| 202 | disc_L5_S |
| 203 | disc_L4_L5 |
| 204 | disc_L3_L4 |
| 205 | disc_L2_L3 |
| 206 | disc_L1_L2 |
| 207 | disc_T12_L1 |
| 208 | disc_T11_T12 |
| 209 | disc_T10_T11 |
| 210 | disc_T9_T10 |
| 211 | disc_T8_T9 |
| 212 | disc_T7_T8 |
| 213 | disc_T6_T7 |
| 214 | disc_T5_T6 |
| 215 | disc_T4_T5 |
| 216 | disc_T3_T4 |
| 217 | disc_T2_T3 |
| 218 | disc_T1_T2 |
| 219 | disc_C7_T1 |
| 220 | disc_C6_C7 |
| 221 | disc_C5_C6 |
| 222 | disc_C4_C5 |
| 223 | disc_C3_C4 |
| 224 | disc_C2_C3 |
8 changes: 8 additions & 0 deletions pyproject.toml
@@ -0,0 +1,8 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "totalsegmri"
version = "0.0.1"
dependencies = []