Bring code to this repository for better reproducibility (#16)
* support n_neutral_labels

* Register all subjects to PAM50

* fix bugs related to subject subfolders

* ⚙️chore(reg2pam50) resample image and segmentation before processing slices + fix naming

* add script to get spine-generic data via git-annex

* copy -u instead of -n+check for warp_template2anat

* add parallel processing

* add seg_manual_fix_3d_slicer.py script

* Multiple updates

* Add color map for 3D Slicer

* update seg_manual_fix_3d_slicer script

- Refactored seg_manual_fix_3d_slicer.py script
- Added support for opening seg-manual file if it exists
- Renamed existing segmentation file to .bkp.nii.gz if it already exists
- Added functionality to create json sidecar with meta information
- Renamed existing json sidecar file to .bkp.json if it already exists
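
The backup-and-sidecar behaviour described above can be sketched as follows (file naming and the sidecar fields are assumptions for illustration, not the script's exact API):

```python
import json
from pathlib import Path

def backup_and_save(seg_path, new_seg_bytes, meta):
    """Write a new segmentation plus a JSON sidecar, backing up any
    existing files first (.bkp.nii.gz / .bkp.json), as described above."""
    seg_path = Path(seg_path)
    if seg_path.exists():
        # e.g. sub-01_seg-manual.nii.gz -> sub-01_seg-manual.bkp.nii.gz
        seg_path.rename(seg_path.with_name(
            seg_path.name.replace(".nii.gz", ".bkp.nii.gz")))
    sidecar = seg_path.with_name(seg_path.name.replace(".nii.gz", ".json"))
    if sidecar.exists():
        # e.g. sub-01_seg-manual.json -> sub-01_seg-manual.bkp.json
        sidecar.rename(sidecar.with_name(
            sidecar.name.replace(".json", ".bkp.json")))
    seg_path.write_bytes(new_seg_bytes)
    sidecar.write_text(json.dumps(meta, indent=2))
```

Renaming rather than overwriting means a manual correction session can never silently destroy the previous segmentation or its metadata.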

* modify load_files function to accept an optional seg_suffix parameter

* add remove_warp_outliers script

* add script to convert .mha to .nii.gz

* add script to map segmentation labels

* Add spider_labels_map.json

* add generate_seg_jpg_nnunet.py

* update default paths and folder names

* update git-annex installation command

* add make_nnunet_dataset.py

* change default values of data folder
and PAM50 segmentation file

* add mrspineseg_labels_map.json

* add nnunet_labels_map.json

* add support for customizing output suffix

* fix T12-L1 IVD from 207 to 42

* remove default compression level

* rename output directory

* skip when the output files already exist;
fix skip when no T12-L1 IVD is present
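
The skip-existing behaviour amounts to a guard like this (the function and flag names are illustrative):

```python
from pathlib import Path

def should_process(output_path, overwrite=False):
    """Return True when the output still needs to be generated, so a
    long batch run can be resumed after an interruption without
    redoing finished subjects."""
    return overwrite or not Path(output_path).exists()
```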

* Save combined JPG images from NIfTI image and seg

* Generalize and multithread map_labels script
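
The core of a label-mapping script like this is a value-for-value relabelling driven by a mapping (the JSON files added in this commit, e.g. `spider_labels_map.json`, presumably hold such mappings); a minimal sketch:

```python
import numpy as np

def map_labels(seg, mapping):
    """Relabel a segmentation array: values present in `mapping` are
    replaced, all other values are left untouched. Comparisons use the
    original array, so chained mappings like {1: 2, 2: 3} are safe."""
    out = seg.copy()
    for src, dst in mapping.items():
        out[seg == src] = dst
    return out
```

For example, the T12-L1 IVD fix mentioned in this log corresponds to the mapping `{207: 42}`. Parallelising over files can then be done with `concurrent.futures`, one task per NIfTI volume.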

* add fix_csf_label script

* Add script to generate augmented images + segs

* Add script to generate segmented labels from
initial segmentation

* Refactor generate_seg_jpg_nnunet.py script to
improve readability and maintainability

* Move script for generating
segmentation JPG images

* Add scripts for generating random disc labels for training data and for editing the segmentation background:
"generate_random_disc_labels.py"
"generate_random_disc_labels_channels.py"
"put_in_background.py"

* Remove some non-working strategies for training

* Add nnUNet testing and training scripts

* fix bug in RandomNoise calculation;
remove unused imports;
default `_0000` suffix for images

* set default number of generated images to 7;
change augmentation randomness
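
A random-noise augmentation of the kind referenced here can be sketched as additive Gaussian noise whose strength is itself drawn at random, so every generated sample gets a different noise level (parameter names are illustrative, not the script's actual API):

```python
import numpy as np

def add_random_noise(image, rng=None, max_sigma=0.1):
    """Add Gaussian noise with a randomly drawn standard deviation,
    scaled relative to the image intensity range."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = rng.uniform(0.0, max_sigma) * float(image.max() - image.min())
    return image + rng.normal(0.0, sigma, size=image.shape)
```

Drawing `sigma` per call (rather than fixing it) is what makes repeated runs over the same volume yield distinct augmented samples.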

* Refactor input/output folder definitions in script

* Set default image suffix to '_0000' in
segmentation script

* Remove outdated scripts

* Enhance mha2nii for Bulk Conversion with
Parallelism

* Repo reorganization

* Create python package totalsegmri

* changes to support python package

* Updated .gitignore to exclude .vscode directory

* fix label_map paths

* fix bug in image cropping

* Added tqdm to requirements

* Update README.md - add (private dataset) to whole-spine

Co-authored-by: Nathan Molinier <[email protected]>

* Update README for data path and script execution

* Update SPIDER dataset path in preparation script

* Enhance verbose logging with script names

* Enhance training script output verbosity

* convert crlf to lf

* Added argparse dependency to dirpath utility

* Balanced dataset by duplicating instances
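
Balancing by duplication can be sketched as oversampling each under-represented source until it matches the largest one (the `(source, item)` pair format is an assumption for this sketch):

```python
import math
from collections import Counter

def balance_by_duplication(samples):
    """Duplicate samples from under-represented sources until every
    source matches the count of the largest one. `samples` is a list
    of (source, item) pairs; duplicates reference the same items."""
    counts = Counter(src for src, _ in samples)
    target = max(counts.values())
    balanced = list(samples)
    for src, n in counts.items():
        pool = [s for s in samples if s[0] == src]
        # Repeat the pool enough times, then trim to exactly `target`.
        extra = (pool * (math.ceil(target / n) - 1))[: target - n]
        balanced.extend(extra)
    return balanced
```

For image datasets this would typically duplicate file references (or symlinks) rather than the images themselves, keeping disk usage flat.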

* Updated README and scripts for nnUNetv2
integration
- Separated dataset preparation and training instructions in the README
for clarity, specifying use of nnUNetv2 structure.
- Clarified the prerequisite of having a trained model before running
inferences in the README.
- Fixed the output directory variable assignment in inference script to
correctly use a second parameter.
- Enhanced the inference script to handle missing `_0000` suffixes and
to support new postprocessing steps.
- Added a new dataset preparation script (`prepare_nnunet_datasets.sh`)
to set up data structure for nnUNetv2.
- Removed dataset preparation steps from the training script
(`train_nnunet.sh`), focusing it solely on model training as per the new
separation of concerns.

The changes improve the accuracy, usability, and maintainability of
TotalSegMRI's implementation with nnUNetv2, facilitating better
segmentation results and a smoother experience for users following the
updated instructions.
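
nnU-Net expects single-channel input images to carry a channel suffix, i.e. `<case>_0000.nii.gz`. The suffix handling mentioned above might look roughly like this (the function name is illustrative):

```python
def ensure_channel_suffix(name, suffix="_0000"):
    """Append nnU-Net's channel suffix to a .nii.gz filename when it
    is missing, leaving already-suffixed or non-NIfTI names alone."""
    ext = ".nii.gz"
    if not name.endswith(ext):
        return name
    stem = name[: -len(ext)]
    if stem.endswith(suffix):
        return name
    return stem + suffix + ext
```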

* Update README.md

* Refactor argument flags in MRI utils scripts

Updated the argument flags across various MRI utility scripts to
standardize input directory flags as '-s' instead of the previous '-i'.
This change enhances the consistency of script interfaces, making it
easier to understand and use the tools for MRI data preparation and
processing. Adjusted scripts include those for generating sequential
labels, mapping labels, and fixing CSF labels. Compatibility with
existing conventions in subject subdirectory flags has been maintained
by switching from '-s' to '-u'.

* Added script for processing NIfTI segmentation files,
retaining only the largest connected component for each label
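
The largest-connected-component cleanup can be sketched as follows; this is a pure-NumPy BFS sketch (a real implementation might rely on `scipy.ndimage.label` instead), using axis-aligned connectivity (4-connectivity in 2D, 6-connectivity in 3D):

```python
from collections import deque
import numpy as np

def largest_component_per_label(seg):
    """For every non-zero label, keep only its largest connected
    component and zero out the smaller ones."""
    out = np.zeros_like(seg)
    for label in np.unique(seg):
        if label == 0:
            continue
        mask = seg == label
        visited = np.zeros(seg.shape, dtype=bool)
        best = []
        for start in zip(*np.nonzero(mask)):
            if visited[start]:
                continue
            # Breadth-first flood fill of one connected component.
            comp, queue = [], deque([start])
            visited[start] = True
            while queue:
                p = queue.popleft()
                comp.append(p)
                for axis in range(seg.ndim):
                    for step in (-1, 1):
                        q = list(p)
                        q[axis] += step
                        q = tuple(q)
                        if (0 <= q[axis] < seg.shape[axis]
                                and mask[q] and not visited[q]):
                            visited[q] = True
                            queue.append(q)
            if len(comp) > len(best):
                best = comp
        for p in best:
            out[p] = label
    return out
```

This removes small spurious islands that segmentation models sometimes produce away from the true structure.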

* Ignore non-critical warnings

* Update dataset prep and training workflow

Refactored the dataset preparation steps, now sourcing the new `get_spine_generic_datasets.sh` script to fetch specific datasets from the Spine Generic Project repository. This update clarifies the preparation of SPIDER datasets in the BIDS structure. Revised training and inference instructions in `README.md` to correspond with the new dataset structure and included clear directives for model training and running inference. Removed unnecessary code from the `prepare_spider_bids_datasets.sh` script, further streamlining the process.

These changes make the data setup more intuitive and maintainable, enabling easier replication of the research environment.

Related to issue #4567.

* Rename multi-subject and single-subject dataset zip files to include _PAM50_seg.

* Refactor label generation utilities

Removed redundant `pairs_dict` function from both generate_labels_sequential.py and generate_largest_labels.py as it was no longer used in the current codebase. Updated generate_largest_labels.py to enhance clarity: renamed the function generate_labels to generate_largest_labels and updated its references to match the new name, ensuring consistency with the module's purpose. Removed unused imports to streamline dependencies and maintain clean and efficient code.

* Remove big binaries from main repository

* Update README with mkdir flag enhancement

Ensure the creation of nested directories for the SPIDER dataset by adding the '-p' flag to the 'mkdir' command in the README instructions. This prevents potential errors when users attempt to create subdirectories in a non-existent path.

---------

Co-authored-by: Nathan Molinier <[email protected]>
Co-authored-by: Nathan Molinier <[email protected]>
3 people authored Feb 20, 2024
1 parent 662e8c9 commit 0300574
Showing 38 changed files with 3,078 additions and 1,131 deletions.
175 changes: 175 additions & 0 deletions .gitignore
@@ -0,0 +1,175 @@
# Vscode
.vscode
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets

# Local History for Visual Studio Code
.history/

# Built Visual Studio Code Extensions
*.vsix

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
5 changes: 0 additions & 5 deletions Dockerfile

This file was deleted.

162 changes: 99 additions & 63 deletions README.md
@@ -1,103 +1,139 @@
-# totalsegmentator-mri
-Code for the TotalSegmentator MRI project.
-
-## Steps to install
-
-1. Clone this repository
-```
-git clone https://github.com/neuropoly/totalsegmentator-mri.git
-```
-1. Clone SynthSeg repository
-```
-git clone https://github.com/BBillot/SynthSeg.git
-```
-1. Download [this google folder](https://drive.google.com/drive/folders/11F8q3jhZR0KfHhBpyKygXMo-alTDbp0U?usp=sharing) (the TotalSegmentator example image was downloaded from [here](https://zenodo.org/record/6802614)).
-1. Create Virtual Environment (Please make sure you're using python 3.8 !!!)
-```
-python -m venv venv
-```
-1. Add SynthSeg to Virtual Environment (If not using bash change '$(pwd)' to the working directory):
-```
-echo "$(pwd)/SynthSeg" > venv/lib/python3.8/site-packages/SynthSeg.pth
-```
-1. Activate Virtual Environment
-```
-source venv/bin/activate
-```
-1. Install requirements:
-```
-pip install -r SynthSeg/requirements_python3.8.txt
-```
-## To run scripts
-`resources/labels.json` - Contain mapping of each mask to unique number.
-`resources/classes.json` - Contain mapping of each mask to class of masks with similar statistics (total 15 classes).
-### Option 1 - Run script for all TotalSegmentator labels
-1. Combine all MPRAGE 'blob' masks for each subject into a single segmentation file:
-```
-python totalsegmentator-mri/scripts/combine_masks.py -d TotalSegmentatorMRI_SynthSeg/data/derivatives/manual_masks -o output/ALL_LAB/MP-RAGE_Masks_Combined -m totalsegmentator-mri/resources/labels.json
-```
-1. Calculate signal statistics (mean + std) for each mask (group masks into classes of similar statistics):
-```
-python totalsegmentator-mri/scripts/build_intensity_stats.py -d TotalSegmentatorMRI_SynthSeg/data -s output/ALL_LAB/MP-RAGE_Masks_Combined -o output/ALL_LAB/MP-RAGE_priors -m totalsegmentator-mri/resources/labels.json -c totalsegmentator-mri/resources/classes.json
-```
-1. Combine all TotalSegmentator masks for each subject into a single segmentation file:
-```
-python totalsegmentator-mri/scripts/combine_masks.py -d TotalSegmentatorMRI_SynthSeg/Totalsegmentator_dataset -o output/ALL_LAB/TotalSegmentator_Masks_Combined -m totalsegmentator-mri/resources/labels.json --subject-prefix s --subject-subdir segmentations --seg-suffix _ct_seg --output-bids 0
-```
-1. Create a synthetic image using TotalSegmentator segmentation and the calculated MPRAGE signal statistics:
-```
-python totalsegmentator-mri/scripts/generate_image.py -s output/ALL_LAB/TotalSegmentator_Masks_Combined -p output/ALL_LAB/MP-RAGE_priors -o output/ALL_LAB/MP-RAGE_Synthetic/test1 -n 2
-```
-### Option 2 - Run script with TotalSegmentator labels reduced to 15 labels
-To reduce the number of labels and group all vertebrae, we use `resources/classes.json` as the main masks mapping when combining masks with combine_masks. This way all masks of the same classes will be mapped to the same label.
-1. Combine all MPRAGE 'blob' masks for each subject into a single segmentation file:
-```
-python totalsegmentator-mri/scripts/combine_masks.py -d TotalSegmentatorMRI_SynthSeg/data/derivatives/manual_masks -o output/15_LAB/MP-RAGE_Masks_Combined -m totalsegmentator-mri/resources/classes.json
-```
-1. Calculate signal statistics (mean + std) for each mask:
-```
-python totalsegmentator-mri/scripts/build_intensity_stats.py -d TotalSegmentatorMRI_SynthSeg/data -s output/15_LAB/MP-RAGE_Masks_Combined -o output/15_LAB/MP-RAGE_priors -m totalsegmentator-mri/resources/classes.json
-```
-1. Combine all TotalSegmentator masks for each subject into a single segmentation file:
-```
-python totalsegmentator-mri/scripts/combine_masks.py -d TotalSegmentatorMRI_SynthSeg/Totalsegmentator_dataset -o output/15_LAB/TotalSegmentator_Masks_Combined -m totalsegmentator-mri/resources/classes.json --subject-prefix s --subject-subdir segmentations --seg-suffix _ct_seg --output-bids 0
-```
-1. Create a synthetic image using TotalSegmentator segmentation and the calculated MPRAGE signal statistics:
-```
-python totalsegmentator-mri/scripts/generate_image.py -s output/15_LAB/TotalSegmentator_Masks_Combined -p output/15_LAB/MP-RAGE_priors -o output/15_LAB/MP-RAGE_Synthetic/test1 -n 2
-```
-## Data organization
-As a starting point, a few MPRAGE data are under our private [google folder](https://drive.google.com/drive/folders/1CAkz4ZuxQjWza7GAXhXxTkKcyB9p3yME).
-We will follow the BIDS structure:
-```
-├── derivatives
-│   └── manual_masks
-│       └── sub-errsm37
-│           └── anat
-└── sub-errsm37
-    └── anat
-        ├── sub-errsm37_T1w.json
-        └── sub-errsm37_T1w.nii.gz
-```
+# TotalSegMRI
+
+Tool for automatic segmentation and labelling of all vertebrae and intervertebral discs (IVDs), spinal cord, and spinal canal. We follow [TotalSegmentator classes](https://github.com/wasserth/TotalSegmentator?tab=readme-ov-file#class-details) with an additional class for IVDs, spinal cord and spinal canal (see the list of classes [here](#list-of-class)). We used [nnUNet](https://github.com/MIC-DKFZ/nnUNet) as our backbone for model training and inference.
+
+- [Dependencies](#dependencies)
+- [Installation](#installation)
+- [First Model](#first-model)
+- [First Model - Train](#first-model---train)
+- [First Model - Inference](#first-model---inference)
+- [List of class](#list-of-class)
+
+![Thumbnail](https://github.com/neuropoly/totalsegmentator-mri/assets/36595323/ceca5bb7-f370-477a-8b21-9774853948c6)
+
+## Dependencies
+
+- [Spinal Cord Toolbox (SCT)](https://github.com/neuropoly/spinalcordtoolbox)
+
+## Installation
+
+1. Open Terminal in a directory you want to work on.
+1. Create and activate Virtual Environment (highly recommended):
+```
+python -m venv venv
+source venv/bin/activate
+```
+1. Install [PyTorch](https://pytorch.org/get-started/locally/) as described on their website.
+1. Clone and install this repository:
+```
+git clone https://github.com/neuropoly/totalsegmentator-mri.git
+python -m pip install -e totalsegmentator-mri
+```
+1. Install requirements:
+```
+python -m pip install -r totalsegmentator-mri/requirements.txt
+```
+
+## First Model
+
+A hybrid approach integrating nnU-Net with an iterative algorithm for segmenting vertebrae, IVDs, spinal cord, and spinal canal. To tackle the challenge of having many classes and class imbalance, we developed a two-step training process. A first model (model 1 - 206) was trained (single input channel: image) to identify 4 classes (IVDs, vertebrae, spinal cord and spinal canal) as well as specific IVDs (C2-C3, C7-T1 and L5-S1) representing key anatomical landmarks along the spine, so 7 classes in total (Figure 1A). The output segmentation was processed using an algorithm that distinguished odd and even IVDs based on the C2-C3, C7-T1 and L5-S1 IVD labels output by the model (Figure 1B). Then, a second nnU-Net model (model 2 - 210) was trained (two input channels: 1=image, 2=odd IVDs) to output 12 classes (Figure 1C). Finally, the output of model 2 was processed in order to assign an individual label value to each vertebra and IVD in the final segmentation mask (Figure 1D).
+
+![Figure 1](https://github.com/neuropoly/totalsegmentator-mri/assets/36595323/3958cbc6-a059-4ccf-b3b1-02dbc3a4a62d)
+
+**Figure 1**: Illustration of the hybrid method for automatic segmentation of the spine and spinal cord structures. T1w image (A) is used to train model 1, which outputs 7 classes (B). These output labels are processed to extract odd IVDs (C). The T1w and odd IVDs are used as two input channels to train model 2, which outputs 12 classes (D). These output labels are processed to extract individual IVDs and vertebrae (E).
+
+### First Model - Train
+
+1. Download the corresponding content from the [SPIDER dataset](https://doi.org/10.5281/zenodo.10159290) into 'data/raw/spider/images' and 'data/raw/spider/masks' (you can use `mkdir -p data/raw/spider` to create the folder first).
+1. Make sure `git` and `git-annex` are installed (you can install with `sudo apt-get install git-annex -y`).
+1. Extract [data-multi-subject_PAM50_seg.zip](https://drive.google.com/file/d/1Sq38xLHnVxhLr0s1j27ywbeshNUjo3IP) into 'data/bids/data-multi-subject'.
+1. Extract [data-single-subject_PAM50_seg.zip](https://drive.google.com/file/d/1YvuFHL8GDJ5SXlMLORWDjR5SNkDL6TUU) into 'data/bids/data-single-subject'.
+1. Extract [whole-spine.zip](https://drive.google.com/file/d/143i0ODmeqohpc4vu5Aa5lnv8LLEyOU0F) (private dataset) into 'data/bids/whole-spine'.
+1. Get the required datasets from the [Spine Generic Project](https://github.com/spine-generic/):
+```
+source totalsegmentator-mri/run/get_spine_generic_datasets.sh
+```
+1. Prepare SPIDER datasets in [BIDS](https://bids.neuroimaging.io/) structure:
+```
+source totalsegmentator-mri/run/prepare_spider_bids_datasets.sh
+```
+1. Prepare datasets in nnUNetv2 structure:
+```
+source totalsegmentator-mri/run/prepare_nnunet_datasets.sh
+```
+1. Train the model:
+```
+source totalsegmentator-mri/run/train_nnunet.sh
+```
+
+### First Model - Inference
+
+Run the model on a folder containing the images in .nii.gz format (make sure to train the model or extract the trained `nnUNet_results` into `data/nnUNet/nnUNet_results` before running):
+```
+source totalsegmentator-mri/run/inference_nnunet.sh INPUT_FOLDER OUTPUT_FOLDER
+```
## List of class
|Label|Name|
|:-----|:-----|
| 18 | vertebrae_L5 |
| 19 | vertebrae_L4 |
| 20 | vertebrae_L3 |
| 21 | vertebrae_L2 |
| 22 | vertebrae_L1 |
| 23 | vertebrae_T12 |
| 24 | vertebrae_T11 |
| 25 | vertebrae_T10 |
| 26 | vertebrae_T9 |
| 27 | vertebrae_T8 |
| 28 | vertebrae_T7 |
| 29 | vertebrae_T6 |
| 30 | vertebrae_T5 |
| 31 | vertebrae_T4 |
| 32 | vertebrae_T3 |
| 33 | vertebrae_T2 |
| 34 | vertebrae_T1 |
| 35 | vertebrae_C7 |
| 36 | vertebrae_C6 |
| 37 | vertebrae_C5 |
| 38 | vertebrae_C4 |
| 39 | vertebrae_C3 |
| 40 | vertebrae_C2 |
| 41 | vertebrae_C1 |
| 92 | sacrum |
| 200 | spinal_cord |
| 201 | spinal_canal |
| 202 | disc_L5_S |
| 203 | disc_L4_L5 |
| 204 | disc_L3_L4 |
| 205 | disc_L2_L3 |
| 206 | disc_L1_L2 |
| 207 | disc_T12_L1 |
| 208 | disc_T11_T12 |
| 209 | disc_T10_T11 |
| 210 | disc_T9_T10 |
| 211 | disc_T8_T9 |
| 212 | disc_T7_T8 |
| 213 | disc_T6_T7 |
| 214 | disc_T5_T6 |
| 215 | disc_T4_T5 |
| 216 | disc_T3_T4 |
| 217 | disc_T2_T3 |
| 218 | disc_T1_T2 |
| 219 | disc_C7_T1 |
| 220 | disc_C6_C7 |
| 221 | disc_C5_C6 |
| 222 | disc_C4_C5 |
| 223 | disc_C3_C4 |
| 224 | disc_C2_C3 |
8 changes: 8 additions & 0 deletions pyproject.toml
@@ -0,0 +1,8 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "totalsegmri"
version = "0.0.1"
dependencies = []