Semantic segmentation for autonomous driving, based on project MONAI.
The label table below follows the Cityscapes label definitions.
name | id | trainId | category | categoryId | hasInstances | ignoreInEval | color |
---|---|---|---|---|---|---|---|
unlabeled | 0 | 255 | void | 0 | False | True | (0, 0, 0) |
ego vehicle | 1 | 255 | void | 0 | False | True | (0, 0, 0) |
rectification border | 2 | 255 | void | 0 | False | True | (0, 0, 0) |
out of roi | 3 | 255 | void | 0 | False | True | (0, 0, 0) |
static | 4 | 255 | void | 0 | False | True | (0, 0, 0) |
dynamic | 5 | 255 | void | 0 | False | True | (111, 74, 0) |
ground | 6 | 255 | void | 0 | False | True | (81, 0, 81) |
road | 7 | 0 | flat | 1 | False | False | (128, 64, 128) |
sidewalk | 8 | 1 | flat | 1 | False | False | (244, 35, 232) |
parking | 9 | 255 | flat | 1 | False | True | (250, 170, 160) |
rail track | 10 | 255 | flat | 1 | False | True | (230, 150, 140) |
building | 11 | 2 | construction | 2 | False | False | (70, 70, 70) |
wall | 12 | 3 | construction | 2 | False | False | (102, 102, 156) |
fence | 13 | 4 | construction | 2 | False | False | (190, 153, 153) |
guard rail | 14 | 255 | construction | 2 | False | True | (180, 165, 180) |
bridge | 15 | 255 | construction | 2 | False | True | (150, 100, 100) |
tunnel | 16 | 255 | construction | 2 | False | True | (150, 120, 90) |
pole | 17 | 5 | object | 3 | False | False | (153, 153, 153) |
polegroup | 18 | 255 | object | 3 | False | True | (153, 153, 153) |
traffic light | 19 | 6 | object | 3 | False | False | (250, 170, 30) |
traffic sign | 20 | 7 | object | 3 | False | False | (220, 220, 0) |
vegetation | 21 | 8 | nature | 4 | False | False | (107, 142, 35) |
terrain | 22 | 9 | nature | 4 | False | False | (152, 251, 152) |
sky | 23 | 10 | sky | 5 | False | False | (70, 130, 180) |
person | 24 | 11 | human | 6 | True | False | (220, 20, 60) |
rider | 25 | 12 | human | 6 | True | False | (255, 0, 0) |
car | 26 | 13 | vehicle | 7 | True | False | (0, 0, 142) |
truck | 27 | 14 | vehicle | 7 | True | False | (0, 0, 70) |
bus | 28 | 15 | vehicle | 7 | True | False | (0, 60, 100) |
caravan | 29 | 255 | vehicle | 7 | True | True | (0, 0, 90) |
trailer | 30 | 255 | vehicle | 7 | True | True | (0, 0, 110) |
train | 31 | 16 | vehicle | 7 | True | False | (0, 80, 100) |
motorcycle | 32 | 17 | vehicle | 7 | True | False | (0, 0, 230) |
bicycle | 33 | 18 | vehicle | 7 | True | False | (119, 11, 32) |
license plate | -1 | -1 | vehicle | 7 | False | True | (0, 0, 142) |
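During training, these 33 label ids are typically collapsed to the 19 trainIds, with everything marked 255 ignored in evaluation. A minimal sketch of that remapping for NumPy label masks (illustrative, not code from this repo):

```python
import numpy as np

# id -> trainId lookup built from the table above; 255 marks ignored classes.
ID_TO_TRAINID = np.full(34, 255, dtype=np.uint8)
for label_id, train_id in {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8, 22: 9,
    23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16, 32: 17, 33: 18,
}.items():
    ID_TO_TRAINID[label_id] = train_id

def encode_labels(mask: np.ndarray) -> np.ndarray:
    """Convert a labelIds mask (values 0..33) to trainIds (0..18, or 255)."""
    return ID_TO_TRAINID[mask]
```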
conda create -n seg
conda activate seg
conda install pip
conda install python=3.10
conda install numpy=1.26.0
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install monai==1.3.0
pip install tensorboard
pip install tensorboardX==2.1
- Version Info
MONAI version: 1.3.0
Numpy version: 1.26.0
Pytorch version: 2.1.1
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 865972f7a791bf7b42efbcd87c8402bd865b329e
MONAI __file__: /work/<username>/miniconda3/envs/seg/lib/python3.10/site-packages/monai/__init__.py
Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
ITK version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: NOT INSTALLED or UNKNOWN VERSION.
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
scipy version: 1.11.3
Pillow version: 10.0.1
Tensorboard version: 2.12.1
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.16.1
tqdm version: 4.59.0
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: 5.9.6
pandas version: NOT INSTALLED or UNKNOWN VERSION.
einops version: NOT INSTALLED or UNKNOWN VERSION.
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
pynrrd version: NOT INSTALLED or UNKNOWN VERSION.
clearml version: NOT INSTALLED or UNKNOWN VERSION.
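For reference, the report above is MONAI's built-in configuration printout, which can be regenerated with:

```bash
python -c "import monai; monai.config.print_config()"
```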
The model converges in about 100 epochs when the batch size is set to 1 and the ROI is set to 2048*1024.
module load cuda/11.8
conda activate seg
python train.py \
--max_epochs=100 \
--batch_size=1 \
--roi_x=2048 \
--roi_y=1024 \
--save_checkpoint="best_metric_model_segmentation2d_array.pth" 2>&1 | tee tempoutput.txt
conda activate seg
tail -f tempoutput.txt
tensorboard --logdir=runs
module load cuda/11.8
conda activate seg
python eval.py \
--batch_size=1 \
--roi_x=2048 \
--roi_y=1024 \
--load_checkpoint="best_metric_model_segmentation2d_array.pth" 2>&1 | tee tempoutput.txt
Please run the following steps:
python clean_train.py \
--max_epochs=100 \
--batch_size=4 \
--roi_x=2048 \
--roi_y=1024 \
--save_checkpoint="weightedLossModel.pth" 2>&1 | tee tempoutput_new.txt
python eval.py \
--batch_size=1 \
--roi_x=2048 \
--roi_y=1024 \
--load_checkpoint="weightedLossModel.pth" 2>&1 | tee tempoutput.txt
python postval.py # convert the 20-channel inference result back to 33 channels.
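`postval.py` itself is the authority here, but a sketch of the inverse mapping (trainIds back to Cityscapes labelIds for submission) could look like:

```python
import numpy as np

# trainId -> labelId, inverting the label table at the top of this document.
TRAINID_TO_ID = np.array(
    [7, 8, 11, 12, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 31, 32, 33],
    dtype=np.uint8,
)

def decode_labels(pred: np.ndarray) -> np.ndarray:
    """Map predicted trainIds (0..18) to labelIds; anything else -> 0 (unlabeled)."""
    out = np.zeros_like(pred, dtype=np.uint8)
    valid = pred < 19
    out[valid] = TRAINID_TO_ID[pred[valid]]
    return out
```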
The main code is based on the unet_training example from project MONAI. MONAI is a PyTorch-based, open-source framework for deep learning in healthcare imaging, part of the PyTorch Ecosystem. The package implements many different neural networks for semantic segmentation, making it convenient to deploy and modify.
For the model, Feiyang suggests using AttentionUnet, a modified version of U-Net with attention gates (AGs) incorporated. Gradients originating from background regions are down-weighted during the backward pass, so model parameters in shallower layers are updated mostly based on spatial regions that are relevant to the given task. Additive attention is formulated as follows:
$$
q^l_{att} = \psi^T \left( \sigma_1 \left( W^T_x x^l_i + W^T_g g_i + b_g \right) \right) + b_{\psi}
$$

$$
\alpha^l_i = \sigma_2 \left( q^l_{att} \left( x^l_i, g_i ; \Theta_{att} \right) \right)
$$

where $\sigma_1$ is a ReLU and $\sigma_2$ a sigmoid activation, $x^l_i$ is the feature vector at pixel $i$ of layer $l$, $g_i$ is the gating signal, and the AG is characterised by a set of parameters $\Theta_{att}$: the linear transformations $W_x$, $W_g$, $\psi$ and the bias terms $b_g$, $b_{\psi}$.
The original paper proposed a grid-attention technique: the gating signal is not a single global vector shared by all image pixels, but a grid signal conditioned on the spatial information of the image.
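MONAI ships this architecture as `monai.networks.nets.AttentionUnet`. A minimal sketch of a 2D instantiation; the channel widths are illustrative and the 20-class output follows the trainId convention above, not necessarily the exact `train.py` configuration:

```python
import torch
from monai.networks.nets import AttentionUnet

model = AttentionUnet(
    spatial_dims=2,                   # 2D street scenes rather than 3D volumes
    in_channels=3,                    # RGB input
    out_channels=20,                  # 19 trainIds + 1 ignore/background channel
    channels=(16, 32, 64, 128, 256),  # feature widths per encoder level
    strides=(2, 2, 2, 2),             # downsampling factor between levels
)

with torch.no_grad():
    logits = model(torch.randn(1, 3, 512, 256))  # -> (1, 20, 512, 256)
```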
We used the Cityscapes dataset to train the model; it contains RGB images along with their corresponding finely annotated masks for semantic segmentation. There are 5000 images in total, divided into training, validation, and test sets. The input images and segmentation masks were resized from the original 2048*1024 resolution.
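A minimal sketch of feeding these image/mask pairs through MONAI's dictionary transforms (the paths, resize target, and interpolation modes are illustrative assumptions, not the repo's exact pipeline):

```python
from glob import glob
from monai.data import DataLoader, Dataset
from monai.transforms import (
    Compose, EnsureChannelFirstd, LoadImaged, Resized, ScaleIntensityd,
)

# Pair each RGB frame with its fine labelIds mask (paths are illustrative).
images = sorted(glob("leftImg8bit/train/*/*_leftImg8bit.png"))
labels = sorted(glob("gtFine/train/*/*_gtFine_labelIds.png"))
files = [{"img": i, "seg": s} for i, s in zip(images, labels)]

transforms = Compose([
    LoadImaged(keys=["img", "seg"]),
    EnsureChannelFirstd(keys=["img", "seg"]),
    ScaleIntensityd(keys="img"),  # scale RGB to [0, 1]; leave the mask as ids
    Resized(keys=["img", "seg"], spatial_size=(1024, 512),
            mode=("bilinear", "nearest")),  # nearest keeps label ids integral
])

loader = DataLoader(Dataset(files, transforms), batch_size=1, shuffle=True)
```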
Our first submission uses the Attention U-Net with a 512 ROI, with the following results:
Metric | Value (%) |
---|---|
IoU Classes | 45.2974 |
iIoU Classes | 29.3292 |
IoU Categories | 79.5781 |
iIoU Categories | 70.4729 |
Class | IoU (%) | iIoU (%) |
---|---|---|
road | 94.1033 | - |
sidewalk | 31.5197 | - |
building | 81.4073 | - |
wall | 17.5878 | - |
fence | 24.1842 | - |
pole | 49.2604 | - |
traffic light | 13.1106 | - |
traffic sign | 62.9409 | - |
vegetation | 89.3674 | - |
terrain | 60.3792 | - |
sky | 89.25 | - |
person | 65.4055 | 56.0844 |
rider | 18.0407 | 21.1663 |
car | 85.2847 | 79.8377 |
truck | 0.0564402 | 0.0893015 |
bus | 3.22736 | 6.22195 |
train | 6.24262 | 7.22441 |
motorcycle | 15.88 | 13.3991 |
bicycle | 53.4023 | 50.6106 |
The accuracy is quite low compared to the existing benchmark:
Metric | Value (%) |
---|---|
IoU Classes | 85.8667 |
iIoU Classes | 69.0301 |
IoU Categories | 93.1585 |
iIoU Categories | 84.7639 |
Class | IoU (%) | iIoU (%) |
---|---|---|
road | 98.9975 | - |
sidewalk | 89.4421 | - |
building | 94.881 | - |
wall | 73.126 | - |
fence | 69.1677 | - |
pole | 75.6851 | - |
traffic light | 82.1686 | - |
traffic sign | 85.0499 | - |
vegetation | 94.4899 | - |
terrain | 75.8745 | - |
sky | 96.3034 | - |
person | 90.0583 | 77.4795 |
rider | 79.3188 | 63.2564 |
car | 97.0099 | 92.3638 |
truck | 83.7241 | 48.1885 |
bus | 94.9399 | 73.9745 |
train | 92.3479 | 63.456 |
motorcycle | 77.3525 | 61.618 |
bicycle | 81.5298 | 71.904 |