This notebook demonstrates how to convert and use Segmenter PyTorch model with OpenVINO.
Semantic segmentation is a difficult computer vision problem with many applications. Its goal is to assign labels to each pixel according to the object it belongs to, creating so-called segmentation masks. To properly assign this label, the model needs to consider the local as well as global context of the image. This is where transformers offer their advantage as they work well in capturing global context. Segmenter is based on Vision Transformer working as an encoder, and Mask Transformer working as a decoder. With this configuration, it achieves good results on different datasets such as ADE20K, Pascal Context, and Cityscapes.
Credits for this image go to original authors of Segmenter.
More about the model and its details can be found in the following paper: Segmenter: Transformer for Semantic Segmentation
The tutorial consists of the following steps:
- Preparing PyTorch Segmenter model
- Preparing preprocessing and visualization functions
- Validating inference of original model
- Converting PyTorch model to ONNX
- Converting ONNX to OpenVINO IR
- Validating inference of the converted model
- Benchmark performance of the converted model
If you have not installed all required dependencies, follow the Installation Guide. All additional required libraries will be installed inside this notebook.