Small Python utility to compare and visualize the output of various stereo depth estimation algorithms:
- Make it easy to get a qualitative evaluation of several state-of-the-art models in the wild
- Feed it left/right images or capture live from an OAK-D camera
- Interactive colored point-cloud view since nice-looking disparity images can be misleading
- Try different parameters on the same image
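As an illustration of why the point-cloud view is more revealing than a disparity image, here is a minimal sketch of the standard pinhole reprojection of a rectified disparity map into 3D points. It is not taken from the stereodemo source: the function name and the calibration values (focal length, baseline, principal point) are hypothetical. Depth is inversely proportional to disparity, so a small disparity error on a distant surface turns into a large depth error that is hard to see in a colored disparity image.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def disparity_to_points(
    disparity: Sequence[Sequence[float]],
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Reproject a rectified disparity map (in pixels) to 3D points (in meters).

    Hypothetical helper for illustration; assumes a pinhole camera with the
    same focal length in x and y.
    """
    points: List[Point3D] = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d <= 0:
                # Zero or negative disparity means "no valid match" here.
                continue
            z = focal_px * baseline_m / d   # depth along the optical axis
            x = (u - cx) * z / focal_px     # horizontal offset from the axis
            y = (v - cy) * z / focal_px     # vertical offset from the axis
            points.append((x, y, z))
    return points
```

With a 100 px focal length and a 10 cm baseline, a disparity of 10 px maps to a depth of 1 m, while a disparity of 1 px maps to 10 m.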
Included methods (implementation/pre-trained models taken from their respective authors):
- OpenCV stereo block matching and Semi-global block matching baselines, with all their parameters
- CREStereo: "Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation" (CVPR 2022)
- RAFT-Stereo: "Multilevel Recurrent Field Transforms for Stereo Matching" (3DV 2021)
- Hitnet: "Hierarchical Iterative Tile Refinement Network for Real-time Stereo Matching" (CVPR 2021)
- STereo TRansformers: "Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers" (ICCV 2021)
- Chang et al. RealtimeStereo: "Attention-Aware Feature Aggregation for Real-time Stereo Matching on Edge Devices" (ACCV 2020)
- DistDepth: "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022). This one is actually a monocular method, only using the left image.
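For readers unfamiliar with the block matching baseline, the sketch below shows the core idea on a single rectified scanline: for each pixel of the left row, try every candidate disparity and keep the one with the lowest sum of absolute differences (SAD) over a small window. This is a deliberately naive pure-Python illustration, not OpenCV's implementation and not code from stereodemo; the function name and default parameters are hypothetical.

```python
from typing import List, Sequence


def match_scanline(
    left: Sequence[int],
    right: Sequence[int],
    block: int = 3,
    max_disp: int = 4,
) -> List[int]:
    """Naive SAD block matching on one rectified image row.

    Returns one integer disparity per pixel; border pixels that cannot hold
    a full window are left at 0.
    """
    half = block // 2
    width = len(left)
    disparities = [0] * width
    for x in range(half, width - half):
        best_cost, best_d = None, 0
        for d in range(max_disp + 1):
            if x - d - half < 0:
                # The shifted window would fall off the left edge.
                break
            # Sum of absolute differences between the two windows.
            cost = sum(
                abs(left[x + k] - right[x - d + k])
                for k in range(-half, half + 1)
            )
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities
```

A pattern that appears two pixels further right in the left row than in the right row is recovered with disparity 2. Semi-global matching adds smoothness penalties across neighboring pixels on top of such a per-pixel cost, which is why it usually produces cleaner results than plain block matching.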
See below for more details / credits to get each of these working, and check this blog post for more results, including performance numbers.
python3 -m pip install stereodemo
To capture data directly from an OAK-D camera, use:
stereodemo --oak
Then click on Next Image to capture a new one.
If you installed stereodemo from pip, then just launch stereodemo and it will show some embedded sample images captured with an OAK-D camera.
A tiny subset of some popular datasets is also included in this repository. Just provide a folder to stereodemo and it'll look for left/right pairs (either im0/im1 or left/right in the names):
# To evaluate on the oak-d images
stereodemo datasets/oak-d
# To cycle through all images
stereodemo datasets
Then click on Next Image to cycle through the images.
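The folder lookup described above can be sketched as follows. This is an assumption about how such a scan might work, not the actual stereodemo code: the function name, the accepted file extensions, and the recursive search are all hypothetical. Only the im0/im1 and left/right naming conventions come from the text.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed set of image extensions; the real tool may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}
# Naming conventions mentioned above: (left tag, right tag).
LEFT_TO_RIGHT = (("im0", "im1"), ("left", "right"))


def find_stereo_pairs(root: str) -> List[Tuple[Path, Path]]:
    """Return (left, right) image paths found anywhere under `root`."""
    pairs = []
    for left_path in sorted(Path(root).rglob("*")):
        if left_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        for left_tag, right_tag in LEFT_TO_RIGHT:
            if left_tag not in left_path.name:
                continue
            # Build the expected right-image name in the same folder.
            right_name = left_path.name.replace(left_tag, right_tag)
            right_path = left_path.with_name(right_name)
            if right_path.is_file():
                pairs.append((left_path, right_path))
                break
    return pairs
```

Left images without a matching right image are skipped silently in this sketch.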
Sample images included in this repository:
- drivingstereo: outdoor driving.
- middlebury_2014: high-res objects.
- eth3d: outdoor and indoor scenes.
- sceneflow: synthetic rendering of objects.
- oak-d: indoor images I captured with my OAK-D lite camera.
- kitti2015: outdoor driving (only one image).
pip will install the dependencies automatically. Here is the list:
- Open3D. For the point cloud visualization and the GUI.
- OpenCV. For image loading and the traditional block matching baselines.
- onnxruntime. To run pretrained models in the ONNX format.
- pytorch. To run pretrained models exported as torch script.
- depthai. Optional, to grab images from a Luxonis OAK camera.
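If something fails at startup, a quick way to see which of these packages are importable in the current environment is sketched below. The helper names are hypothetical and not part of stereodemo; the module names are the usual import names of the packages listed above (OpenCV is imported as cv2 and pytorch as torch).

```python
import importlib.util
from typing import Dict, Sequence

# Import names for the dependencies listed above.
REQUIRED = ("open3d", "cv2", "onnxruntime", "torch")
# depthai is only needed to grab images from an OAK camera.
OPTIONAL = ("depthai",)


def is_installed(module_name: str) -> bool:
    """True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def dependency_report(
    required: Sequence[str] = REQUIRED,
    optional: Sequence[str] = OPTIONAL,
) -> Dict[str, bool]:
    """Map each dependency name to whether it is available."""
    return {name: is_installed(name) for name in (*required, *optional)}


if __name__ == "__main__":
    for name, available in dependency_report().items():
        kind = "optional" if name in OPTIONAL else "required"
        status = "ok" if available else "missing"
        print(f"{name:12s} {kind:9s} {status}")
```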
I did not implement any of these myself, but just collected pre-trained models or converted them to torch script / ONNX.
- CREStereo
  - Official implementation and pre-trained models: https://github.com/megvii-research/CREStereo
  - Model Zoo for the ONNX models: https://github.com/PINTO0309/PINTO_model_zoo/tree/main/284_CREStereo
  - Port to ONNX + sample loading code: https://github.com/ibaiGorordo/ONNX-CREStereo-Depth-Estimation
- RAFT-Stereo
  - Official implementation and pre-trained models: https://github.com/princeton-vl/RAFT-Stereo
  - I exported the pytorch implementation to torch script via tracing, with minor modifications of the source code.
  - Their fastest implementation was not imported.
- Hitnet
  - Official implementation and pre-trained models: https://github.com/google-research/google-research/tree/master/hitnet
  - Model Zoo for the ONNX models: https://github.com/PINTO0309/PINTO_model_zoo/tree/main/142_HITNET
  - Port to ONNX + sample loading code: https://github.com/ibaiGorordo/ONNX-HITNET-Stereo-Depth-estimation
- Stereo Transformers
  - Official implementation and pre-trained models: https://github.com/mli0603/stereo-transformer
  - Made some small changes to allow torch script export via tracing.
  - The exported model currently fails with GPU inference, so only CPU inference is enabled.
- Chang et al. RealtimeStereo
  - Official implementation and pre-trained models: https://github.com/JiaRenChang/RealtimeStereo
  - I exported the pytorch implementation to torch script via tracing, with some minor changes to the code (JiaRenChang/RealtimeStereo#15). See chang_realtimestereo_to_torchscript_onnx.py.
- DistDepth
  - Official implementation and pre-trained models: https://github.com/facebookresearch/DistDepth
  - I exported the pytorch implementation to torch script via tracing, see the changes.
The code of stereodemo is MIT licensed, but the pre-trained models are subject to the license of their respective implementation.
The sample images have the license of their respective source, except for datasets/oak-d, which is licensed under the Creative Commons Attribution 4.0 International License.