⛽⛽⛽ Contact: [email protected]
2023.04.14 Expedit-SAM significantly boosts the inference speed of the ViT-H SAM model by almost 1.5 times. 🍺credits to Weicong Liang🍺
2023.04.11 Swin-L + H-Deformable-DETR + SAM achieves strong COCO instance segmentation results: mask AP=46.8, obtained by simply prompting SAM with our HDETR box predictions (mask AP=46.5 based on ViTDet); see the box-prompting sketch after this news list. 🍺credits to Zhanhao Liang🍺
2023.03.22 Expedit-LargeScale-Vision-Transformer (NeurIPS2022) has been open-sourced.
2023.02.28 HDETR has been accepted by CVPR 2023 😉😉😉 (received one rating of 5 and three ratings of 4; thanks to the nice reviewers!).
2022.11.25 An optimized implementation of hybrid matching has been released as a pull request; it parallelizes the matching/loss computations of the one2one and one2many branches. 🍺🍺🍺 credits to Ding Jia 🍺🍺🍺
2022.11.17 Code for H-Detic-LVIS is released. 🍺🍺🍺 credits to Haodi He 🍺🍺🍺
2022.11.10 Code for H-TransTrack is released. 🍺🍺🍺 credits to Haojun Yu 🍺🍺🍺
2022.10.20 🎉🎉🎉 detrex now supports our H-Deformable-DETR. 🍺🍺🍺 credits to Ding Jia and Tianhe Ren 🍺🍺🍺
2022.09.12 Our H-Deformable-DETR w/ Swin-L achieves 58.2 AP on COCO val with 4-scale feature maps, comparable to (slightly better than) the very recent DINO-DETR w/ Swin-L equipped with 4-scale feature maps.
2022.08.31 Code for H-Deformable-DETR-mmdet (which supports the mmdetection 2D framework) is released. We will also release the code for H-Mask-Deformable-DETR soon (strong results on both instance segmentation and panoptic segmentation).
2022.08.07 Code for H-PETR-3D (strong results on nuScenes) and H-PETR-Pose (strong results on COCO pose estimation) is released.
2022.08.01 Code for H-Deformable-DETR (strong results on COCO object detection) is released.
2022.07.27 HDETR has been released on arXiv. The code will be released soon; please stay tuned.
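As a concrete illustration of the box-prompting recipe mentioned in the 2023.04.11 entry, here is a minimal sketch using the official `segment_anything` package. The checkpoint path, image file, and `detector_boxes` array are placeholders standing in for a real SAM checkpoint and actual HDETR predictions; this is not code from this repository.

```python
import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor

# Load a ViT-H SAM checkpoint (the path is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Any RGB image; "demo.jpg" is a placeholder.
image = cv2.cvtColor(cv2.imread("demo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# `detector_boxes` stands in for HDETR box predictions in
# (x1, y1, x2, y2) pixel coordinates, shape (N, 4).
detector_boxes = np.array([[100, 100, 400, 300]], dtype=np.float32)
boxes = torch.as_tensor(detector_boxes, device=predictor.device)
boxes = predictor.transform.apply_boxes_torch(boxes, image.shape[:2])

# One mask per box; no point prompts are needed.
masks, scores, _ = predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=boxes,
    multimask_output=False,
)
```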
One-to-one set matching is a key design in DETR for establishing its end-to-end capability, so that object detection does not require a hand-crafted NMS (non-maximum suppression) step to remove duplicate detections. This end-to-end signature is important for the versatility of DETR and has been generalized to a wide range of visual problems, including instance/semantic segmentation, human pose estimation, and point cloud/multi-view-image-based detection. However, we note that one-to-one set matching significantly reduces the training efficacy of positive samples, because too few queries are assigned as positive. This paper proposes a simple yet effective method based on a hybrid matching scheme: during training, the original one-to-one matching branch is combined with auxiliary queries trained with a one-to-many matching loss. This hybrid strategy significantly improves training efficiency and accuracy. At inference, only the original one-to-one matching branch is used, thus maintaining the end-to-end merit and the same inference efficiency as DETR. We name our method H-DETR and show that a wide range of representative DETR methods, including Deformable-DETR, 3DETR/PETRv2, PETR, and TransTrack, can be consistently improved across a wide range of visual tasks.
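To make the hybrid scheme concrete, here is a minimal training-loss sketch in PyTorch. It is not the repository's actual API: `hungarian_match`, `set_loss`, and the repetition factor `k` are hypothetical placeholders, and the one-to-many branch simply matches auxiliary queries against ground truth repeated `k` times.

```python
import torch


def hybrid_matching_loss(one2one_out, one2many_out, targets,
                         hungarian_match, set_loss, k=6):
    """Illustrative hybrid loss for training.

    `hungarian_match` and `set_loss` are hypothetical stand-ins for
    DETR's bipartite matcher and its classification/box losses.
    `targets` is a dict of torch tensors, e.g. {"labels": (M,),
    "boxes": (M, 4)}.
    """
    # One-to-one branch: standard DETR set matching, one query per object.
    indices = hungarian_match(one2one_out, targets)
    loss_one2one = set_loss(one2one_out, targets, indices)

    # One-to-many branch: repeat each ground-truth object k times so
    # that up to k auxiliary queries are matched to every object.
    repeated = {name: t.repeat_interleave(k, dim=0)
                for name, t in targets.items()}
    indices = hungarian_match(one2many_out, repeated)
    loss_one2many = set_loss(one2many_out, repeated, indices)

    # Both losses supervise training; at inference only the
    # one-to-one branch is evaluated, so no NMS is required.
    return loss_one2one + loss_one2many
```

Because the auxiliary queries and their loss exist only at training time, dropping them at inference leaves the original one-to-one model unchanged in both architecture and speed.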
- The HDETR architecture:
If you find H-DETR useful in your research, please consider citing:
@article{jia2022detrs,
  title={DETRs with Hybrid Matching},
  author={Jia, Ding and Yuan, Yuhui and He, Haodi and Wu, Xiaopei and Yu, Haojun and Lin, Weihong and Sun, Lei and Zhang, Chao and Hu, Han},
  journal={arXiv preprint arXiv:2207.13080},
  year={2022}
}