
Siamese-DETR

This is the official implementation of the paper "Siamese-DETR for Generic Multi-Object Tracking".

Abstract

The ability to detect and track dynamic objects in different scenes is fundamental to real-world applications, e.g., autonomous driving and robot navigation. However, traditional Multi-Object Tracking (MOT) is limited to tracking objects belonging to pre-defined closed-set categories. Recently, Open-Vocabulary MOT (OVMOT) and Generic MOT (GMOT) have been proposed to track objects of interest beyond pre-defined categories, given a text prompt or a template image. However, training OVMOT models requires expensive, well pre-trained (vision-)language models and fine-grained category annotations. In this paper, we focus on GMOT and propose a simple but effective method, Siamese-DETR. Only commonly used detection datasets (e.g., COCO) are required for training. Different from existing GMOT methods, which train a Single Object Tracking (SOT) based detector to detect objects of interest and then apply a data-association-based MOT tracker to obtain trajectories, we leverage the inherent object queries in DETR variants. Specifically: 1) multi-scale object queries are designed based on the given template image, which are effective for detecting objects of different scales that belong to the same category as the template image; 2) a dynamic matching training strategy is introduced to train Siamese-DETR on commonly used detection datasets, taking full advantage of the provided annotations; 3) the online tracking pipeline is simplified to a tracking-by-query manner by incorporating the tracked boxes from the previous frame as additional query boxes, so that complex data association is replaced with the much simpler Non-Maximum Suppression (NMS). Extensive experimental results show that Siamese-DETR surpasses existing MOT methods on the GMOT-40 dataset by a large margin.

Method

(Figure: overview of the Siamese-DETR method.)
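The tracking-by-query pipeline described in the abstract can be summarized in a minimal sketch. This is illustrative only, not the repo's actual API: model, its extra_query_boxes argument, and track_sequence are hypothetical names; only the three numbered steps mirror the paper.

import torch
import torchvision.ops as ops

def track_sequence(model, frames, template, score_thr=0.5, nms_iou=0.5):
    tracked_boxes = None  # boxes kept from the previous frame
    trajectories = []
    for frame in frames:
        # 1) multi-scale object queries are built from the template image;
        # 2) the previous frame's tracked boxes are appended as extra query boxes
        boxes, scores = model(frame, template, extra_query_boxes=tracked_boxes)
        keep = scores > score_thr
        boxes, scores = boxes[keep], scores[keep]
        # 3) plain NMS replaces the usual data-association step
        keep = ops.nms(boxes, scores, nms_iou)
        tracked_boxes = boxes[keep]
        trajectories.append(tracked_boxes)
    return trajectories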

Installation


We use the same environment as DINO to run Siamese-DETR. We tested our models with python=3.9.16, pytorch=1.13.1, cuda=11.7. Other versions may also work.

  1. Clone this repo
git clone https://github.com/yumu-173/Siamese-DETR.git
  2. Install other needed packages
conda create -n siamese-detr python=3.9
conda activate siamese-detr
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
  3. Compile CUDA operators
cd models/dino/ops
python setup.py build install
cd ../../..
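
To verify that the operators compiled correctly, a quick import check can be run (a sketch; the extension name MultiScaleDeformableAttention follows the DINO/Deformable-DETR ops convention, so adjust the import if your build differs):

import torch
import MultiScaleDeformableAttention as MSDA  # built by setup.py above
print("CUDA available:", torch.cuda.is_available())
print("Extension loaded from:", MSDA.__file__)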

Data


Swin Transformer is used as the backbone for Siamese-DETR. Please download the COCO 2017, Objects365, and GMOT-40 datasets, together with the pre-trained Swin backbone weights, and organize them as follows:

Dataset/
  ├── COCO/
    ├── annotations
    ├── train2017
    ├── val2017
  ├── object365/
    ├── annotations
    ├── train
  ├── GMOT-40/
    ├── gmot
      ├── airplane-0
      ├── ...
    ├── anno
  ├── backbone/
    ├── swin_tiny_patch4_window7_224.pth
    ├── ...
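
A small script can check this layout before training (a sketch; it assumes Dataset/ as the root and only mirrors the tree above):

from pathlib import Path
ROOT = Path("Dataset")
EXPECTED = [
    "COCO/annotations", "COCO/train2017", "COCO/val2017",
    "object365/annotations", "object365/train",
    "GMOT-40/gmot", "GMOT-40/anno",
    "backbone/swin_tiny_patch4_window7_224.pth",
]
for rel in EXPECTED:
    status = "ok" if (ROOT / rel).exists() else "MISSING"
    print(status, ROOT / rel)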

Train

bash scripts/Siamese_DETR_train_coco_swint.sh Dataset/COCO /path/to/output-dir/ template-number
bash scripts/Siamese_DETR_train_o365_swint.sh Dataset/object365 /path/to/output-dir/ template-number
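
Judging from the placeholders, each script takes the dataset root, an output directory, and a template-number argument as positional parameters. For example, assuming template-number 1 and an outputs/ directory:

bash scripts/Siamese_DETR_train_coco_swint.sh Dataset/COCO outputs/coco_swint/ 1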

Test

bash scripts/Siamese_DETR_test.sh

Track

bash scripts/Siamese_DETR_test_track.sh

Template

We randomly choose objects in the first frame of each sequence in GMOT-40 as the template. The specific code for obtaining the template can be found in datasets/coco.py, lines 761 to 793. We have uploaded some templates for running. The templates used in our paper are shown below. (Figure: templates used in the paper.)
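
The selection step can be illustrated with a short sketch (the one-box-per-line annotation format and the pick_template helper are assumptions for illustration; the actual logic is in datasets/coco.py, lines 761 to 793):

import random
from pathlib import Path
from PIL import Image

def pick_template(seq_dir, anno_file):
    # first frame of the sequence
    first_frame = sorted(Path(seq_dir).glob("*.jpg"))[0]
    # assumed annotation format: one "x,y,w,h" box per line
    boxes = [tuple(map(int, line.split(",")[:4]))
             for line in Path(anno_file).read_text().splitlines() if line.strip()]
    x, y, w, h = random.choice(boxes)  # randomly choose one object
    return Image.open(first_frame).crop((x, y, x + w, y + h))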

Model

| Model | Dataset | AP | AP50 | AP75 | APS | APM | APL | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Siamese-DETR (Ours, Swin-T) | COCO [42] | 57.5% | 46.6% | 35.9% | 42.8% | 504 | 666 | Baidu |
| Siamese-DETR (Ours, Swin-T) | Objects365 [46] | 59.3% | 48.0% | 40.8% | 44.2% | 668 | 519 | Baidu |
| Siamese-DETR (Ours, Swin-B) | COCO [42] | 65.6% | 50.6% | 43.1% | 46.6% | 681 | 466 | - |
| Siamese-DETR (Ours, Swin-B) | Objects365 [46] | 69.6% | 55.4% | 50.0% | 51.3% | 1083 | 278 | - |

Citation

@article{liu2024siamese,
  title={Siamese-DETR for Generic Multi-Object Tracking},
  author={Liu, Qiankun and Li, Yichen and Jiang, Yuqi and Fu, Ying},
  journal={IEEE Transactions on Image Processing},
  year={2024},
  publisher={IEEE}
}
