DMAOT ranked 1st in the VOTS 2023 challenge (leaderboard). As a plug-and-play method, DMAOT enhances the segmentation ability of AOT-series algorithms on long videos without requiring any training.
We decouple the frame-wise long-term memory used in the AOT-series frameworks and transform it into instance-wise long-term memory. This provides more precise control over each object's long-term memory bank, enabling fine-grained memory management.
We also apply a frame-dropout strategy based on cosine similarity when an instance-wise long-term memory bank reaches its maximum number of frames. This strategy ensures that each long-term memory bank retains higher-quality memories. A sketch of both ideas is given below.
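The snippet below is a minimal sketch of these two ideas, assuming each tracked object gets its own bank of per-frame feature tensors and that, once a bank is full, the stored frame most cosine-similar to the incoming one is the frame that gets dropped. All class and variable names here are illustrative, not the actual DMAOT implementation.
```python
from typing import Dict, List

import torch
import torch.nn.functional as F


class InstanceMemoryBank:
    """Long-term memory bank for a single tracked object (illustrative)."""

    def __init__(self, max_frames: int = 10) -> None:
        self.max_frames = max_frames
        self.features: List[torch.Tensor] = []  # one feature map per memorized frame

    def update(self, feat: torch.Tensor) -> None:
        """Store a new frame's features; evict the most redundant frame when full."""
        if len(self.features) < self.max_frames:
            self.features.append(feat)
            return
        # Cosine similarity between the incoming frame and every stored frame.
        incoming = F.normalize(feat.flatten(), dim=0)
        stored = torch.stack([F.normalize(f.flatten(), dim=0) for f in self.features])
        sims = stored @ incoming  # shape: (max_frames,)
        # Drop the stored frame most similar to the new one so that the
        # retained memories stay diverse, then admit the new frame.
        self.features.pop(int(sims.argmax()))
        self.features.append(feat)


# Instance-wise memory: each object id owns a separate bank, rather than one
# frame-wise bank shared by every object as in the original AOT frameworks.
banks: Dict[int, InstanceMemoryBank] = {}
for obj_id in (1, 2):  # hypothetical object ids
    bank = banks.setdefault(obj_id, InstanceMemoryBank(max_frames=10))
    bank.update(torch.randn(256, 30, 30))  # hypothetical C x H x W feature map
```
Evicting the most redundant frame, rather than simply the oldest, is what keeps each per-instance bank diverse on long videos.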
- Create a new conda environment:
```bash
conda create -n dmaot python=3.8
conda activate dmaot
```
- Then run:
```bash
bash install.sh
```
- Pretrained models of AOT and DeAOT can be downloaded from here.
- We also utilize the SwinB-DeAOT model trained on a larger dataset, and the pretrained weights can be downloaded from here.
- Put the pretrained weights in `./pretrained_models`.
- To initialize the workspace using the VOT-Toolkit:
```bash
vot initialize vots2023 --workspace <workspace-path>
```
- Edit the `paths` and `env_PATH` in `trackers.ini` (an illustrative entry is sketched after this list).
- Edit the `workspace-path` in `evaluate.sh`.
- The detailed documentation on how to use the VOT-Toolkit can be found on the VOT official website.
- Run the evaluation:
```bash
bash evaluate.sh
```
- To pack the results using the VOT-Toolkit:
```bash
vot pack --workspace <workspace-path> <tracker-name>
```
- Upload the packed file to the evaluation server. More details can be found here.
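For reference, a `trackers.ini` entry for a Python tracker under the VOT-Toolkit typically looks like the sketch below; the section name, `command` module, and paths are placeholders to adapt to your checkout.
```ini
; Illustrative entry; section name, command module, and paths are placeholders.
[DMAOT]
label = DMAOT
; Python trackers talk to the toolkit via the trax protocol.
protocol = traxpython
; Hypothetical entry module that exposes the tracker to the toolkit.
command = dmaot_tracker
; Directory containing the tracker code.
paths = <path-to-dmaot>
; Extra PATH entries needed at runtime (use your OS path separator).
env_PATH = <path-to-dmaot>;${PATH}
```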
DMAOT is based on AOT-Benchmark, which now supports both AOT and DeAOT. Thanks for such an excellent implementation.
Please consider citing the related paper(s) in your publications if they help your research.
```bibtex
@inproceedings{yang2022deaot,
  title={Decoupling Features in Hierarchical Propagation for Video Object Segmentation},
  author={Yang, Zongxin and Yang, Yi},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2022}
}
@article{yang2021aost,
  title={Scalable Video Object Segmentation with Identification Mechanism},
  author={Yang, Zongxin and Wang, Xiaohan and Miao, Jiaxu and Wei, Yunchao and Wang, Wenguan and Yang, Yi},
  journal={arXiv preprint arXiv:2203.11442},
  year={2023}
}
@inproceedings{yang2021aot,
  title={Associating Objects with Transformers for Video Object Segmentation},
  author={Yang, Zongxin and Wei, Yunchao and Yang, Yi},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2021}
}
@inproceedings{kristan2023first,
  title={The First Visual Object Tracking Segmentation VOTS2023 Challenge Results},
  author={Kristan, Matej and Matas, Ji{\v{r}}{\'\i} and Danelljan, Martin and Felsberg, Michael and Chang, Hyung Jin and Zajc, Luka {\v{C}}ehovin and Luke{\v{z}}i{\v{c}}, Alan and Drbohlav, Ondrej and Zhang, Zhongqun and Tran, Khanh-Tung and others},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={1796--1818},
  year={2023}
}
@article{cheng2023segment,
  title={Segment and Track Anything},
  author={Cheng, Yangming and Li, Liulei and Xu, Yuanyou and Li, Xiaodi and Yang, Zongxin and Wang, Wenguan and Yang, Yi},
  journal={arXiv preprint arXiv:2305.06558},
  year={2023}
}
```
This project is released under the BSD-3-Clause license. See LICENSE for additional details.