Mengyuan Chen, Junyu Gao, Shicai Yang, Changsheng Xu
European Conference on Computer Vision (ECCV), 2022.
We have further optimized the code; the provided pre-trained model now achieves the following performance (mAP %, at the listed IoU thresholds) on THUMOS14:
| Method | @0.1 | @0.2 | @0.3 | @0.4 | @0.5 | @0.6 | @0.7 | AVG 0.1:0.5 | AVG 0.1:0.7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DELU (Paper) | 71.5 | 66.2 | 56.5 | 47.7 | 40.5 | 27.2 | 15.3 | 56.5 | 46.4 |
| DELU (Latest) | 72.1 | 66.5 | 57.0 | 48.1 | 40.8 | 27.8 | 15.6 | 56.9 | 46.8 |
Weakly-supervised temporal action localization (WS-TAL) aims to localize action instances and recognize their categories with only video-level labels. Despite great progress, existing methods suffer from severe action-background ambiguity, which mainly comes from background noise introduced by aggregation operations and from large intra-action variations caused by the task gap between classification and localization. To address this issue, we propose a generalized evidential deep learning (EDL) framework for WS-TAL, called Dual-Evidential Learning for Uncertainty modeling (DELU), which extends the traditional EDL paradigm to the weakly-supervised multi-label classification setting. Specifically, to adaptively exclude undesirable background snippets, we use the video-level uncertainty to measure how much background noise interferes with the video-level prediction. The snippet-level uncertainty is then induced for progressive learning, which gradually focuses on the entire action instances in an "easy-to-hard" manner. Extensive experiments show that DELU achieves state-of-the-art performance on the THUMOS14 and ActivityNet1.2 benchmarks.
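For readers unfamiliar with evidential deep learning, the sketch below illustrates the standard EDL recipe that DELU builds on: classifier outputs are mapped to non-negative Dirichlet evidence, and the uncertainty is the mass K / sum(alpha) not assigned to any class. The evidence activation (softplus) and the mean-pooling used to obtain a video-level prediction are assumptions for illustration only, not the exact DELU implementation.

```python
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits):
    """Map logits to Dirichlet evidence and a scalar uncertainty (standard EDL sketch)."""
    evidence = F.softplus(logits)        # non-negative evidence per class (assumed activation)
    alpha = evidence + 1.0               # Dirichlet concentration parameters
    num_classes = alpha.shape[-1]
    # Uncertainty mass K / sum(alpha): close to 1 when total evidence is low.
    uncertainty = num_classes / alpha.sum(dim=-1)
    return alpha, uncertainty

# Snippet-level logits for one video with T snippets and C action classes.
T, C = 100, 20
snippet_logits = torch.randn(T, C)
alpha, snippet_u = evidential_uncertainty(snippet_logits)            # (T, C), (T,)

# Video-level uncertainty from an aggregated (here: mean-pooled) prediction.
_, video_u = evidential_uncertainty(snippet_logits.mean(dim=0, keepdim=True))  # (1,)
```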
The requirements and dependencies used in our experiments are listed below.
- Linux: Ubuntu 20.04 LTS
- GPU: GeForce RTX 3090
- CUDA: 11.1
- Python: 3.7.11
- PyTorch: 1.11.0
- Numpy: 1.21.2
- Pandas: 1.3.5
- Scipy: 1.7.3
- Wandb: 0.12.11
- Tqdm: 4.64.0
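As a convenience (not part of the original repo), the key dependency versions can be checked from Python after setting up the environment:

```python
import sys
import torch
import numpy
import pandas
import scipy

print(sys.version)         # expect 3.7.x
print(torch.__version__)   # expect 1.11.0
print(torch.version.cuda)  # CUDA version PyTorch was built against
print(numpy.__version__, pandas.__version__, scipy.__version__)
```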
For THUMOS14, we use the 2048-d features provided by the MM 2021 paper "Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization" (CO2-Net). The dataset can be accessed from Google Drive or Baidu Disk. The annotations are included within this package.
For ActivityNet v1.2, we also use the features provided by MM2021-CO2-Net, which can be obtained from here. The annotations are included within this package.
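After downloading, a quick sanity check of the features can be done with a few lines of Python; the file name and directory layout below are hypothetical and only illustrate the expected 2048-d snippet features.

```python
import numpy as np

# Hypothetical file name: substitute any per-video feature file from the
# downloaded CO2-Net feature package.
feat = np.load("path/to/CO2-THUMOS-14/features/video_validation_0000051.npy")

# Each row is one snippet; 2048 dimensions are expected
# (typically 1024-d RGB + 1024-d optical-flow I3D features concatenated).
assert feat.ndim == 2 and feat.shape[1] == 2048, feat.shape
print(feat.shape)
```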
Download the pretrained models from Google Drive, and put them into "./download_ckpt/".
Change "path/to/CO2-THUMOS-14" in the script into your own path to the dataset, and run:
cd scripts/
./test_thumos.sh
Change "path/to/CO2-ActivityNet-12" in the script into your own path to the dataset, and run:
cd scripts/
./test_activitynet.sh
Change the dataset paths as stated above, and run:
cd scripts/
./train_thumos.sh
or
cd scripts/
./train_activitynet.sh
If you find the code useful in your research, please cite:
@inproceedings{mengyuan2022ECCV_DELU,
author = {Chen, Mengyuan and Gao, Junyu and Yang, Shicai and Xu, Changsheng},
title = {Dual-Evidential Learning for Weakly-supervised Temporal Action Localization},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2022}
}
This project is released under the MIT License.
This repo contains modified code from:
- MM2021-CO2-Net: for the implementation of the CO2-Net backbone (MM 2021).
- DEAR: for the implementation of the EDL loss used in DEAR.
We sincerely thank the owners of all these great repos!