MMAction2 V1.1.0 Release

Released by @cir7 on 04 Jul 14:01 · 34 commits to main since this release · commit f2637ee

New Direction: Multi-Modal Video Understanding

We now support two novel models for video recognition and retrieval based on open-domain text: ActionCLIP and CLIP4Clip. These models mark the first step of MMAction2's journey towards multi-modal video understanding. We also introduce a new video retrieval dataset, MSR-VTT.

For more details, please refer to ActionCLIP, CLIP4Clip and MSR-VTT.

Supported by @Dai-Wenxun in #2470 and #2489.

New Config Type

MMEngine introduced the pure Python style configuration file, which brings the following benefits:

  • Navigate to the base configuration file in the IDE
  • Navigate to a base variable in the IDE
  • Navigate to the source code of a class in the IDE
  • Inherit two configuration files that contain the same field
  • Load the configuration file without other third-party requirements

Refer to the tutorial for more detailed usage.
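
As a rough illustration of the points listed above, a pure Python style config might look like the following sketch. The base file location (`.._base_.default_runtime`) and all field values are hypothetical and not taken from the MMAction2 repository. `read_base` is MMEngine's context manager for importing base configs, and the `SGD` import assumes PyTorch is installed.

```python
# Sketch of a pure Python style config (hypothetical paths and values).
from mmengine.config import read_base

with read_base():
    # Base configs are inherited through ordinary import syntax,
    # so an IDE can jump to the base file and to its variables.
    # `.._base_.default_runtime` is a hypothetical location.
    from .._base_.default_runtime import *

# Classes are referenced directly rather than by string name,
# which lets an IDE navigate to their source code.
from torch.optim import SGD

# Illustrative values only; not a recommended training recipe.
optim_wrapper = dict(
    optimizer=dict(type=SGD, lr=0.01, momentum=0.9, weight_decay=1e-4))
```

A file like this would typically be loaded with `Config.fromfile` from `mmengine.config`, the same entry point used for text style configs.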

New Datasets

We are glad to support 3 new datasets:

(ICCV2019) HACS

HACS is a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos.

For more details, please refer to HACS.

Supported by @hukkai in #2224

(ICCV2021) MultiSports

MultiSports is a multi-person video dataset of spatio-temporally localized sports actions.

For more details, please refer to MultiSports.

Supported by @cir7 in #2280

(arXiv2022) Kinetics-710

For more details, please refer to Kinetics710.

Supported by @cir7 in #2534

Other New Features

What's Changed

New Contributors

Full Changelog: v1.0.0...v1.1.0