MMPose V1.0.0 Release
Highlights
We are excited to announce the release of MMPose 1.0.0 as a part of the OpenMMLab 2.0 project! MMPose 1.0.0 introduces an updated framework structure for the core package and a new section called "Projects". This section showcases a range of engaging and versatile applications built upon the MMPose foundation.
In this release, we have significantly refactored the core package to make the code clearer, easier to understand, and better disentangled. As a result, several existing algorithms now outperform their previous versions. We have also added recent algorithms, such as SimCC and ViTPose, to give users a more comprehensive and powerful toolkit.
The new "Projects" section serves as an essential addition to MMPose, created to foster innovation and collaboration among users. This section offers the following attractive features:
- Flexible code contribution: Unlike the core package, the "Projects" section allows for a more flexible environment for code contributions, enabling faster integration of state-of-the-art models and features.
- Showcase of diverse applications: Explore a wide variety of projects built upon the MMPose foundation, such as deployment examples and combinations of pose estimation with other tasks.
- Fostering creativity and collaboration: The section encourages users to experiment, build upon the MMPose platform, and share their applications and techniques, creating an active community of developers and researchers.
Discover the possibilities within the "Projects" section and join the vibrant MMPose community in pushing the boundaries of pose estimation applications!
Exciting Features
RTMPose
RTMPose is a high-performance real-time multi-person pose estimation framework designed for practical applications. RTMPose offers high efficiency and accuracy, with various models achieving impressive AP scores on COCO and fast inference speeds on both CPU and GPU. It is also designed for easy deployment across backends such as ONNX, TensorRT, ncnn, and OpenVINO, and across platforms such as Linux, Windows, NVIDIA Jetson, and ARM. Additionally, it provides a pipeline inference API and SDK for Python, C++, C#, Java, and other languages.
[Project][Model Zoo][Tech Report]
Inferencer
In this release, we introduce the MMPoseInferencer, a versatile API for inference that accommodates multiple input types. The API enables users to easily specify and customize pose estimation models, streamlining the process of performing pose estimation with MMPose.
Usage:
python demo/inferencer_demo.py ${INPUTS} --pose2d ${MODEL} [OPTIONS]
- The INPUTS can be an image or video path, an image folder, or a webcam feed. You no longer need different demo scripts for various input types.
- The inferencer supports specifying models with aliases such as 'human', 'animal', and 'face'. These aliases correspond to the advanced RTMPose models, which are fast and accurate. If you're unsure which model to use, the default one should suffice.
Example:
python demo/inferencer_demo.py tests/data/crowdpose --pose2d wholebody
All images located in the tests/data/crowdpose folder will be processed using RTMPose.
For more details about Inferencer, please refer to https://mmpose.readthedocs.io/en/latest/user_guides/inference.html
- Supported by @Ben-Louis in #1969
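The command-line usage shown above can also be assembled from a script, for example when the same demo is run over several inputs. The following is a minimal sketch, not part of MMPose: `build_inferencer_command` is a hypothetical helper, and it assumes the script is invoked as `python demo/inferencer_demo.py` from the repository root, exactly as in the usage line. Only the `--pose2d` flag from the text is used; any other options are passed through untouched.

```python
import shlex


def build_inferencer_command(inputs, pose2d, options=()):
    """Assemble the argument list for demo/inferencer_demo.py.

    Hypothetical helper for illustration only.
    `inputs` is an image or video path, an image folder, or a webcam source.
    `pose2d` is a model name or an alias such as 'human', 'animal', or 'face'.
    `options` holds any extra command-line flags, passed through unchanged.
    """
    command = ["python", "demo/inferencer_demo.py", str(inputs)]
    command += ["--pose2d", pose2d]
    command += list(options)
    return command


if __name__ == "__main__":
    cmd = build_inferencer_command("tests/data/crowdpose", "wholebody")
    # Print a copy-pasteable shell line. To actually run it, call
    # subprocess.run(cmd, check=True) from the MMPose repository root.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps paths with spaces intact when the command is later handed to `subprocess.run`.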
Visualization Improvements
In MMPose 1.0.0, we have enhanced the visualization capabilities to give a more intuitive view of the model's performance and keypoint predictions, which streamlines fine-tuning and optimizing pose estimation models. The new visualization tool facilitates:
- Heatmap Visualization: Both 1D and 2D heatmaps are supported, so users can easily visualize the distribution of keypoints and their confidence levels and gain a clearer understanding of the model's keypoint predictions.
[Figure: 2D Heatmap (ViTPose) and 1D Heatmap (RTMPose)]
- Inference Result Visualization during Training and Testing: Users can now visualize pose estimation results during the validation or testing phases, which helps them quickly identify and address potential issues for faster model iteration and improved performance. For guidance on setting up visualization during training and testing, please refer to the "train and test" guide.
- Support SimCC visualization by @jack0rich in #1912
- Support OpenPose-style visualization by @Zheng-LinXiao in #2115
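To make the difference between the 1D and 2D heatmaps mentioned above concrete, here is a small pure-Python sketch of how a keypoint location and confidence could be read from each representation. It is an illustration under stated assumptions, not MMPose's decoding code: the function names are hypothetical, the location is taken as a plain argmax, and for the 1D case the confidence is assumed to be the weaker of the two per-axis peaks.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def decode_2d_heatmap(heatmap):
    """Read one keypoint from a 2D heatmap.

    `heatmap` is a list of rows, so heatmap[y][x] is the score at (x, y).
    Returns ((x, y), score) for the highest-scoring cell.
    """
    row_peaks = [max(row) for row in heatmap]
    y = argmax(row_peaks)
    x = argmax(heatmap[y])
    return (x, y), heatmap[y][x]


def decode_1d_heatmaps(x_scores, y_scores):
    """Read one keypoint from a pair of 1D heatmaps, one per axis.

    Assumption: the confidence is the smaller of the two axis peaks.
    """
    x = argmax(x_scores)
    y = argmax(y_scores)
    return (x, y), min(x_scores[x], y_scores[y])


if __name__ == "__main__":
    heatmap_2d = [
        [0.0, 0.1, 0.0],
        [0.1, 0.9, 0.2],
        [0.0, 0.2, 0.1],
    ]
    print(decode_2d_heatmap(heatmap_2d))  # ((1, 1), 0.9)

    x_scores = [0.1, 0.2, 0.7]
    y_scores = [0.8, 0.1, 0.1]
    print(decode_1d_heatmaps(x_scores, y_scores))  # ((2, 0), 0.7)
```

A 2D heatmap stores one score per pixel, while the 1D form stores one score vector per axis, which is why the two are drawn differently by a visualizer.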
MMPose for AIGC
We are excited to introduce the MMPose4AIGC project, a tool that extracts human pose information with MMPose and integrates it with the T2I Adapter demo to generate images. The project makes it easy to produce both OpenPose-style and MMPose-style skeleton images, which can then be used as inputs to the T2I Adapter demo for pose-guided image generation.
YOLOX-Pose
YOLOX-Pose is a YOLO-based human detector and pose estimator, leveraging the methodology described in YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss (CVPRW 2022). Lightweight and fast, this model is well suited to crowded scenes.
[Project][Paper]
- Supported by @Ben-Louis in #2020
Optimizations
In addition to new features, MMPose 1.0.0 delivers key optimizations for a better user experience. PyTorch 2.0 compatibility and a streamlined Codec module make the pose estimation workflow more efficient and user-friendly.
PyTorch 2.0 Compatibility
MMPose 1.0.0 is now compatible with PyTorch 2.0, so users can take advantage of the latest features and performance improvements in PyTorch 2.0 when using MMPose. With the integration of Inductor, users can expect faster training. The table below shows the change in training speed and memory for several example models:
Model | Training Speed | Memory |
---|---|---|
ViTPose-B | 29.6% ↑ (0.931 → 0.655) | 10586 → 10663 |
ViTPose-S | 33.7% ↑ (0.563 → 0.373) | 6091 → 6170 |
HRNet-w32 | 12.8% ↑ (0.553 → 0.482) | 9849 → 10145 |
HRNet-w48 | 37.1% ↑ (0.437 → 0.275) | 7319 → 7394 |
RTMPose-t | 6.3% ↑ (1.533 → 1.437) | 6292 → 6489 |
RTMPose-s | 13.1% ↑ (1.645 → 1.430) | 9013 → 9208 |
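The percentages in the table can be reproduced from the parenthesized before and after values. The short check below is a sketch that assumes those values are per-step training times (the table does not state the unit), so a smaller value means faster training and the percentage is the relative reduction.

```python
# (before, after) pairs copied from the "Training Speed" column of the table.
# Assumption: these are times, so a lower value is better.
timings = {
    "ViTPose-B": (0.931, 0.655),
    "ViTPose-S": (0.563, 0.373),
    "HRNet-w32": (0.553, 0.482),
    "HRNet-w48": (0.437, 0.275),
    "RTMPose-t": (1.533, 1.437),
    "RTMPose-s": (1.645, 1.430),
}


def relative_reduction(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    for model, (before, after) in timings.items():
        print(f"{model}: {relative_reduction(before, after):.1f}%")
```

For ViTPose-B, for example, (0.931 - 0.655) / 0.931 is about 0.296, which matches the 29.6% shown in the table.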
New Design: Codecs
In pose estimation tasks, various algorithms require different target formats, such as normalized coordinates, vectors, and heatmaps. MMPose 1.0.0 introduces a unified Codec module to streamline the encoding and decoding processes:
- Encoder: Transforms input image space coordinates into required target formats.
- Decoder: Transforms model outputs into input image space coordinates, performing the inverse operation of the encoder.
This integration offers a more coherent and user-friendly experience when working with different pose estimation algorithms. For a detailed introduction to codecs, including concrete examples, please refer to our guide, "Learn about Codecs".
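The encoder and decoder roles described above can be illustrated with a toy codec for one of the target formats mentioned, normalized coordinates. This is a minimal sketch of the idea only: `NormalizedCoordinateCodec` and its method signatures are hypothetical and are not the MMPose codec API, which is covered in the guide.

```python
class NormalizedCoordinateCodec:
    """Hypothetical codec: pixel coordinates <-> coordinates scaled to [0, 1]."""

    def __init__(self, image_width, image_height):
        self.image_width = image_width
        self.image_height = image_height

    def encode(self, keypoints):
        """Map (x, y) pixel coordinates to normalized training targets."""
        return [
            (x / self.image_width, y / self.image_height)
            for x, y in keypoints
        ]

    def decode(self, encoded):
        """Map normalized model outputs back to pixel coordinates."""
        return [
            (u * self.image_width, v * self.image_height)
            for u, v in encoded
        ]


if __name__ == "__main__":
    codec = NormalizedCoordinateCodec(image_width=192, image_height=256)
    keypoints = [(96.0, 64.0), (48.0, 192.0)]

    targets = codec.encode(keypoints)
    print(targets)                # [(0.5, 0.25), (0.25, 0.75)]
    print(codec.decode(targets))  # [(96.0, 64.0), (48.0, 192.0)]
```

Keeping both directions in one object means the decoder always inverts the encoder with the same settings, which is the point of pairing them.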
Bug Fixes
- [Fix] fix readthedocs compiling requirements by @ly015 in #2071
- [Fix] fix online documentation by @ly015 in #2073
- [Fix] fix online docs by @ly015 in #2075
- [Fix] fix warnings when falling back to mmengine registry by @ly015 in #2082
- [Fix] fix CI by @ly015 in #2088
- [Fix] fix model names in metafiles by @Ben-Louis in #2093
- [Fix] fix simcc visualization by @Tau-J in #2130
New Contributors
- @ChenZhenGui made their first contribution in #1800
- @xinxinxinxu made their first contribution in #1843
- @jack0rich made their first contribution in #1912
- @zwfcrazy made their first contribution in #1944
- @LKJacky made their first contribution in #2024
- @tongda made their first contribution in #2028
- @LRuid made their first contribution in #2055
Full Changelog: v0.29.0...v1.0.0