diff --git a/.circleci/test.yml b/.circleci/test.yml index 994d4b94e01..f98014366dd 100644 --- a/.circleci/test.yml +++ b/.circleci/test.yml @@ -91,7 +91,7 @@ jobs: type: string cuda: type: enum - enum: ["10.1", "10.2", "11.1"] + enum: ["10.1", "10.2", "11.1", "11.7"] cudnn: type: integer default: 7 @@ -161,8 +161,8 @@ workflows: - lint - build_cpu: name: maximum_version_cpu - torch: 1.13.0 - torchvision: 0.14.0 + torch: 2.0.0 + torchvision: 0.15.1 python: 3.9.0 requires: - minimum_version_cpu @@ -178,6 +178,13 @@ workflows: cuda: "10.2" requires: - hold + - build_cuda: + name: maximum_version_gpu + torch: 2.0.0 + cuda: "11.7" + cudnn: 8 + requires: + - hold merge_stage_test: when: not: << pipeline.parameters.lint_only >> diff --git a/README.md b/README.md index 2cbbe559f1f..1718b0c868a 100644 --- a/README.md +++ b/README.md @@ -21,15 +21,15 @@ [![PyPI](https://img.shields.io/pypi/v/mmdet)](https://pypi.org/project/mmdet) [![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmdetection.readthedocs.io/en/latest/) [![badge](https://github.com/open-mmlab/mmdetection/workflows/build/badge.svg)](https://github.com/open-mmlab/mmdetection/actions) -[![codecov](https://codecov.io/gh/open-mmlab/mmdetection/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmdetection) -[![license](https://img.shields.io/github/license/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/blob/master/LICENSE) +[![codecov](https://codecov.io/gh/open-mmlab/mmdetection/branch/main/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmdetection) +[![license](https://img.shields.io/github/license/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/blob/main/LICENSE) [![open issues](https://isitmaintained.com/badge/open/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/issues) [![issue resolution](https://isitmaintained.com/badge/resolution/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/issues) -[📘Documentation](https://mmdetection.readthedocs.io/en/3.x/) | -[🛠️Installation](https://mmdetection.readthedocs.io/en/3.x/get_started.html) | -[👀Model Zoo](https://mmdetection.readthedocs.io/en/3.x/model_zoo.html) | -[🆕Update News](https://mmdetection.readthedocs.io/en/3.x/notes/changelog.html) | +[📘Documentation](https://mmdetection.readthedocs.io/en/latest/) | +[🛠️Installation](https://mmdetection.readthedocs.io/en/latest/get_started.html) | +[👀Model Zoo](https://mmdetection.readthedocs.io/en/latest/model_zoo.html) | +[🆕Update News](https://mmdetection.readthedocs.io/en/latest/notes/changelog.html) | [🚀Ongoing Projects](https://github.com/open-mmlab/mmdetection/projects) | [🤔Reporting Issues](https://github.com/open-mmlab/mmdetection/issues/new/choose) @@ -43,9 +43,9 @@ English | [简体中文](README_zh-CN.md)
[two HTML markup lines in the header replaced with updated versions]
@@ -53,6 +53,12 @@ English | [简体中文](README_zh-CN.md)
[six HTML markup lines added to the OpenMMLab platform table]
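The CircleCI changes at the top of this diff raise the tested ceiling to PyTorch 2.0.0 with CUDA 11.7 and cuDNN 8. Below is a minimal sketch for comparing a local environment against that matrix, using only standard `torch` attributes; the printed values are illustrative:

```python
# Compare the local install against the new maximum_version CI job
# (torch 2.0.0 / CUDA 11.7 / cuDNN 8).
import torch

print('torch :', torch.__version__)    # e.g. '2.0.0'
print('cuda  :', torch.version.cuda)   # e.g. '11.7'; None for CPU-only builds
if torch.cuda.is_available():
    # cuDNN is reported as an integer, e.g. 8500 for cuDNN 8.5.x
    print('cudnn :', torch.backends.cudnn.version())
```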
## Introduction

@@ -60,7 +66,7 @@ English | [简体中文](README_zh-CN.md)

 MMDetection is an open source object detection toolbox based on PyTorch. It
 is a part of the [OpenMMLab](https://openmmlab.com/) project.

-The master branch works with **PyTorch 1.6+**.
+The main branch works with **PyTorch 1.6+**.

@@ -108,43 +114,40 @@ We are excited to announce our latest work on real-time object recognition tasks

-**v3.0.0rc6** was released in 24/2/2023:
+**v3.0.0** was released on 6/4/2023:

-- Support [Boxinst](configs/boxinst), [Objects365 Dataset](configs/objects365), and [Separated and Occluded COCO metric](docs/en/user_guides/useful_tools.md#coco-separated--occluded-mask-metric)
-- Support [ConvNeXt-V2](projects/ConvNeXt-V2), [DiffusionDet](projects/DiffusionDet), and inference of [EfficientDet](projects/EfficientDet) and [Detic](projects/Detic) in `Projects`
-- Refactor [DETR](configs/detr) series and support [Conditional-DETR](configs/conditional_detr), [DAB-DETR](configs/dab_detr), and [DINO](configs/dino)
-- Support `DetInferencer` for inference, Test Time Augmentation, and automatically importing modules from registry
-- Support RTMDet-Ins ONNXRuntime and TensorRT [deployment](configs/rtmdet/README.md#deployment-tutorial)
-- Support [calculating FLOPs of detectors](docs/en/user_guides/useful_tools.md#Model-Complexity)
+- Release MMDetection 3.0.0 official version
+- Support semi-automatic annotation based on [Label-Studio](projects/LabelStudio) (#10039)
+- Support [EfficientDet](projects/EfficientDet) in projects (#9810)

## Installation

-Please refer to [Installation](https://mmdetection.readthedocs.io/en/3.x/get_started.html) for installation instructions.
+Please refer to [Installation](https://mmdetection.readthedocs.io/en/latest/get_started.html) for installation instructions.

## Getting Started

-Please see [Overview](https://mmdetection.readthedocs.io/en/3.x/get_started.html) for the general introduction of MMDetection.
+Please see [Overview](https://mmdetection.readthedocs.io/en/latest/get_started.html) for the general introduction of MMDetection.

-For detailed user guides and advanced guides, please refer to our [documentation](https://mmdetection.readthedocs.io/en/3.x/):
+For detailed user guides and advanced guides, please refer to our [documentation](https://mmdetection.readthedocs.io/en/latest/):

- User Guides

<details open>
- - [Train & Test](https://mmdetection.readthedocs.io/en/3.x/user_guides/index.html#train-test) - - [Learn about Configs](https://mmdetection.readthedocs.io/en/3.x/user_guides/config.html) - - [Inference with existing models](https://mmdetection.readthedocs.io/en/3.x/user_guides/inference.html) - - [Dataset Prepare](https://mmdetection.readthedocs.io/en/3.x/user_guides/dataset_prepare.html) - - [Test existing models on standard datasets](https://mmdetection.readthedocs.io/en/3.x/user_guides/test.html) - - [Train predefined models on standard datasets](https://mmdetection.readthedocs.io/en/3.x/user_guides/train.html) - - [Train with customized datasets](https://mmdetection.readthedocs.io/en/3.x/user_guides/train.html#train-with-customized-datasets) - - [Train with customized models and standard datasets](https://mmdetection.readthedocs.io/en/3.x/user_guides/new_model.html) - - [Finetuning Models](https://mmdetection.readthedocs.io/en/3.x/user_guides/finetune.html) - - [Test Results Submission](https://mmdetection.readthedocs.io/en/3.x/user_guides/test_results_submission.html) - - [Weight initialization](https://mmdetection.readthedocs.io/en/3.x/user_guides/init_cfg.html) - - [Use a single stage detector as RPN](https://mmdetection.readthedocs.io/en/3.x/user_guides/single_stage_as_rpn.html) - - [Semi-supervised Object Detection](https://mmdetection.readthedocs.io/en/3.x/user_guides/semi_det.html) - - [Useful Tools](https://mmdetection.readthedocs.io/en/3.x/user_guides/index.html#useful-tools) + - [Train & Test](https://mmdetection.readthedocs.io/en/latest/user_guides/index.html#train-test) + - [Learn about Configs](https://mmdetection.readthedocs.io/en/latest/user_guides/config.html) + - [Inference with existing models](https://mmdetection.readthedocs.io/en/latest/user_guides/inference.html) + - [Dataset Prepare](https://mmdetection.readthedocs.io/en/latest/user_guides/dataset_prepare.html) + - [Test existing models on standard datasets](https://mmdetection.readthedocs.io/en/latest/user_guides/test.html) + - [Train predefined models on standard datasets](https://mmdetection.readthedocs.io/en/latest/user_guides/train.html) + - [Train with customized datasets](https://mmdetection.readthedocs.io/en/latest/user_guides/train.html#train-with-customized-datasets) + - [Train with customized models and standard datasets](https://mmdetection.readthedocs.io/en/latest/user_guides/new_model.html) + - [Finetuning Models](https://mmdetection.readthedocs.io/en/latest/user_guides/finetune.html) + - [Test Results Submission](https://mmdetection.readthedocs.io/en/latest/user_guides/test_results_submission.html) + - [Weight initialization](https://mmdetection.readthedocs.io/en/latest/user_guides/init_cfg.html) + - [Use a single stage detector as RPN](https://mmdetection.readthedocs.io/en/latest/user_guides/single_stage_as_rpn.html) + - [Semi-supervised Object Detection](https://mmdetection.readthedocs.io/en/latest/user_guides/semi_det.html) + - [Useful Tools](https://mmdetection.readthedocs.io/en/latest/user_guides/index.html#useful-tools)
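As a quick companion to the inference guide linked in the list above, the sketch below exercises the `DetInferencer` entry point of the 3.x API. The model alias and output path are illustrative, and the argument names follow the 3.x inference guide; treat this as a sketch rather than a pinned API:

```python
from mmdet.apis import DetInferencer

# The string is a bundled config alias; the matching checkpoint is
# downloaded automatically the first time it is used.
inferencer = DetInferencer(model='rtmdet_tiny_8xb32-300e_coco')

# Predictions come back as dicts of labels/scores/bboxes; rendered
# images are written under out_dir.
result = inferencer('demo/demo.jpg', out_dir='./outputs')
print(result['predictions'][0]['scores'][:5])
```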
@@ -152,15 +155,15 @@ For detailed user guides and advanced guides, please refer to our [documentation
- - [Basic Concepts](https://mmdetection.readthedocs.io/en/3.x/advanced_guides/index.html#basic-concepts) - - [Component Customization](https://mmdetection.readthedocs.io/en/3.x/advanced_guides/index.html#component-customization) - - [How to](https://mmdetection.readthedocs.io/en/3.x/advanced_guides/index.html#how-to) + - [Basic Concepts](https://mmdetection.readthedocs.io/en/latest/advanced_guides/index.html#basic-concepts) + - [Component Customization](https://mmdetection.readthedocs.io/en/latest/advanced_guides/index.html#component-customization) + - [How to](https://mmdetection.readthedocs.io/en/latest/advanced_guides/index.html#how-to)
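The component-customization guide linked above centers on MMEngine's registry pattern: define a class, register it, then reference it from a config by its `type` string. A minimal sketch follows; the `DummyBackbone` name and its config snippet are hypothetical, for illustration only:

```python
from torch import nn

from mmdet.registry import MODELS


@MODELS.register_module()  # makes 'DummyBackbone' resolvable from configs
class DummyBackbone(nn.Module):
    """Toy backbone used only to illustrate registration."""

    def __init__(self, out_channels=256):
        super().__init__()
        self.stem = nn.Conv2d(3, out_channels, kernel_size=3, stride=2)

    def forward(self, x):
        # MMDetection necks/heads expect a tuple of multi-scale features
        return (self.stem(x), )


# Once the defining module is imported (e.g. via `custom_imports` in the
# config), it can be used like any built-in component:
# model = dict(backbone=dict(type='DummyBackbone', out_channels=256))
```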
We also provide object detection colab tutorial [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](demo/MMDet_Tutorial.ipynb) and instance segmentation colab tutorial [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](demo/MMDet_InstanceSeg_Tutorial.ipynb). -To migrate from MMDetection 2.x, please refer to [migration](https://mmdetection.readthedocs.io/en/3.x/migration.html). +To migrate from MMDetection 2.x, please refer to [migration](https://mmdetection.readthedocs.io/en/latest/migration.html). ## Overview of Benchmark and Model Zoo diff --git a/README_zh-CN.md b/README_zh-CN.md index 7f68b926957..80392acd69f 100644 --- a/README_zh-CN.md +++ b/README_zh-CN.md @@ -21,15 +21,15 @@ [![PyPI](https://img.shields.io/pypi/v/mmdet)](https://pypi.org/project/mmdet) [![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmdetection.readthedocs.io/en/latest/) [![badge](https://github.com/open-mmlab/mmdetection/workflows/build/badge.svg)](https://github.com/open-mmlab/mmdetection/actions) -[![codecov](https://codecov.io/gh/open-mmlab/mmdetection/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmdetection) -[![license](https://img.shields.io/github/license/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/blob/master/LICENSE) +[![codecov](https://codecov.io/gh/open-mmlab/mmdetection/branch/main/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmdetection) +[![license](https://img.shields.io/github/license/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/blob/main/LICENSE) [![open issues](https://isitmaintained.com/badge/open/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/issues) [![issue resolution](https://isitmaintained.com/badge/resolution/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/issues) -[📘使用文档](https://mmdetection.readthedocs.io/zh_CN/3.x/) | -[🛠️安装教程](https://mmdetection.readthedocs.io/zh_CN/3.x/get_started.html) | -[👀模型库](https://mmdetection.readthedocs.io/zh_CN/3.x/model_zoo.html) | -[🆕更新日志](https://mmdetection.readthedocs.io/en/3.x/notes/changelog.html) | +[📘使用文档](https://mmdetection.readthedocs.io/zh_CN/latest/) | +[🛠️安装教程](https://mmdetection.readthedocs.io/zh_CN/latest/get_started.html) | +[👀模型库](https://mmdetection.readthedocs.io/zh_CN/latest/model_zoo.html) | +[🆕更新日志](https://mmdetection.readthedocs.io/en/latest/notes/changelog.html) | [🚀进行中的项目](https://github.com/open-mmlab/mmdetection/projects) | [🤔报告问题](https://github.com/open-mmlab/mmdetection/issues/new/choose) @@ -41,6 +41,26 @@ +
[twenty lines of new HTML table markup added to the header of the Chinese README]
+ ## 简介 MMDetection 是一个基于 PyTorch 的目标检测开源工具箱。它是 [OpenMMLab](https://openmmlab.com/) 项目的一部分。 @@ -93,43 +113,40 @@ MMDetection 是一个基于 PyTorch 的目标检测开源工具箱。它是 [Ope -**v3.0.0rc6** 版本已经在 2023.2.24 发布: +**v3.0.0** 版本已经在 2023.4.6 发布: -- 支持了 [Boxinst](configs/boxinst), [Objects365 Dataset](configs/objects365) 和 [Separated and Occluded COCO metric](docs/zh_cn/user_guides/useful_tools.md#coco-分离和遮挡实例分割性能评估) -- 在 `Projects` 中支持了 [ConvNeXt-V2](projects/ConvNeXt-V2), [DiffusionDet](projects/DiffusionDet) 和 [EfficientDet](projects/EfficientDet), [Detic](projects/Detic) 的推理 -- 重构了 [DETR](configs/detr) 系列并支持了 [Conditional-DETR](configs/conditional_detr), [DAB-DETR](configs/dab_detr) 和 [DINO](configs/dino) -- 支持了通过 `DetInferencer` 用于推理, Test Time Augmentation 以及从注册表(registry)自动导入模块 -- 支持了 RTMDet-Ins 的 ONNXRuntime 和 TensorRT [部署](configs/rtmdet/README.md#deployment-tutorial) -- 支持了检测器[计算 FLOPS](docs/zh_cn/user_guides/useful_tools.md#模型复杂度) +- 发布 MMDetection 3.0.0 正式版 +- 基于 [Label-Studio](projects/LabelStudio) 支持半自动标注流程 +- projects 中支持了 [EfficientDet](projects/EfficientDet) ## 安装 -请参考[快速入门文档](https://mmdetection.readthedocs.io/zh_CN/3.x/get_started.html)进行安装。 +请参考[快速入门文档](https://mmdetection.readthedocs.io/zh_CN/latest/get_started.html)进行安装。 ## 教程 -请阅读[概述](https://mmdetection.readthedocs.io/zh_CN/3.x/get_started.html)对 MMDetection 进行初步的了解。 +请阅读[概述](https://mmdetection.readthedocs.io/zh_CN/latest/get_started.html)对 MMDetection 进行初步的了解。 -为了帮助用户更进一步了解 MMDetection,我们准备了用户指南和进阶指南,请阅读我们的[文档](https://mmdetection.readthedocs.io/zh_CN/3.x/): +为了帮助用户更进一步了解 MMDetection,我们准备了用户指南和进阶指南,请阅读我们的[文档](https://mmdetection.readthedocs.io/zh_CN/latest/): - 用户指南
- - [训练 & 测试](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/index.html#train-test) - - [学习配置文件](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/config.html) - - [使用已有模型在标准数据集上进行推理](https://mmdetection.readthedocs.io/en/3.x/user_guides/inference.html) - - [数据集准备](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/dataset_prepare.html) - - [测试现有模型](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/test.html) - - [在标准数据集上训练预定义的模型](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/train.html) - - [在自定义数据集上进行训练](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/train.html#train-with-customized-datasets) - - [在标准数据集上训练自定义模型](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/new_model.html) - - [模型微调](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/finetune.html) - - [提交测试结果](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/test_results_submission.html) - - [权重初始化](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/init_cfg.html) - - [将单阶段检测器作为 RPN](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/single_stage_as_rpn.html) - - [半监督目标检测](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/semi_det.html) - - [实用工具](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/index.html#useful-tools) + - [训练 & 测试](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/index.html#train-test) + - [学习配置文件](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/config.html) + - [使用已有模型在标准数据集上进行推理](https://mmdetection.readthedocs.io/en/latest/user_guides/inference.html) + - [数据集准备](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/dataset_prepare.html) + - [测试现有模型](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/test.html) + - [在标准数据集上训练预定义的模型](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/train.html) + - [在自定义数据集上进行训练](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/train.html#train-with-customized-datasets) + - [在标准数据集上训练自定义模型](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/new_model.html) + - [模型微调](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/finetune.html) + - [提交测试结果](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/test_results_submission.html) + - [权重初始化](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/init_cfg.html) + - [将单阶段检测器作为 RPN](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/single_stage_as_rpn.html) + - [半监督目标检测](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/semi_det.html) + - [实用工具](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/index.html#useful-tools)
@@ -137,9 +154,9 @@ MMDetection 是一个基于 PyTorch 的目标检测开源工具箱。它是 [Ope
- - [基础概念](https://mmdetection.readthedocs.io/zh_CN/3.x/advanced_guides/index.html#basic-concepts) - - [组件定制](https://mmdetection.readthedocs.io/zh_CN/3.x/advanced_guides/index.html#component-customization) - - [How to](https://mmdetection.readthedocs.io/zh_CN/3.x/advanced_guides/index.html#how-to) + - [基础概念](https://mmdetection.readthedocs.io/zh_CN/latest/advanced_guides/index.html#basic-concepts) + - [组件定制](https://mmdetection.readthedocs.io/zh_CN/latest/advanced_guides/index.html#component-customization) + - [How to](https://mmdetection.readthedocs.io/zh_CN/latest/advanced_guides/index.html#how-to)
@@ -147,7 +164,7 @@ MMDetection 是一个基于 PyTorch 的目标检测开源工具箱。它是 [Ope 同时,我们还提供了 [MMDetection 中文解读文案汇总](docs/zh_cn/article.md) -若需要将2.x版本的代码迁移至新版,请参考[迁移文档](https://mmdetection.readthedocs.io/en/3.x/migration.html)。 +若需要将2.x版本的代码迁移至新版,请参考[迁移文档](https://mmdetection.readthedocs.io/en/latest/migration.html)。 ## 基准测试和模型库 diff --git a/configs/_base_/datasets/cityscapes_detection.py b/configs/_base_/datasets/cityscapes_detection.py index a037fb838fa..caeba6bfcd2 100644 --- a/configs/_base_/datasets/cityscapes_detection.py +++ b/configs/_base_/datasets/cityscapes_detection.py @@ -2,8 +2,23 @@ dataset_type = 'CityscapesDataset' data_root = 'data/cityscapes/' +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/segmentation/cityscapes/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( +# backend='petrel', +# path_mapping=dict({ +# './data/': 's3://openmmlab/datasets/segmentation/', +# 'data/': 's3://openmmlab/datasets/segmentation/' +# })) +backend_args = None + train_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', @@ -14,7 +29,7 @@ ] test_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(2048, 1024), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline dict(type='LoadAnnotations', with_bbox=True), @@ -39,7 +54,8 @@ ann_file='annotations/instancesonly_filtered_gtFine_train.json', data_prefix=dict(img='leftImg8bit/train/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args=backend_args))) val_dataloader = dict( batch_size=1, @@ -54,13 +70,15 @@ data_prefix=dict(img='leftImg8bit/val/'), test_mode=True, filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instancesonly_filtered_gtFine_val.json', - metric='bbox') + metric='bbox', + backend_args=backend_args) test_evaluator = val_evaluator diff --git a/configs/_base_/datasets/cityscapes_instance.py b/configs/_base_/datasets/cityscapes_instance.py index 0254af3f97a..136403136c6 100644 --- a/configs/_base_/datasets/cityscapes_instance.py +++ b/configs/_base_/datasets/cityscapes_instance.py @@ -2,8 +2,23 @@ dataset_type = 'CityscapesDataset' data_root = 'data/cityscapes/' +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/segmentation/cityscapes/' + +# Method 2: Use backend_args, file_client_args in versions before 3.0.0rc6 +# backend_args = dict( +# backend='petrel', +# path_mapping=dict({ +# './data/': 's3://openmmlab/datasets/segmentation/', +# 'data/': 's3://openmmlab/datasets/segmentation/' +# })) +backend_args = None + train_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomResize', @@ -14,7 +29,7 @@ ] test_pipeline = [ - 
dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(2048, 1024), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline dict(type='LoadAnnotations', with_bbox=True, with_mask=True), @@ -39,7 +54,8 @@ ann_file='annotations/instancesonly_filtered_gtFine_train.json', data_prefix=dict(img='leftImg8bit/train/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args=backend_args))) val_dataloader = dict( batch_size=1, @@ -54,7 +70,8 @@ data_prefix=dict(img='leftImg8bit/val/'), test_mode=True, filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader @@ -63,13 +80,13 @@ type='CocoMetric', ann_file=data_root + 'annotations/instancesonly_filtered_gtFine_val.json', - metric=['bbox', 'segm']), + metric=['bbox', 'segm'], + backend_args=backend_args), dict( type='CityScapesMetric', - ann_file=data_root + - 'annotations/instancesonly_filtered_gtFine_val.json', - seg_prefix=data_root + '/gtFine/val', - outfile_prefix='./work_dirs/cityscapes_metric/instance') + seg_prefix=data_root + 'gtFine/val', + outfile_prefix='./work_dirs/cityscapes_metric/instance', + backend_args=backend_args) ] test_evaluator = val_evaluator diff --git a/configs/_base_/datasets/coco_detection.py b/configs/_base_/datasets/coco_detection.py index fcd9859f135..fdf8dfad947 100644 --- a/configs/_base_/datasets/coco_detection.py +++ b/configs/_base_/datasets/coco_detection.py @@ -2,23 +2,30 @@ dataset_type = 'CocoDataset' data_root = 'data/coco/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', prob=0.5), dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline dict(type='LoadAnnotations', with_bbox=True), @@ -39,7 +46,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + backend_args=backend_args)) val_dataloader = dict( batch_size=1, num_workers=2, @@ -52,14 +60,16 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', metric='bbox', - format_only=False) 
+ format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator # inference on test dataset and diff --git a/configs/_base_/datasets/coco_instance.py b/configs/_base_/datasets/coco_instance.py index 878d8b4915e..e91cb354038 100644 --- a/configs/_base_/datasets/coco_instance.py +++ b/configs/_base_/datasets/coco_instance.py @@ -2,23 +2,30 @@ dataset_type = 'CocoDataset' data_root = 'data/coco/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', prob=0.5), dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline dict(type='LoadAnnotations', with_bbox=True, with_mask=True), @@ -39,7 +46,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + backend_args=backend_args)) val_dataloader = dict( batch_size=1, num_workers=2, @@ -52,14 +60,16 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', metric=['bbox', 'segm'], - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator # inference on test dataset and diff --git a/configs/_base_/datasets/coco_instance_semantic.py b/configs/_base_/datasets/coco_instance_semantic.py index 12652d02c6b..cc961863306 100644 --- a/configs/_base_/datasets/coco_instance_semantic.py +++ b/configs/_base_/datasets/coco_instance_semantic.py @@ -2,16 +2,23 @@ dataset_type = 'CocoDataset' data_root = 'data/coco/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict( type='LoadAnnotations', with_bbox=True, 
        with_mask=True,
        with_seg=True),
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
@@ -19,7 +26,7 @@
     dict(type='PackDetInputs')
 ]
 test_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=file_client_args),
+    dict(type='LoadImageFromFile', backend_args=backend_args),
     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
     # If you don't have a gt annotation, delete the pipeline
     dict(
@@ -42,7 +49,8 @@
         ann_file='annotations/instances_train2017.json',
         data_prefix=dict(img='train2017/', seg='stuffthingmaps/train2017/'),
         filter_cfg=dict(filter_empty_gt=True, min_size=32),
-        pipeline=train_pipeline))
+        pipeline=train_pipeline,
+        backend_args=backend_args))

 val_dataloader = dict(
     batch_size=1,
@@ -56,7 +64,8 @@
         ann_file='annotations/instances_val2017.json',
         data_prefix=dict(img='val2017/'),
         test_mode=True,
-        pipeline=test_pipeline))
+        pipeline=test_pipeline,
+        backend_args=backend_args))

 test_dataloader = val_dataloader

@@ -64,5 +73,6 @@
     type='CocoMetric',
     ann_file=data_root + 'annotations/instances_val2017.json',
     metric=['bbox', 'segm'],
-    format_only=False)
+    format_only=False,
+    backend_args=backend_args)
 test_evaluator = val_evaluator
diff --git a/configs/_base_/datasets/coco_panoptic.py b/configs/_base_/datasets/coco_panoptic.py
index 021d80b2807..2d75660f4b4 100644
--- a/configs/_base_/datasets/coco_panoptic.py
+++ b/configs/_base_/datasets/coco_panoptic.py
@@ -1,26 +1,33 @@
 # dataset settings
 dataset_type = 'CocoPanopticDataset'
 data_root = 'data/coco/'

-# file_client_args = dict(
+# Example to use different file client
+# Method 1: simply set the data root and let the file I/O module
+# automatically infer from prefix (not support LMDB and Memcache yet)
+
+# data_root = 's3://openmmlab/datasets/detection/coco/'
+
+# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6
+# backend_args = dict(
 #     backend='petrel',
 #     path_mapping=dict({
 #         './data/': 's3://openmmlab/datasets/detection/',
 #         'data/': 's3://openmmlab/datasets/detection/'
 #     }))
-file_client_args = dict(backend='disk')
+backend_args = None

 train_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=file_client_args),
-    dict(type='LoadPanopticAnnotations', file_client_args=file_client_args),
+    dict(type='LoadImageFromFile', backend_args=backend_args),
+    dict(type='LoadPanopticAnnotations', backend_args=backend_args),
     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
     dict(type='RandomFlip', prob=0.5),
     dict(type='PackDetInputs')
 ]
 test_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=file_client_args),
+    dict(type='LoadImageFromFile', backend_args=backend_args),
     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
-    dict(type='LoadPanopticAnnotations', file_client_args=file_client_args),
+    dict(type='LoadPanopticAnnotations', backend_args=backend_args),
     dict(
         type='PackDetInputs',
         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
@@ -40,7 +47,8 @@
         data_prefix=dict(
             img='train2017/', seg='annotations/panoptic_train2017/'),
         filter_cfg=dict(filter_empty_gt=True, min_size=32),
-        pipeline=train_pipeline))
+        pipeline=train_pipeline,
+        backend_args=backend_args))
 val_dataloader = dict(
     batch_size=1,
     num_workers=2,
@@ -53,15 +61,15 @@
         ann_file='annotations/panoptic_val2017.json',
         data_prefix=dict(img='val2017/', seg='annotations/panoptic_val2017/'),
         test_mode=True,
-        pipeline=test_pipeline))
+        pipeline=test_pipeline,
+        backend_args=backend_args))
 test_dataloader = val_dataloader

 val_evaluator = dict(
     type='CocoPanopticMetric',
ann_file=data_root + 'annotations/panoptic_val2017.json', seg_prefix=data_root + 'annotations/panoptic_val2017/', - file_client_args=file_client_args, -) + backend_args=backend_args) test_evaluator = val_evaluator # inference on test dataset and diff --git a/configs/_base_/datasets/deepfashion.py b/configs/_base_/datasets/deepfashion.py index bb70eeed7d0..a93dc7152f7 100644 --- a/configs/_base_/datasets/deepfashion.py +++ b/configs/_base_/datasets/deepfashion.py @@ -2,23 +2,30 @@ dataset_type = 'DeepFashionDataset' data_root = 'data/DeepFashion/In-shop/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='Resize', scale=(750, 1101), keep_ratio=True), dict(type='RandomFlip', prob=0.5), dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(750, 1101), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( @@ -41,7 +48,8 @@ ann_file='Anno/segmentation/DeepFashion_segmentation_train.json', data_prefix=dict(img='Img/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args=backend_args))) val_dataloader = dict( batch_size=1, num_workers=2, @@ -54,7 +62,8 @@ ann_file='Anno/segmentation/DeepFashion_segmentation_query.json', data_prefix=dict(img='Img/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = dict( batch_size=1, num_workers=2, @@ -67,17 +76,20 @@ ann_file='Anno/segmentation/DeepFashion_segmentation_gallery.json', data_prefix=dict(img='Img/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'Anno/segmentation/DeepFashion_segmentation_query.json', metric=['bbox', 'segm'], - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = dict( type='CocoMetric', ann_file=data_root + 'Anno/segmentation/DeepFashion_segmentation_gallery.json', metric=['bbox', 'segm'], - format_only=False) + format_only=False, + backend_args=backend_args) diff --git a/configs/_base_/datasets/lvis_v0.5_instance.py b/configs/_base_/datasets/lvis_v0.5_instance.py index f8f65f2b5e8..d0ca44efb6d 100644 --- a/configs/_base_/datasets/lvis_v0.5_instance.py +++ b/configs/_base_/datasets/lvis_v0.5_instance.py @@ -2,16 +2,23 @@ dataset_type = 'LVISV05Dataset' data_root = 'data/lvis_v0.5/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 
's3://openmmlab/datasets/detection/lvis_v0.5/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', @@ -22,7 +29,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( @@ -46,7 +53,8 @@ ann_file='annotations/lvis_v0.5_train.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args=backend_args))) val_dataloader = dict( batch_size=1, num_workers=2, @@ -59,11 +67,13 @@ ann_file='annotations/lvis_v0.5_val.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='LVISMetric', ann_file=data_root + 'annotations/lvis_v0.5_val.json', - metric=['bbox', 'segm']) + metric=['bbox', 'segm'], + backend_args=backend_args) test_evaluator = val_evaluator diff --git a/configs/_base_/datasets/objects365v1_detection.py b/configs/_base_/datasets/objects365v1_detection.py index 7112f67c338..ee398698608 100644 --- a/configs/_base_/datasets/objects365v1_detection.py +++ b/configs/_base_/datasets/objects365v1_detection.py @@ -2,23 +2,30 @@ dataset_type = 'Objects365V1Dataset' data_root = 'data/Objects365/Obj365_v1/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', prob=0.5), dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline dict(type='LoadAnnotations', with_bbox=True), @@ -39,7 +46,8 @@ ann_file='annotations/objects365_train.json', data_prefix=dict(img='train/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + backend_args=backend_args)) val_dataloader = dict( batch_size=1, num_workers=2, @@ -52,7 +60,8 @@ 
ann_file='annotations/objects365_val.json', data_prefix=dict(img='val/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( @@ -60,5 +69,6 @@ ann_file=data_root + 'annotations/objects365_val.json', metric='bbox', sort_categories=True, - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator diff --git a/configs/_base_/datasets/objects365v2_detection.py b/configs/_base_/datasets/objects365v2_detection.py index 017d8c01a62..b25a7ba901b 100644 --- a/configs/_base_/datasets/objects365v2_detection.py +++ b/configs/_base_/datasets/objects365v2_detection.py @@ -2,23 +2,30 @@ dataset_type = 'Objects365V2Dataset' data_root = 'data/Objects365/Obj365_v2/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', prob=0.5), dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline dict(type='LoadAnnotations', with_bbox=True), @@ -39,7 +46,8 @@ ann_file='annotations/zhiyuan_objv2_train.json', data_prefix=dict(img='train/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + backend_args=backend_args)) val_dataloader = dict( batch_size=1, num_workers=2, @@ -52,12 +60,14 @@ ann_file='annotations/zhiyuan_objv2_val.json', data_prefix=dict(img='val/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/zhiyuan_objv2_val.json', metric='bbox', - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator diff --git a/configs/_base_/datasets/openimages_detection.py b/configs/_base_/datasets/openimages_detection.py index 9d99fb27800..129661b405c 100644 --- a/configs/_base_/datasets/openimages_detection.py +++ b/configs/_base_/datasets/openimages_detection.py @@ -2,24 +2,30 @@ dataset_type = 'OpenImagesDataset' data_root = 'data/OpenImages/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 
's3://openmmlab/datasets/detection/' # })) - -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', scale=(1024, 800), keep_ratio=True), dict(type='RandomFlip', prob=0.5), dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1024, 800), keep_ratio=True), # avoid bboxes being resized dict(type='LoadAnnotations', with_bbox=True), @@ -44,7 +50,8 @@ label_file='annotations/class-descriptions-boxable.csv', hierarchy_file='annotations/bbox_labels_600_hierarchy.json', meta_file='annotations/train-image-metas.pkl', - pipeline=train_pipeline)) + pipeline=train_pipeline, + backend_args=backend_args)) val_dataloader = dict( batch_size=1, num_workers=0, @@ -61,7 +68,8 @@ meta_file='annotations/validation-image-metas.pkl', image_level_ann_file='annotations/validation-' 'annotations-human-imagelabels-boxable.csv', - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( diff --git a/configs/_base_/datasets/semi_coco_detection.py b/configs/_base_/datasets/semi_coco_detection.py index 02b729804a2..694f25f841e 100644 --- a/configs/_base_/datasets/semi_coco_detection.py +++ b/configs/_base_/datasets/semi_coco_detection.py @@ -2,13 +2,20 @@ dataset_type = 'CocoDataset' data_root = 'data/coco/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None color_space = [ [dict(type='ColorTransform')], @@ -36,7 +43,7 @@ # pipeline used to augment labeled data, # which will be sent to student model for supervised training. 
sup_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict(type='RandomResize', scale=scale, keep_ratio=True), dict(type='RandomFlip', prob=0.5), @@ -82,7 +89,7 @@ # pipeline used to augment unlabeled data into different views unsup_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadEmptyAnnotations'), dict( type='MultiBranch', @@ -93,7 +100,7 @@ ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict( type='PackDetInputs', @@ -122,7 +129,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=sup_pipeline) + pipeline=sup_pipeline, + backend_args=backend_args) unlabeled_dataset = dict( type=dataset_type, @@ -130,7 +138,8 @@ ann_file='annotations/instances_unlabeled2017.json', data_prefix=dict(img='unlabeled2017/'), filter_cfg=dict(filter_empty_gt=False), - pipeline=unsup_pipeline) + pipeline=unsup_pipeline, + backend_args=backend_args) train_dataloader = dict( batch_size=batch_size, @@ -155,7 +164,8 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader @@ -163,5 +173,6 @@ type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', metric='bbox', - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator diff --git a/configs/_base_/datasets/voc0712.py b/configs/_base_/datasets/voc0712.py index 34330e40400..47f5e6563b7 100644 --- a/configs/_base_/datasets/voc0712.py +++ b/configs/_base_/datasets/voc0712.py @@ -2,23 +2,30 @@ dataset_type = 'VOCDataset' data_root = 'data/VOCdevkit/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically Infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/segmentation/VOCdevkit/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ -# './data/': 's3://openmmlab/datasets/detection/', -# 'data/': 's3://openmmlab/datasets/detection/' +# './data/': 's3://openmmlab/datasets/segmentation/', +# 'data/': 's3://openmmlab/datasets/segmentation/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', scale=(1000, 600), keep_ratio=True), dict(type='RandomFlip', prob=0.5), dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1000, 600), keep_ratio=True), # avoid bboxes being resized dict(type='LoadAnnotations', with_bbox=True), @@ -50,7 +57,8 @@ data_prefix=dict(sub_data_root='VOC2007/'), filter_cfg=dict( filter_empty_gt=True, min_size=32, 
bbox_min_size=32), - pipeline=train_pipeline), + pipeline=train_pipeline, + backend_args=backend_args), dict( type=dataset_type, data_root=data_root, @@ -58,7 +66,8 @@ data_prefix=dict(sub_data_root='VOC2012/'), filter_cfg=dict( filter_empty_gt=True, min_size=32, bbox_min_size=32), - pipeline=train_pipeline) + pipeline=train_pipeline, + backend_args=backend_args) ]))) val_dataloader = dict( @@ -73,7 +82,8 @@ ann_file='VOC2007/ImageSets/Main/test.txt', data_prefix=dict(sub_data_root='VOC2007/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader # Pascal VOC2007 uses `11points` as default evaluate mode, while PASCAL diff --git a/configs/_base_/datasets/wider_face.py b/configs/_base_/datasets/wider_face.py index d1d649be42b..7042bc46e87 100644 --- a/configs/_base_/datasets/wider_face.py +++ b/configs/_base_/datasets/wider_face.py @@ -1,63 +1,73 @@ # dataset settings dataset_type = 'WIDERFaceDataset' data_root = 'data/WIDERFace/' -img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True) +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/cityscapes/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( +# backend='petrel', +# path_mapping=dict({ +# './data/': 's3://openmmlab/datasets/detection/', +# 'data/': 's3://openmmlab/datasets/detection/' +# })) +backend_args = None + +img_scale = (640, 640) # VGA resolution + train_pipeline = [ - dict(type='LoadImageFromFile', to_float32=True), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict( - type='Expand', - mean=img_norm_cfg['mean'], - to_rgb=img_norm_cfg['to_rgb'], - ratio_range=(1, 4)), - dict( - type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', img_scale=(300, 300), keep_ratio=False), - dict(type='Normalize', **img_norm_cfg), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), + dict(type='Resize', scale=img_scale, keep_ratio=True), + dict(type='RandomFlip', prob=0.5), + dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args=backend_args), + dict(type='Resize', scale=img_scale, keep_ratio=True), + dict(type='LoadAnnotations', with_bbox=True), dict( - type='MultiScaleFlipAug', - img_scale=(300, 300), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=False), - dict(type='Normalize', **img_norm_cfg), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) + type='PackDetInputs', + meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', + 'scale_factor')) ] -data = dict( - samples_per_gpu=60, - workers_per_gpu=2, - train=dict( - type='RepeatDataset', - times=2, - dataset=dict( - type=dataset_type, - ann_file=data_root + 'train.txt', - img_prefix=data_root + 'WIDER_train/', - min_size=17, - pipeline=train_pipeline)), - val=dict( + +train_dataloader = dict( + batch_size=2, + num_workers=2, + persistent_workers=True, + drop_last=False, + 
sampler=dict(type='DefaultSampler', shuffle=True), + batch_sampler=dict(type='AspectRatioBatchSampler'), + dataset=dict( type=dataset_type, - ann_file=data_root + 'val.txt', - img_prefix=data_root + 'WIDER_val/', - pipeline=test_pipeline), - test=dict( + data_root=data_root, + ann_file='train.txt', + data_prefix=dict(img='WIDER_train'), + filter_cfg=dict(filter_empty_gt=True, bbox_min_size=17, min_size=32), + pipeline=train_pipeline)) + +val_dataloader = dict( + batch_size=1, + num_workers=2, + persistent_workers=True, + drop_last=False, + sampler=dict(type='DefaultSampler', shuffle=False), + dataset=dict( type=dataset_type, - ann_file=data_root + 'val.txt', - img_prefix=data_root + 'WIDER_val/', + data_root=data_root, + ann_file='val.txt', + data_prefix=dict(img='WIDER_val'), + test_mode=True, pipeline=test_pipeline)) +test_dataloader = val_dataloader + +val_evaluator = dict( + # TODO: support WiderFace-Evaluation for easy, medium, hard cases + type='VOCMetric', + metric='mAP', + eval_mode='11points') +test_evaluator = val_evaluator diff --git a/configs/albu_example/mask-rcnn_r50_fpn_albu-1x_coco.py b/configs/albu_example/mask-rcnn_r50_fpn_albu-1x_coco.py index 8a797d41fe5..b8a2780e99b 100644 --- a/configs/albu_example/mask-rcnn_r50_fpn_albu-1x_coco.py +++ b/configs/albu_example/mask-rcnn_r50_fpn_albu-1x_coco.py @@ -41,9 +41,7 @@ p=0.1), ] train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict( diff --git a/configs/albu_example/metafile.yml b/configs/albu_example/metafile.yml new file mode 100644 index 00000000000..3b54bdf1568 --- /dev/null +++ b/configs/albu_example/metafile.yml @@ -0,0 +1,17 @@ +Models: + - Name: mask-rcnn_r50_fpn_albu-1x_coco + In Collection: Mask R-CNN + Config: mask-rcnn_r50_fpn_albu-1x_coco.py + Metadata: + Training Memory (GB): 4.4 + Epochs: 12 + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 38.0 + - Task: Instance Segmentation + Dataset: COCO + Metrics: + mask AP: 34.5 + Weights: https://download.openmmlab.com/mmdetection/v2.0/albu_example/mask_rcnn_r50_fpn_albu_1x_coco/mask_rcnn_r50_fpn_albu_1x_coco_20200208-ab203bcd.pth diff --git a/configs/boxinst/README.md b/configs/boxinst/README.md index 6f015a1d16b..f6f01c5d27b 100644 --- a/configs/boxinst/README.md +++ b/configs/boxinst/README.md @@ -15,9 +15,10 @@ of learning masks in instance segmentation, with no modification to the segmenta ## Results and Models -| Backbone | Style | MS train | Lr schd | bbox AP | mask AP | Config | Download | -| :------: | :-----: | :------: | :-----: | :-----: | :-----: | :----------------------------------------: | :----------------------: | -| R-50 | pytorch | Y | 1x | 39.4 | 30.8 | [config](./boxinst_r50_fpn_ms-90k_coco.py) | [model](<>) \| [log](<>) | +| Backbone | Style | MS train | Lr schd | bbox AP | mask AP | Config | Download | +| :------: | :-----: | :------: | :-----: | :-----: | :-----: | :-----------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| R-50 | pytorch | Y | 1x | 39.6 | 31.1 | 
[config](./boxinst_r50_fpn_ms-90k_coco.py) | [model](https://download.openmmlab.com/mmdetection/v3.0/boxinst/boxinst_r50_fpn_ms-90k_coco/boxinst_r50_fpn_ms-90k_coco_20221228_163052-6add751a.pth) \| [log](https://download.openmmlab.com/mmdetection/v3.0/boxinst/boxinst_r50_fpn_ms-90k_coco/boxinst_r50_fpn_ms-90k_coco_20221228_163052.log.json) | +| R-101 | pytorch | Y | 1x | 41.8 | 32.7 | [config](./boxinst_r101_fpn_ms-90k_coco.py) | [model](https://download.openmmlab.com/mmdetection/v3.0/boxinst/boxinst_r101_fpn_ms-90k_coco/boxinst_r101_fpn_ms-90k_coco_20221229_145106-facf375b.pth) \|[log](https://download.openmmlab.com/mmdetection/v3.0/boxinst/boxinst_r101_fpn_ms-90k_coco/boxinst_r101_fpn_ms-90k_coco_20221229_145106.log.json) | ## Citation diff --git a/configs/boxinst/boxinst_r101_fpn_ms-90k_coco.py b/configs/boxinst/boxinst_r101_fpn_ms-90k_coco.py new file mode 100644 index 00000000000..ab2b11628a7 --- /dev/null +++ b/configs/boxinst/boxinst_r101_fpn_ms-90k_coco.py @@ -0,0 +1,8 @@ +_base_ = './boxinst_r50_fpn_ms-90k_coco.py' + +# model settings +model = dict( + backbone=dict( + depth=101, + init_cfg=dict(type='Pretrained', + checkpoint='torchvision://resnet101'))) diff --git a/configs/boxinst/metafile.yml b/configs/boxinst/metafile.yml new file mode 100644 index 00000000000..c97fcdcd636 --- /dev/null +++ b/configs/boxinst/metafile.yml @@ -0,0 +1,52 @@ +Collections: + - Name: BoxInst + Metadata: + Training Data: COCO + Training Techniques: + - SGD with Momentum + - Weight Decay + Training Resources: 8x A100 GPUs + Architecture: + - ResNet + - FPN + - CondInst + Paper: + URL: https://arxiv.org/abs/2012.02310 + Title: 'BoxInst: High-Performance Instance Segmentation with Box Annotations' + README: configs/boxinst/README.md + Code: + URL: https://github.com/open-mmlab/mmdetection/blob/v3.0.0rc6/mmdet/models/detectors/boxinst.py#L8 + Version: v3.0.0rc6 + +Models: + - Name: boxinst_r50_fpn_ms-90k_coco + In Collection: BoxInst + Config: configs/boxinst/boxinst_r50_fpn_ms-90k_coco.py + Metadata: + Iterations: 90000 + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 39.4 + - Task: Instance Segmentation + Dataset: COCO + Metrics: + mask AP: 30.8 + Weights: https://download.openmmlab.com/mmdetection/v3.0/boxinst/boxinst_r50_fpn_ms-90k_coco/boxinst_r50_fpn_ms-90k_coco_20221228_163052-6add751a.pth + + - Name: boxinst_r101_fpn_ms-90k_coco + In Collection: BoxInst + Config: configs/boxinst/boxinst_r101_fpn_ms-90k_coco.py + Metadata: + Iterations: 90000 + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 41.8 + - Task: Instance Segmentation + Dataset: COCO + Metrics: + mask AP: 32.7 + Weights: https://download.openmmlab.com/mmdetection/v3.0/boxinst/boxinst_r101_fpn_ms-90k_coco/boxinst_r101_fpn_ms-90k_coco_20221229_145106-facf375b.pth diff --git a/configs/cascade_rpn/cascade-rpn_fast-rcnn_r50-caffe_fpn_1x_coco.py b/configs/cascade_rpn/cascade-rpn_fast-rcnn_r50-caffe_fpn_1x_coco.py index d977e78d975..ba23ce90652 100644 --- a/configs/cascade_rpn/cascade-rpn_fast-rcnn_r50-caffe_fpn_1x_coco.py +++ b/configs/cascade_rpn/cascade-rpn_fast-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,17 +1,5 @@ -_base_ = '../fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py' +_base_ = '../fast_rcnn/fast-rcnn_r50-caffe_fpn_1x_coco.py' model = dict( - backbone=dict( - type='ResNet', - depth=50, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), - norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - 
checkpoint='open-mmlab://detectron2/resnet50_caffe')),
     roi_head=dict(
         bbox_head=dict(
             bbox_coder=dict(target_stds=[0.04, 0.04, 0.08, 0.08]),
@@ -25,53 +13,15 @@ pos_iou_thr=0.65, neg_iou_thr=0.65, min_pos_iou=0.65),
             sampler=dict(num=256))),
     test_cfg=dict(rcnn=dict(score_thr=1e-3)))
-dataset_type = 'CocoDataset'
-data_root = 'data/coco/'
-img_norm_cfg = dict(
-    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
-train_pipeline = [
-    dict(type='LoadImageFromFile'),
-    dict(type='LoadProposals', num_max_proposals=300),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
-    dict(type='RandomFlip', flip_ratio=0.5),
-    dict(type='Normalize', **img_norm_cfg),
-    dict(type='Pad', size_divisor=32),
-    dict(type='DefaultFormatBundle'),
-    dict(type='Collect', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
-]
-test_pipeline = [
-    dict(type='LoadImageFromFile'),
-    dict(type='LoadProposals', num_max_proposals=300),
-    dict(
-        type='MultiScaleFlipAug',
-        img_scale=(1333, 800),
-        flip=False,
-        transforms=[
-            dict(type='Resize', keep_ratio=True),
-            dict(type='RandomFlip'),
-            dict(type='Normalize', **img_norm_cfg),
-            dict(type='Pad', size_divisor=32),
-            dict(type='ImageToTensor', keys=['img']),
-            dict(type='ToTensor', keys=['proposals']),
-            dict(
-                type='ToDataContainer',
-                fields=[dict(key='proposals', stack=False)]),
-            dict(type='Collect', keys=['img', 'proposals']),
-        ])
-]
-# TODO support proposals input
-data = dict(
-    train=dict(
-        proposal_file=data_root +
-        'proposals/crpn_r50_caffe_fpn_1x_train2017.pkl',
-        pipeline=train_pipeline),
-    val=dict(
-        proposal_file=data_root +
-        'proposals/crpn_r50_caffe_fpn_1x_val2017.pkl',
-        pipeline=test_pipeline),
-    test=dict(
-        proposal_file=data_root +
-        'proposals/crpn_r50_caffe_fpn_1x_val2017.pkl',
-        pipeline=test_pipeline))
+
+# MMEngine supports the following two ways; users can choose
+# whichever is more convenient
+# train_dataloader = dict(dataset=dict(proposal_file='proposals/crpn_r50_caffe_fpn_1x_train2017.pkl'))  # noqa
+_base_.train_dataloader.dataset.proposal_file = 'proposals/crpn_r50_caffe_fpn_1x_train2017.pkl'  # noqa
+
+# val_dataloader = dict(dataset=dict(proposal_file='proposals/crpn_r50_caffe_fpn_1x_val2017.pkl'))  # noqa
+# test_dataloader = val_dataloader
+_base_.val_dataloader.dataset.proposal_file = 'proposals/crpn_r50_caffe_fpn_1x_val2017.pkl'  # noqa
+test_dataloader = _base_.val_dataloader
+
 optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2))
diff --git a/configs/centernet/centernet-update_r50-caffe_fpn_ms-1x_coco.py b/configs/centernet/centernet-update_r50-caffe_fpn_ms-1x_coco.py
index 5e5a24ee5e4..1f6e2b3919d 100644
--- a/configs/centernet/centernet-update_r50-caffe_fpn_ms-1x_coco.py
+++ b/configs/centernet/centernet-update_r50-caffe_fpn_ms-1x_coco.py
@@ -64,9 +64,7 @@
 # single-scale training is about 39.3
 train_pipeline = [
-    dict(
-        type='LoadImageFromFile',
-        file_client_args={{_base_.file_client_args}}),
+    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
     dict(type='LoadAnnotations', with_bbox=True),
     dict(
         type='RandomChoiceResize',
diff --git a/configs/centernet/centernet_r18-dcnv2_8xb16-crop512-140e_coco.py b/configs/centernet/centernet_r18-dcnv2_8xb16-crop512-140e_coco.py
index 83b07195971..732a55d59ad 100644
--- a/configs/centernet/centernet_r18-dcnv2_8xb16-crop512-140e_coco.py
+++ b/configs/centernet/centernet_r18-dcnv2_8xb16-crop512-140e_coco.py
@@ -39,9 +39,7 @@ test_cfg=dict(topk=100, local_maximum_kernel=3, max_per_img=100))
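# (For reference, a hedged note on the test-time settings above: CenterNet
# decodes detections by max-pooling the center heatmap with a
# `local_maximum_kernel` x `local_maximum_kernel` window as a cheap NMS,
# keeping the `topk` highest-scoring peaks and at most `max_per_img` boxes.)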
train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='PhotoMetricDistortion', @@ -67,8 +65,8 @@ test_pipeline = [ dict( type='LoadImageFromFile', - to_float32=True, - file_client_args={{_base_.file_client_args}}), + backend_args={{_base_.backend_args}}, + to_float32=True), # don't need Resize dict( type='RandomCenterCropPad', @@ -102,7 +100,9 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args={{_base_.backend_args}}, + ))) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader diff --git a/configs/centernet/centernet_tta.py b/configs/centernet/centernet_tta.py index 0c68914267e..edd7b03ecde 100644 --- a/configs/centernet/centernet_tta.py +++ b/configs/centernet/centernet_tta.py @@ -5,10 +5,7 @@ tta_cfg=dict(nms=dict(type='nms', iou_threshold=0.5), max_per_img=100)) tta_pipeline = [ - dict( - type='LoadImageFromFile', - to_float32=True, - file_client_args=dict(backend='disk')), + dict(type='LoadImageFromFile', to_float32=True, backend_args=None), dict( type='TestTimeAug', transforms=[ diff --git a/configs/centernet/metafile.yml b/configs/centernet/metafile.yml index 578a5996789..13ea6659d3f 100644 --- a/configs/centernet/metafile.yml +++ b/configs/centernet/metafile.yml @@ -44,3 +44,16 @@ Models: Metrics: box AP: 25.9 Weights: https://download.openmmlab.com/mmdetection/v2.0/centernet/centernet_resnet18_140e_coco/centernet_resnet18_140e_coco_20210705_093630-bb5b3bf7.pth + + - Name: centernet-update_r50-caffe_fpn_ms-1x_coco + In Collection: CenterNet + Config: configs/centernet/centernet-update_r50-caffe_fpn_ms-1x_coco.py + Metadata: + Batch Size: 16 + Training Memory (GB): 3.3 + Epochs: 12 + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 40.2 diff --git a/configs/centripetalnet/README.md b/configs/centripetalnet/README.md index 4f06f45d38b..21edbd261af 100644 --- a/configs/centripetalnet/README.md +++ b/configs/centripetalnet/README.md @@ -20,7 +20,7 @@ Keypoint-based detectors have achieved pretty-well performance. However, incorre Note: -- TTA setting is single-scale and `flip=True`. +- TTA setting is single-scale and `flip=True`. If you want to reproduce the TTA performance, please add `--tta` in the test command. - The model we released is the best checkpoint rather than the latest checkpoint (box AP 44.8 vs 44.6 in our experiment). 
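To reproduce the TTA numbers above, pass `--tta` to the test script. A minimal sketch of what that flag amounts to (the authoritative wiring lives in `tools/test.py`; `Config` and `ConfigDict` come from MMEngine):

```python
from mmengine.config import Config, ConfigDict

cfg = Config.fromfile(
    'configs/centripetalnet/'
    'centripetalnet_hourglass104_16xb6-crop511-210e-mstest_coco.py')

# Wrap the detector with the config's TTA model and switch the test
# dataset over to the TTA data pipeline.
cfg.model = ConfigDict(**cfg.tta_model, module=cfg.model)
cfg.test_dataloader.dataset.pipeline = cfg.tta_pipeline
```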
## Citation diff --git a/configs/centripetalnet/centripetalnet_hourglass104_16xb6-crop511-210e-mstest_coco.py b/configs/centripetalnet/centripetalnet_hourglass104_16xb6-crop511-210e-mstest_coco.py index 043496f3da2..b757ffd16dc 100644 --- a/configs/centripetalnet/centripetalnet_hourglass104_16xb6-crop511-210e-mstest_coco.py +++ b/configs/centripetalnet/centripetalnet_hourglass104_16xb6-crop511-210e-mstest_coco.py @@ -45,9 +45,7 @@ # data settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args=_base_.backend_args), dict(type='LoadAnnotations', with_bbox=True), dict( type='PhotoMetricDistortion', @@ -72,12 +70,11 @@ dict(type='PackDetInputs'), ] -# TODO: mstest is not currently implemented test_pipeline = [ dict( type='LoadImageFromFile', to_float32=True, - file_client_args={{_base_.file_client_args}}), + backend_args=_base_.backend_args), # don't need Resize dict( type='RandomCenterCropPad', @@ -138,3 +135,47 @@ # USER SHOULD NOT CHANGE ITS VALUES. # base_batch_size = (16 GPUs) x (6 samples per GPU) auto_scale_lr = dict(base_batch_size=96) + +tta_model = dict( + type='DetTTAModel', + tta_cfg=dict( + nms=dict(type='soft_nms', iou_threshold=0.5, method='gaussian'), + max_per_img=100)) + +tta_pipeline = [ + dict( + type='LoadImageFromFile', + to_float32=True, + backend_args=_base_.backend_args), + dict( + type='TestTimeAug', + transforms=[ + [ + # ``RandomFlip`` must be placed before ``RandomCenterCropPad``, + # otherwise bounding box coordinates after flipping cannot be + # recovered correctly. + dict(type='RandomFlip', prob=1.), + dict(type='RandomFlip', prob=0.) + ], + [ + dict( + type='RandomCenterCropPad', + crop_size=None, + ratios=None, + border=None, + test_mode=True, + test_pad_mode=['logical_or', 127], + mean=data_preprocessor['mean'], + std=data_preprocessor['std'], + # Image data is not converted to rgb. 
+ to_rgb=data_preprocessor['bgr_to_rgb']) + ], + [dict(type='LoadAnnotations', with_bbox=True)], + [ + dict( + type='PackDetInputs', + meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', + 'flip', 'flip_direction', 'border')) + ] + ]) +] diff --git a/configs/common/lsj-100e_coco-detection.py b/configs/common/lsj-100e_coco-detection.py index b03e33809da..bb631e5d5c1 100644 --- a/configs/common/lsj-100e_coco-detection.py +++ b/configs/common/lsj-100e_coco-detection.py @@ -4,17 +4,23 @@ data_root = 'data/coco/' image_size = (1024, 1024) -file_client_args = dict(backend='disk') -# comment out the code below to use different file client -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', @@ -32,7 +38,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( @@ -56,7 +62,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args=backend_args))) val_dataloader = dict( batch_size=1, num_workers=2, @@ -69,14 +76,16 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', metric='bbox', - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator max_epochs = 25 diff --git a/configs/common/lsj-100e_coco-instance.py b/configs/common/lsj-100e_coco-instance.py index b00ab686126..6e62729d639 100644 --- a/configs/common/lsj-100e_coco-instance.py +++ b/configs/common/lsj-100e_coco-instance.py @@ -4,17 +4,23 @@ data_root = 'data/coco/' image_size = (1024, 1024) -file_client_args = dict(backend='disk') -# comment out the code below to use different file client -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', 
backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomResize', @@ -32,7 +38,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( @@ -56,7 +62,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args=backend_args))) val_dataloader = dict( batch_size=1, num_workers=2, @@ -69,14 +76,16 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', metric=['bbox', 'segm'], - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator max_epochs = 25 diff --git a/configs/common/ms-90k_coco.py b/configs/common/ms-90k_coco.py index 7d7b5f35975..e2d6c3dafb6 100644 --- a/configs/common/ms-90k_coco.py +++ b/configs/common/ms-90k_coco.py @@ -3,20 +3,27 @@ # dataset settings dataset_type = 'CocoDataset' data_root = 'data/coco/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None # Align with Detectron2 backend = 'pillow' train_pipeline = [ dict( type='LoadImageFromFile', - file_client_args=file_client_args, + backend_args=backend_args, imdecode_backend=backend), dict(type='LoadAnnotations', with_bbox=True), dict( @@ -31,7 +38,7 @@ test_pipeline = [ dict( type='LoadImageFromFile', - file_client_args=file_client_args, + backend_args=backend_args, imdecode_backend=backend), dict(type='Resize', scale=(1333, 800), keep_ratio=True, backend=backend), dict(type='LoadAnnotations', with_bbox=True), @@ -53,7 +60,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + backend_args=backend_args)) val_dataloader = dict( batch_size=1, num_workers=2, @@ -67,14 +75,16 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', metric='bbox', - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator # training schedule for 90k diff --git a/configs/common/ms-poly-90k_coco-instance.py b/configs/common/ms-poly-90k_coco-instance.py index 2a2deb5bf00..d5566b3c3b8 100644 --- a/configs/common/ms-poly-90k_coco-instance.py +++ 
b/configs/common/ms-poly-90k_coco-instance.py @@ -3,20 +3,27 @@ # dataset settings dataset_type = 'CocoDataset' data_root = 'data/coco/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None # Align with Detectron2 backend = 'pillow' train_pipeline = [ dict( type='LoadImageFromFile', - file_client_args=file_client_args, + backend_args=backend_args, imdecode_backend=backend), dict( type='LoadAnnotations', @@ -35,7 +42,7 @@ test_pipeline = [ dict( type='LoadImageFromFile', - file_client_args=file_client_args, + backend_args=backend_args, imdecode_backend=backend), dict(type='Resize', scale=(1333, 800), keep_ratio=True, backend=backend), dict( @@ -61,7 +68,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + backend_args=backend_args)) val_dataloader = dict( batch_size=1, num_workers=2, @@ -75,14 +83,16 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', metric=['bbox', 'segm'], - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator # training schedule for 90k diff --git a/configs/common/ms-poly_3x_coco-instance.py b/configs/common/ms-poly_3x_coco-instance.py index 6a3d5e5569d..04072f9b84c 100644 --- a/configs/common/ms-poly_3x_coco-instance.py +++ b/configs/common/ms-poly_3x_coco-instance.py @@ -3,18 +3,25 @@ dataset_type = 'CocoDataset' data_root = 'data/coco/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None # In mstrain 3x config, img_scale=[(1333, 640), (1333, 800)], # multiscale_mode='range' train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict( type='LoadAnnotations', with_bbox=True, @@ -27,7 +34,7 @@ dict(type='PackDetInputs'), ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict( type='LoadAnnotations', @@ -55,7 +62,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, 
min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args=backend_args))) val_dataloader = dict( batch_size=2, num_workers=2, @@ -68,13 +76,15 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', - metric=['bbox', 'segm']) + metric=['bbox', 'segm'], + backend_args=backend_args) test_evaluator = val_evaluator # training schedule for 3x with `RepeatDataset` diff --git a/configs/common/ms_3x_coco-instance.py b/configs/common/ms_3x_coco-instance.py index cae37d176ac..f80cf88e9b1 100644 --- a/configs/common/ms_3x_coco-instance.py +++ b/configs/common/ms_3x_coco-instance.py @@ -4,16 +4,23 @@ dataset_type = 'CocoDataset' data_root = 'data/coco/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomResize', scale=[(1333, 640), (1333, 800)], @@ -22,7 +29,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( @@ -37,34 +44,37 @@ sampler=dict(type='DefaultSampler', shuffle=True), batch_sampler=dict(type='AspectRatioBatchSampler'), dataset=dict( - type=dataset_type, - data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), - filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + type='RepeatDataset', + times=3, + dataset=dict( + type=dataset_type, + data_root=data_root, + ann_file='annotations/instances_train2017.json', + data_prefix=dict(img='train2017/'), + filter_cfg=dict(filter_empty_gt=True, min_size=32), + pipeline=train_pipeline, + backend_args=backend_args))) val_dataloader = dict( - batch_size=2, + batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, sampler=dict(type='DefaultSampler', shuffle=False), dataset=dict( - type='RepeatDataset', - times=3, - dataset=dict( - type=dataset_type, - data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), - test_mode=True, - pipeline=test_pipeline))) + type=dataset_type, + data_root=data_root, + ann_file='annotations/instances_val2017.json', + data_prefix=dict(img='val2017/'), + test_mode=True, + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox') + metric='bbox', + 
backend_args=backend_args) test_evaluator = val_evaluator # training schedule for 3x with `RepeatDataset` diff --git a/configs/common/ms_3x_coco.py b/configs/common/ms_3x_coco.py index 0ca42634478..facbb34cf05 100644 --- a/configs/common/ms_3x_coco.py +++ b/configs/common/ms_3x_coco.py @@ -4,16 +4,23 @@ dataset_type = 'CocoDataset' data_root = 'data/coco/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', scale=[(1333, 640), (1333, 800)], @@ -22,7 +29,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( @@ -45,7 +52,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args=backend_args))) val_dataloader = dict( batch_size=1, num_workers=2, @@ -58,13 +66,15 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox') + metric='bbox', + backend_args=backend_args) test_evaluator = val_evaluator # training schedule for 3x with `RepeatDataset` diff --git a/configs/common/ssj_270k_coco-instance.py b/configs/common/ssj_270k_coco-instance.py index 677f375e1a1..7407644fd59 100644 --- a/configs/common/ssj_270k_coco-instance.py +++ b/configs/common/ssj_270k_coco-instance.py @@ -5,19 +5,25 @@ image_size = (1024, 1024) -file_client_args = dict(backend='disk') -# comment out the code below to use different file client -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) +backend_args = None # Standard Scale Jittering (SSJ) resizes and crops an image # with a resize range of 0.8 to 1.25 of the original image size. 
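# (In this config that jitter is presumably realized by the `RandomResize`
# step below with `scale=image_size` and `ratio_range=(0.8, 1.25)`, followed
# by a fixed-size `RandomCrop` back to `image_size`.)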
train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomResize', @@ -35,7 +41,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( @@ -55,7 +61,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + backend_args=backend_args)) val_dataloader = dict( batch_size=1, num_workers=2, @@ -68,14 +75,16 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', metric=['bbox', 'segm'], - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator # The model is trained by 270k iterations with batch_size 64, diff --git a/configs/common/ssj_scp_270k_coco-instance.py b/configs/common/ssj_scp_270k_coco-instance.py index 2289f2f6234..06159dd4031 100644 --- a/configs/common/ssj_scp_270k_coco-instance.py +++ b/configs/common/ssj_scp_270k_coco-instance.py @@ -5,19 +5,25 @@ image_size = (1024, 1024) -file_client_args = dict(backend='disk') -# comment out the code below to use different file client -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) +backend_args = None # Standard Scale Jittering (SSJ) resizes and crops an image # with a resize range of 0.8 to 1.25 of the original image size. 
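# In this SSJ + Simple Copy-Paste (SCP) variant, `load_pipeline` below
# presumably only loads and jitters individual samples; a wrapper dataset
# (e.g. `MultiImageMixDataset`) then supplies a second sample so that a
# `CopyPaste` transform in the outer `train_pipeline` can paste instances
# from one image onto the other.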
load_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomResize', @@ -49,5 +55,6 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=load_pipeline), + pipeline=load_pipeline, + backend_args=backend_args), pipeline=train_pipeline)) diff --git a/configs/conditional_detr/README.md b/configs/conditional_detr/README.md index e36ea20565a..4043571c576 100644 --- a/configs/conditional_detr/README.md +++ b/configs/conditional_detr/README.md @@ -25,7 +25,7 @@ We provide the config files and models for Conditional DETR: [Conditional DETR f | Backbone | Model | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | | :------: | :--------------: | :-----: | :------: | :------------: | :----: | :-----------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| R-50 | Conditional DETR | 50e | | | 40.9 | [config](./conditional-detr_r50_8xb2-50e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v3.0/conditional_detr/conditional-detr_r50_8xb2-50e_coco/conditional-detr_r50_8xb2-50e_coco_20221121_180202-c83a1dc0.pth) \| [log](https://download.openmmlab.com/mmdetection/v3.0/conditional_detr/conditional-detr_r50_8xb2-50e_coco/conditional-detr_r50_8xb2-50e_coco_20221121_180202.log.json) | +| R-50 | Conditional DETR | 50e | | | 41.1 | [config](./conditional-detr_r50_8xb2-50e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v3.0/conditional_detr/conditional-detr_r50_8xb2-50e_coco/conditional-detr_r50_8xb2-50e_coco_20221121_180202-c83a1dc0.pth) \| [log](https://download.openmmlab.com/mmdetection/v3.0/conditional_detr/conditional-detr_r50_8xb2-50e_coco/conditional-detr_r50_8xb2-50e_coco_20221121_180202.log.json) | ## Citation diff --git a/configs/convnext/cascade-mask-rcnn_convnext-t-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py b/configs/convnext/cascade-mask-rcnn_convnext-t-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py index 53edb391921..1e031e90d52 100644 --- a/configs/convnext/cascade-mask-rcnn_convnext-t-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py +++ b/configs/convnext/cascade-mask-rcnn_convnext-t-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py @@ -85,9 +85,7 @@ # augmentation strategy originates from DETR / Sparse RCNN train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', prob=0.5), dict( diff --git a/configs/convnext/mask-rcnn_convnext-t-p4-w7_fpn_amp-ms-crop-3x_coco.py b/configs/convnext/mask-rcnn_convnext-t-p4-w7_fpn_amp-ms-crop-3x_coco.py index e9932c44b03..23d46e289eb 100644 --- a/configs/convnext/mask-rcnn_convnext-t-p4-w7_fpn_amp-ms-crop-3x_coco.py +++ b/configs/convnext/mask-rcnn_convnext-t-p4-w7_fpn_amp-ms-crop-3x_coco.py @@ -26,9 +26,7 @@ # augmentation strategy originates from DETR / Sparse RCNN train_pipeline = [ - dict( - type='LoadImageFromFile', - 
file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', prob=0.5), dict( diff --git a/configs/cornernet/README.md b/configs/cornernet/README.md index 21f74278fcc..e44964d8eac 100644 --- a/configs/cornernet/README.md +++ b/configs/cornernet/README.md @@ -22,7 +22,7 @@ We propose CornerNet, a new approach to object detection where we detect an obje Note: -- TTA setting is single-scale and `flip=True`. +- TTA setting is single-scale and `flip=True`. If you want to reproduce the TTA performance, please add `--tta` in the test command. - Experiments with `images_per_gpu=6` are conducted on Tesla V100-SXM2-32GB, `images_per_gpu=3` are conducted on GeForce GTX 1080 Ti. - Here are the descriptions of each experiment setting: - 10 x 5: 10 GPUs with 5 images per gpu. This is the same setting as that reported in the original paper. diff --git a/configs/cornernet/cornernet_hourglass104_8xb6-210e-mstest_coco.py b/configs/cornernet/cornernet_hourglass104_8xb6-210e-mstest_coco.py index 38c43a1c9f2..bdb46fff164 100644 --- a/configs/cornernet/cornernet_hourglass104_8xb6-210e-mstest_coco.py +++ b/configs/cornernet/cornernet_hourglass104_8xb6-210e-mstest_coco.py @@ -45,9 +45,7 @@ # data settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args=_base_.backend_args), dict(type='LoadAnnotations', with_bbox=True), dict( type='PhotoMetricDistortion', @@ -73,12 +71,12 @@ dict(type='PackDetInputs'), ] -# TODO: mstest is not currently implemented test_pipeline = [ dict( type='LoadImageFromFile', to_float32=True, - file_client_args={{_base_.file_client_args}}), + backend_args=_base_.backend_args, + ), # don't need Resize dict( type='RandomCenterCropPad', @@ -139,3 +137,47 @@ # USER SHOULD NOT CHANGE ITS VALUES. # base_batch_size = (8 GPUs) x (6 samples per GPU) auto_scale_lr = dict(base_batch_size=48) + +tta_model = dict( + type='DetTTAModel', + tta_cfg=dict( + nms=dict(type='soft_nms', iou_threshold=0.5, method='gaussian'), + max_per_img=100)) + +tta_pipeline = [ + dict( + type='LoadImageFromFile', + to_float32=True, + backend_args=_base_.backend_args), + dict( + type='TestTimeAug', + transforms=[ + [ + # ``RandomFlip`` must be placed before ``RandomCenterCropPad``, + # otherwise bounding box coordinates after flipping cannot be + # recovered correctly. + dict(type='RandomFlip', prob=1.), + dict(type='RandomFlip', prob=0.) + ], + [ + dict( + type='RandomCenterCropPad', + crop_size=None, + ratios=None, + border=None, + test_mode=True, + test_pad_mode=['logical_or', 127], + mean=data_preprocessor['mean'], + std=data_preprocessor['std'], + # Image data is not converted to rgb. 
+ to_rgb=data_preprocessor['bgr_to_rgb']) + ], + [dict(type='LoadAnnotations', with_bbox=True)], + [ + dict( + type='PackDetInputs', + meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', + 'flip', 'flip_direction', 'border')) + ] + ]) +] diff --git a/configs/crowddet/crowddet-rcnn_r50_fpn_8xb2-30e_crowdhuman.py b/configs/crowddet/crowddet-rcnn_r50_fpn_8xb2-30e_crowdhuman.py index 97ec2db3c01..8815be77d49 100644 --- a/configs/crowddet/crowddet-rcnn_r50_fpn_8xb2-30e_crowdhuman.py +++ b/configs/crowddet/crowddet-rcnn_r50_fpn_8xb2-30e_crowdhuman.py @@ -132,9 +132,24 @@ dataset_type = 'CrowdHumanDataset' data_root = 'data/CrowdHuman/' -file_client_args = dict(backend='disk') + +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/tracking/CrowdHuman/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( +# backend='petrel', +# path_mapping=dict({ +# './data/': 's3://openmmlab/datasets/tracking/', +# 'data/': 's3://openmmlab/datasets/tracking/' +# })) +backend_args = None + train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict(type='RandomFlip', prob=0.5), dict( @@ -143,7 +158,7 @@ 'flip_direction')) ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(1400, 800), keep_ratio=True), # avoid bboxes being resized dict(type='LoadAnnotations', with_bbox=True), @@ -165,7 +180,8 @@ ann_file='annotation_train.odgt', data_prefix=dict(img='Images/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + backend_args=backend_args)) val_dataloader = dict( batch_size=1, num_workers=2, @@ -178,13 +194,15 @@ ann_file='annotation_val.odgt', data_prefix=dict(img='Images/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CrowdHumanMetric', ann_file=data_root + 'annotation_val.odgt', - metric=['AP', 'MR', 'JI']) + metric=['AP', 'MR', 'JI'], + backend_args=backend_args) test_evaluator = val_evaluator train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=30, val_interval=1) diff --git a/configs/dab_detr/dab-detr_r50_8xb2-50e_coco.py b/configs/dab_detr/dab-detr_r50_8xb2-50e_coco.py index 723f7b1340e..314ed97e2d8 100644 --- a/configs/dab_detr/dab-detr_r50_8xb2-50e_coco.py +++ b/configs/dab_detr/dab-detr_r50_8xb2-50e_coco.py @@ -93,9 +93,7 @@ # train_pipeline, NOTE the img_scale and the Pad's size_divisor is different # from the default setting in mmdet. 
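# (Presumably the DETR-family configs set `pad_size_divisor=1` in the data
# preprocessor instead of the usual 32, which is the difference the note
# above refers to.)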
train_pipeline = [
-    dict(
-        type='LoadImageFromFile',
-        file_client_args={{_base_.file_client_args}}),
+    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
     dict(type='LoadAnnotations', with_bbox=True),
     dict(type='RandomFlip', prob=0.5),
     dict(
diff --git a/configs/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_amp-1x_coco.py b/configs/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_amp-1x_coco.py
index 38b73b6ea6e..9d01594314a 100644
--- a/configs/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_amp-1x_coco.py
+++ b/configs/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_amp-1x_coco.py
@@ -4,4 +4,7 @@
         dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False),
         stage_with_dcn=(False, True, True, True)))
-fp16 = dict(loss_scale=512.)
+# MMEngine supports the following two ways; users can choose
+# whichever is more convenient
+# optim_wrapper = dict(type='AmpOptimWrapper')
+_base_.optim_wrapper.type = 'AmpOptimWrapper'
diff --git a/configs/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_amp-1x_coco.py b/configs/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_amp-1x_coco.py
index 4d5cffdb183..3b3894c2d61 100644
--- a/configs/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_amp-1x_coco.py
+++ b/configs/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_amp-1x_coco.py
@@ -4,4 +4,7 @@
         dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
         stage_with_dcn=(False, True, True, True)))
-fp16 = dict(loss_scale=512.)
+# MMEngine supports the following two ways; users can choose
+# whichever is more convenient
+# optim_wrapper = dict(type='AmpOptimWrapper')
+_base_.optim_wrapper.type = 'AmpOptimWrapper'
diff --git a/configs/deformable_detr/deformable-detr_r50_16xb2-50e_coco.py b/configs/deformable_detr/deformable-detr_r50_16xb2-50e_coco.py
index b2f064cc511..e0dee411c8e 100644
--- a/configs/deformable_detr/deformable-detr_r50_16xb2-50e_coco.py
+++ b/configs/deformable_detr/deformable-detr_r50_16xb2-50e_coco.py
@@ -81,9 +81,7 @@
 # train_pipeline, NOTE the img_scale and the Pad's size_divisor is different
 # from the default setting in mmdet.
train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict(type='RandomFlip', prob=0.5), dict( diff --git a/configs/deformable_detr/metafile.yml b/configs/deformable_detr/metafile.yml index abb85492cbb..0fba0ba09e6 100644 --- a/configs/deformable_detr/metafile.yml +++ b/configs/deformable_detr/metafile.yml @@ -28,8 +28,8 @@ Models: - Task: Object Detection Dataset: COCO Metrics: - box AP: 44.5 - Weights: https://download.openmmlab.com/mmdetection/v2.0/deformable_detr/deformable_detr_r50_16x2_50e_coco/deformable_detr_r50_16x2_50e_coco_20210419_220030-a12b9512.pth + box AP: 44.3 + Weights: https://download.openmmlab.com/mmdetection/v3.0/deformable_detr/deformable-detr_r50_16xb2-50e_coco/deformable-detr_r50_16xb2-50e_coco_20221029_210934-6bc7d21b.pth - Name: deformable-detr_refine_r50_16xb2-50e_coco In Collection: Deformable DETR @@ -40,8 +40,8 @@ Models: - Task: Object Detection Dataset: COCO Metrics: - box AP: 46.1 - Weights: https://download.openmmlab.com/mmdetection/v2.0/deformable_detr/deformable_detr_refine_r50_16x2_50e_coco/deformable_detr_refine_r50_16x2_50e_coco_20210419_220503-5f5dff21.pth + box AP: 46.2 + Weights: https://download.openmmlab.com/mmdetection/v3.0/deformable_detr/deformable-detr-refine_r50_16xb2-50e_coco/deformable-detr-refine_r50_16xb2-50e_coco_20221022_225303-844e0f93.pth - Name: deformable-detr_refine_twostage_r50_16xb2-50e_coco In Collection: Deformable DETR @@ -52,5 +52,5 @@ Models: - Task: Object Detection Dataset: COCO Metrics: - box AP: 46.8 - Weights: https://download.openmmlab.com/mmdetection/v2.0/deformable_detr/deformable_detr_twostage_refine_r50_16x2_50e_coco/deformable_detr_twostage_refine_r50_16x2_50e_coco_20210419_220613-9d28ab72.pth + box AP: 47.0 + Weights: https://download.openmmlab.com/mmdetection/v3.0/deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco/deformable-detr-refine-twostage_r50_16xb2-50e_coco_20221021_184714-acc8a5ff.pth diff --git a/configs/detr/detr_r50_8xb2-150e_coco.py b/configs/detr/detr_r50_8xb2-150e_coco.py index 1aba1c3c1ca..aaa15410532 100644 --- a/configs/detr/detr_r50_8xb2-150e_coco.py +++ b/configs/detr/detr_r50_8xb2-150e_coco.py @@ -89,9 +89,7 @@ # train_pipeline, NOTE the img_scale and the Pad's size_divisor is different # from the default setting in mmdet. 
train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict(type='RandomFlip', prob=0.5), dict( diff --git a/configs/detr/metafile.yml b/configs/detr/metafile.yml index 6b7f45eca9e..a9132dff022 100644 --- a/configs/detr/metafile.yml +++ b/configs/detr/metafile.yml @@ -29,5 +29,5 @@ Models: - Task: Object Detection Dataset: COCO Metrics: - box AP: 40.1 - Weights: https://download.openmmlab.com/mmdetection/v2.0/detr/detr_r50_8x2_150e_coco/detr_r50_8x2_150e_coco_20201130_194835-2c4b8974.pth + box AP: 39.9 + Weights: https://download.openmmlab.com/mmdetection/v3.0/detr/detr_r50_8xb2-150e_coco/detr_r50_8xb2-150e_coco_20221023_153551-436d03e8.pth diff --git a/configs/dino/README.md b/configs/dino/README.md index 0f1d4eb9702..54f51d598ef 100644 --- a/configs/dino/README.md +++ b/configs/dino/README.md @@ -14,9 +14,11 @@ We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art ## Results and Models -| Backbone | Model | Lr schd | box AP | Config | Download | -| :------: | :---------: | :-----: | :----: | :------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| R-50 | DINO-4scale | 12e | 49.0 | [config](./dino-4scale_r50_8xb2-12e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v3.0/dino/dino-4scale_r50_8xb2-12e_coco/dino-4scale_r50_8xb2-12e_coco_20221202_182705-55b2bba2.pth) \| [log](https://download.openmmlab.com/mmdetection/v3.0/dino/dino-4scale_r50_8xb2-12e_coco/dino-4scale_r50_8xb2-12e_coco_20221202_182705.log.json) | +| Backbone | Model | Lr schd | box AP | Config | Download | +| :------: | :---------: | :-----: | :----: | :---------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| R-50 | DINO-4scale | 12e | 49.0 | [config](./dino-4scale_r50_8xb2-12e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v3.0/dino/dino-4scale_r50_8xb2-12e_coco/dino-4scale_r50_8xb2-12e_coco_20221202_182705-55b2bba2.pth) \| [log](https://download.openmmlab.com/mmdetection/v3.0/dino/dino-4scale_r50_8xb2-12e_coco/dino-4scale_r50_8xb2-12e_coco_20221202_182705.log.json) | +| Swin-L | DINO-5scale | 12e | 57.2 | [config](./dino-5scale_swin-l_8xb2-12e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v3.0/dino/dino-5scale_swin-l_8xb2-12e_coco/dino-5scale_swin-l_8xb2-12e_coco_20230228_072924-a654145f.pth) \| [log](https://download.openmmlab.com/mmdetection/v3.0/dino/dino-5scale_swin-l_8xb2-12e_coco/dino-5scale_swin-l_8xb2-12e_coco_20230228_072924.log) | +| Swin-L | DINO-5scale | 36e | 58.4 | [config](./dino-5scale_swin-l_8xb2-36e_coco.py) | [model](https://github.com/RistoranteRist/mmlab-weights/releases/download/dino-swinl/dino-5scale_swin-l_8xb2-36e_coco-5486e051.pth) \| [log](https://github.com/RistoranteRist/mmlab-weights/releases/download/dino-swinl/20230307_032359.log) | ### NOTE diff --git 
a/configs/dino/dino-4scale_r50_8xb2-12e_coco.py b/configs/dino/dino-4scale_r50_8xb2-12e_coco.py index eb5f2a44704..5831f898b4a 100644 --- a/configs/dino/dino-4scale_r50_8xb2-12e_coco.py +++ b/configs/dino/dino-4scale_r50_8xb2-12e_coco.py @@ -88,9 +88,7 @@ # train_pipeline, NOTE the img_scale and the Pad's size_divisor is different # from the default setting in mmdet. train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict(type='RandomFlip', prob=0.5), dict( diff --git a/configs/dino/dino-5scale_swin-l_8xb2-12e_coco.py b/configs/dino/dino-5scale_swin-l_8xb2-12e_coco.py new file mode 100644 index 00000000000..fd94e9936c7 --- /dev/null +++ b/configs/dino/dino-5scale_swin-l_8xb2-12e_coco.py @@ -0,0 +1,31 @@ +_base_ = './dino-4scale_r50_8xb2-12e_coco.py' + +fp16 = dict(loss_scale=512.) +pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa +num_levels = 5 +model = dict( + num_feature_levels=num_levels, + backbone=dict( + _delete_=True, + type='SwinTransformer', + pretrain_img_size=384, + embed_dims=192, + depths=[2, 2, 18, 2], + num_heads=[6, 12, 24, 48], + window_size=12, + mlp_ratio=4, + qkv_bias=True, + qk_scale=None, + drop_rate=0., + attn_drop_rate=0., + drop_path_rate=0.2, + patch_norm=True, + out_indices=(0, 1, 2, 3), + # Please only add indices that would be used + # in FPN, otherwise some parameter will not be used + with_cp=True, + convert_weights=True, + init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + neck=dict(in_channels=[192, 384, 768, 1536], num_outs=num_levels), + encoder=dict(layer_cfg=dict(self_attn_cfg=dict(num_levels=num_levels))), + decoder=dict(layer_cfg=dict(cross_attn_cfg=dict(num_levels=num_levels)))) diff --git a/configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py b/configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py new file mode 100644 index 00000000000..d55a38e61d4 --- /dev/null +++ b/configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py @@ -0,0 +1,13 @@ +_base_ = './dino-5scale_swin-l_8xb2-12e_coco.py' +max_epochs = 36 +train_cfg = dict( + type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +param_scheduler = [ + dict( + type='MultiStepLR', + begin=0, + end=max_epochs, + by_epoch=True, + milestones=[27, 33], + gamma=0.1) +] diff --git a/configs/dino/metafile.yml b/configs/dino/metafile.yml index 5b68a41abf9..89dcb23e509 100644 --- a/configs/dino/metafile.yml +++ b/configs/dino/metafile.yml @@ -48,3 +48,27 @@ Models: Results: - Task: Object Detection Dataset: COCO + + - Name: dino-5scale_swin-l_8xb2-12e_coco.py + In Collection: DINO + Config: configs/dino/dino-5scale_swin-l_8xb2-12e_coco.py + Metadata: + Epochs: 12 + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 57.2 + Weights: https://download.openmmlab.com/mmdetection/v3.0/dino/dino-5scale_swin-l_8xb2-12e_coco/dino-5scale_swin-l_8xb2-12e_coco_20230228_072924-a654145f.pth + + - Name: dino-5scale_swin-l_8xb2-36e_coco.py + In Collection: DINO + Config: configs/dino/dino-5scale_swin-l_8xb2-36e_coco.py + Metadata: + Epochs: 36 + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 58.4 + Weights: https://github.com/RistoranteRist/mmlab-weights/releases/download/dino-swinl/dino-5scale_swin-l_8xb2-36e_coco-5486e051.pth diff --git a/configs/dyhead/atss_r50-caffe_fpn_dyhead_1x_coco.py 
b/configs/dyhead/atss_r50-caffe_fpn_dyhead_1x_coco.py index cbaf9a7c9b3..8716f1226cb 100644 --- a/configs/dyhead/atss_r50-caffe_fpn_dyhead_1x_coco.py +++ b/configs/dyhead/atss_r50-caffe_fpn_dyhead_1x_coco.py @@ -82,18 +82,14 @@ optim_wrapper = dict(optimizer=dict(lr=0.01)) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', scale=(1333, 800), keep_ratio=True, backend='pillow'), dict(type='RandomFlip', prob=0.5), dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(1333, 800), keep_ratio=True, backend='pillow'), dict(type='LoadAnnotations', with_bbox=True), dict( diff --git a/configs/dyhead/atss_swin-l-p4-w12_fpn_dyhead_ms-2x_coco.py b/configs/dyhead/atss_swin-l-p4-w12_fpn_dyhead_ms-2x_coco.py index ffc7f44a745..f537b9dc9b1 100644 --- a/configs/dyhead/atss_swin-l-p4-w12_fpn_dyhead_ms-2x_coco.py +++ b/configs/dyhead/atss_swin-l-p4-w12_fpn_dyhead_ms-2x_coco.py @@ -90,9 +90,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', @@ -103,9 +101,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(2000, 1200), keep_ratio=True, backend='pillow'), dict(type='LoadAnnotations', with_bbox=True), dict( @@ -124,7 +120,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args={{_base_.backend_args}}))) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader diff --git a/configs/efficientnet/retinanet_effb3_fpn_8xb4-crop896-1x_coco.py b/configs/efficientnet/retinanet_effb3_fpn_8xb4-crop896-1x_coco.py index 039ed5fdc05..2d0d9cefd0b 100644 --- a/configs/efficientnet/retinanet_effb3_fpn_8xb4-crop896-1x_coco.py +++ b/configs/efficientnet/retinanet_effb3_fpn_8xb4-crop896-1x_coco.py @@ -41,9 +41,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', @@ -55,9 +53,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=image_size, keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( diff --git a/configs/fast_rcnn/README.md b/configs/fast_rcnn/README.md index 91342474482..0bdc9359c7c 100644 --- a/configs/fast_rcnn/README.md +++ b/configs/fast_rcnn/README.md @@ -59,7 +59,7 @@ The `pred_instance` is an `InstanceData` containing the sorted boxes and scores 8 ``` - Users can refer to [test tutorial](https://mmdetection.readthedocs.io/en/3.x/user_guides/test.html) for 
more details. + Users can refer to [test tutorial](https://mmdetection.readthedocs.io/en/latest/user_guides/test.html) for more details. - Then, modify the path of `proposal_file` in the dataset and using `ProposalBroadcaster` to process both ground truth bounding boxes and region proposals in pipelines. An example of Fast R-CNN important setting can be seen as below: @@ -68,7 +68,7 @@ The `pred_instance` is an `InstanceData` containing the sorted boxes and scores train_pipeline = [ dict( type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + backend_args={{_base_.backend_args}}), dict(type='LoadProposals', num_max_proposals=2000), dict(type='LoadAnnotations', with_bbox=True), dict( @@ -82,7 +82,7 @@ The `pred_instance` is an `InstanceData` containing the sorted boxes and scores test_pipeline = [ dict( type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + backend_args={{_base_.backend_args}}), dict(type='LoadProposals', num_max_proposals=None), dict( type='ProposalBroadcaster', diff --git a/configs/fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py b/configs/fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py index 5008292330f..daefe2d2d28 100644 --- a/configs/fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py +++ b/configs/fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py @@ -4,9 +4,7 @@ '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' ] train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadProposals', num_max_proposals=2000), dict(type='LoadAnnotations', with_bbox=True), dict( @@ -18,9 +16,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadProposals', num_max_proposals=None), dict( type='ProposalBroadcaster', diff --git a/configs/faster_rcnn/faster-rcnn_r101-caffe_fpn_ms-3x_coco.py b/configs/faster_rcnn/faster-rcnn_r101-caffe_fpn_ms-3x_coco.py index 72e738b153c..1cdb4d4973e 100644 --- a/configs/faster_rcnn/faster-rcnn_r101-caffe_fpn_ms-3x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r101-caffe_fpn_ms-3x_coco.py @@ -9,41 +9,3 @@ init_cfg=dict( type='Pretrained', checkpoint='open-mmlab://detectron2/resnet101_caffe'))) - -# use caffe img_norm -img_norm_cfg = dict( - mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='Resize', - img_scale=[(1333, 640), (1333, 800)], - multiscale_mode='range', - keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] - -data = dict( - train=dict(dataset=dict(pipeline=train_pipeline)), - val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) diff --git a/configs/faster_rcnn/faster-rcnn_r50-caffe-c4_ms-1x_coco.py 
b/configs/faster_rcnn/faster-rcnn_r50-caffe-c4_ms-1x_coco.py index b8fb5efd002..d4949d04ac2 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-caffe-c4_ms-1x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-caffe-c4_ms-1x_coco.py @@ -1,38 +1,14 @@ _base_ = './faster-rcnn_r50-caffe_c4-1x_coco.py' -# use caffe img_norm -img_norm_cfg = dict( - mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) + train_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args=_base_.backend_args), dict(type='LoadAnnotations', with_bbox=True), dict( - type='Resize', - img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - multiscale_mode='value', + type='RandomChoiceResize', + scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), + (1333, 800)], keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), + dict(type='RandomFlip', prob=0.5), + dict(type='PackDetInputs') ] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - train=dict(pipeline=train_pipeline), - val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) +_base_.train_dataloader.dataset.pipeline = train_pipeline diff --git a/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_1x_coco.py b/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_1x_coco.py index d24b5e08bc0..8952a5c9c6c 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_1x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_1x_coco.py @@ -3,35 +3,3 @@ '../_base_/datasets/coco_detection.py', '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' ] -# use caffe img_norm -img_norm_cfg = dict( - mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - train=dict(pipeline=train_pipeline), - val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) diff --git a/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-1x_coco.py b/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-1x_coco.py index d3eb21ecfdf..99a6fcc7d7a 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-1x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-1x_coco.py @@ -1,42 +1,14 @@ -_base_ = [ - '../_base_/models/faster-rcnn_r50-caffe-dc5.py', - 
'../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] -# use caffe img_norm -img_norm_cfg = dict( - mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) +_base_ = 'faster-rcnn_r50-caffe-dc5_1x_coco.py' + train_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args=_base_.backend_args), dict(type='LoadAnnotations', with_bbox=True), dict( - type='Resize', - img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - multiscale_mode='value', + type='RandomChoiceResize', + scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), + (1333, 800)], keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) + dict(type='RandomFlip', prob=0.5), + dict(type='PackDetInputs') ] -data = dict( - train=dict(pipeline=train_pipeline), - val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) +_base_.train_dataloader.dataset.pipeline = train_pipeline diff --git a/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-3x_coco.py b/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-3x_coco.py index 72404a689da..27063468a70 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-3x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-3x_coco.py @@ -1,4 +1,18 @@ _base_ = './faster-rcnn_r50-caffe-dc5_ms-1x_coco.py' -# learning policy -lr_config = dict(step=[28, 34]) -runner = dict(type='EpochBasedRunner', max_epochs=36) + +# MMEngine support the following two ways, users can choose +# according to convenience +# param_scheduler = [ +# dict( +# type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), # noqa +# dict( +# type='MultiStepLR', +# begin=0, +# end=12, +# by_epoch=True, +# milestones=[28, 34], +# gamma=0.1) +# ] +_base_.param_scheduler[1].milestones = [28, 34] + +train_cfg = dict(max_epochs=36) diff --git a/configs/faster_rcnn/faster-rcnn_r50-caffe_c4-1x_coco.py b/configs/faster_rcnn/faster-rcnn_r50-caffe_c4-1x_coco.py index d68c7a77460..0888fc01790 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-caffe_c4-1x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-caffe_c4-1x_coco.py @@ -3,37 +3,3 @@ '../_base_/datasets/coco_detection.py', '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' ] -# use caffe img_norm -img_norm_cfg = dict( - mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - 
dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - train=dict(pipeline=train_pipeline), - val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) diff --git a/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_90k_coco.py b/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_90k_coco.py index f15b203831b..27f49355f3b 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_90k_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_90k_coco.py @@ -1,15 +1,22 @@ _base_ = 'faster-rcnn_r50-caffe_fpn_1x_coco.py' +max_iter = 90000 -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=0.001, - step=[60000, 80000]) +param_scheduler = [ + dict( + type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), + dict( + type='MultiStepLR', + begin=0, + end=max_iter, + by_epoch=False, + milestones=[60000, 80000], + gamma=0.1) +] -# Runner type -runner = dict(_delete_=True, type='IterBasedRunner', max_iters=90000) - -checkpoint_config = dict(interval=10000) -evaluation = dict(interval=10000, metric='bbox') +train_cfg = dict( + _delete_=True, + type='IterBasedTrainLoop', + max_iters=max_iter, + val_interval=10000) +default_hooks = dict(checkpoint=dict(by_epoch=False, interval=10000)) +log_processor = dict(by_epoch=False) diff --git a/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco.py b/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco.py index 8158f4f80c1..7daa03d90a5 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco.py @@ -13,40 +13,19 @@ init_cfg=dict( type='Pretrained', checkpoint='open-mmlab://detectron2/resnet50_caffe'))) -# use caffe img_norm -img_norm_cfg = dict( - mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) + train_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args=_base_.backend_args), dict(type='LoadAnnotations', with_bbox=True), dict( - type='Resize', - img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - multiscale_mode='value', + type='RandomChoiceResize', + scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), + (1333, 800)], keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), + dict(type='RandomFlip', prob=0.5), + dict(type='PackDetInputs') ] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - train=dict(pipeline=train_pipeline), - val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) +# MMEngine support the following two ways, users can choose +# according to convenience +# train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) 
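
A pattern that recurs throughout these config hunks: after redefining `train_pipeline` in a child config, MMEngine accepts two equivalent ways to hook it into the inherited dataloader — the dict-merge style shown in the comment just above, and the direct `_base_` assignment that closes the hunk just below. A minimal sketch of both, assuming a base config that already defines `backend_args` and `train_dataloader` (the child file name is illustrative, not part of this patch):

```python
# Hypothetical child config, e.g. my_ms_train.py
_base_ = './faster-rcnn_r50-caffe_fpn_1x_coco.py'

train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=_base_.backend_args),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='RandomChoiceResize',
        scale=[(1333, 640), (1333, 800)],
        keep_ratio=True),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackDetInputs')
]

# Style 1: dict merge -- merged key by key into the base config,
# so sibling keys of `dataset` are preserved.
# train_dataloader = dict(dataset=dict(pipeline=train_pipeline))

# Style 2: direct attribute assignment on the parsed base config.
_base_.train_dataloader.dataset.pipeline = train_pipeline
```

For a single leaf key like `pipeline` the two styles are interchangeable; the configs in this patch keep one style active and leave the other as a comment.
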
+_base_.train_dataloader.dataset.pipeline = train_pipeline diff --git a/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-2x_coco.py b/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-2x_coco.py index 73ae4f7053e..44d320ea01b 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-2x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-2x_coco.py @@ -1,4 +1,18 @@ _base_ = './faster-rcnn_r50-caffe_fpn_ms-1x_coco.py' -# learning policy -lr_config = dict(step=[16, 23]) -runner = dict(type='EpochBasedRunner', max_epochs=24) + +# MMEngine support the following two ways, users can choose +# according to convenience +# param_scheduler = [ +# dict( +# type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), # noqa +# dict( +# type='MultiStepLR', +# begin=0, +# end=12, +# by_epoch=True, +# milestones=[16, 23], +# gamma=0.1) +# ] +_base_.param_scheduler[1].milestones = [16, 23] + +train_cfg = dict(max_epochs=24) diff --git a/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-3x_coco.py b/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-3x_coco.py index c65e1a79324..365f6439241 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-3x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-3x_coco.py @@ -13,41 +13,3 @@ init_cfg=dict( type='Pretrained', checkpoint='open-mmlab://detectron2/resnet50_caffe'))) - -# use caffe img_norm -img_norm_cfg = dict( - mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='Resize', - img_scale=[(1333, 640), (1333, 800)], - multiscale_mode='range', - keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] - -data = dict( - train=dict(dataset=dict(pipeline=train_pipeline)), - val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) diff --git a/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-90k_coco.py b/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-90k_coco.py index 3c0106a59b2..6b9b3eb0e79 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-90k_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-90k_coco.py @@ -1,15 +1,23 @@ _base_ = 'faster-rcnn_r50-caffe_fpn_ms-1x_coco.py' -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=0.001, - step=[60000, 80000]) +max_iter = 90000 -# Runner type -runner = dict(_delete_=True, type='IterBasedRunner', max_iters=90000) +param_scheduler = [ + dict( + type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), + dict( + type='MultiStepLR', + begin=0, + end=max_iter, + by_epoch=False, + milestones=[60000, 80000], + gamma=0.1) +] -checkpoint_config = dict(interval=10000) -evaluation = dict(interval=10000, metric='bbox') +train_cfg = dict( + _delete_=True, + type='IterBasedTrainLoop', + max_iters=max_iter, + val_interval=10000) +default_hooks = dict(checkpoint=dict(by_epoch=False, interval=10000)) 
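
The 90k hunks above carry the whole epoch-to-iteration migration in one place: `lr_config`'s linear warmup becomes a `LinearLR` scheduler, its `step` list becomes `MultiStepLR` milestones, `runner` becomes `train_cfg`, and the old `checkpoint_config`/`evaluation` intervals move into `default_hooks` and `val_interval`. Collected into one sketch — values copied from the hunk above; the `log_processor` override that completes it follows just below:

```python
max_iter = 90000

param_scheduler = [
    # linear warmup over the first 500 iterations
    dict(type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
    # drop the LR by 10x at 60k and 80k iterations
    dict(
        type='MultiStepLR',
        begin=0,
        end=max_iter,
        by_epoch=False,
        milestones=[60000, 80000],
        gamma=0.1)
]

# _delete_=True discards the inherited epoch-based loop instead of merging into it
train_cfg = dict(
    _delete_=True,
    type='IterBasedTrainLoop',
    max_iters=max_iter,
    val_interval=10000)

# checkpointing (and, below, logging) must also count iterations, not epochs
default_hooks = dict(checkpoint=dict(by_epoch=False, interval=10000))
```
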
+log_processor = dict(by_epoch=False) diff --git a/configs/faster_rcnn/faster-rcnn_r50-tnr-pre_fpn_1x_coco.py b/configs/faster_rcnn/faster-rcnn_r50-tnr-pre_fpn_1x_coco.py index 7d952f2825d..7b3e5dedbe8 100644 --- a/configs/faster_rcnn/faster-rcnn_r50-tnr-pre_fpn_1x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50-tnr-pre_fpn_1x_coco.py @@ -9,9 +9,6 @@ backbone=dict(init_cfg=dict(type='Pretrained', checkpoint=checkpoint))) # `lr` and `weight_decay` have been searched to be optimal. -optimizer = dict( - _delete_=True, - type='AdamW', - lr=0.0001, - weight_decay=0.1, +optim_wrapper = dict( + optimizer=dict(_delete_=True, type='AdamW', lr=0.0001, weight_decay=0.1), paramwise_cfg=dict(norm_decay_mult=0., bypass_duplicate=True)) diff --git a/configs/faster_rcnn/faster-rcnn_r50_fpn_amp-1x_coco.py b/configs/faster_rcnn/faster-rcnn_r50_fpn_amp-1x_coco.py index 4cecb8738b0..f765deaef1d 100644 --- a/configs/faster_rcnn/faster-rcnn_r50_fpn_amp-1x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_r50_fpn_amp-1x_coco.py @@ -1,3 +1,6 @@ _base_ = './faster-rcnn_r50_fpn_1x_coco.py' -# fp16 settings -fp16 = dict(loss_scale=512.) + +# MMEngine support the following two ways, users can choose +# according to convenience +# optim_wrapper = dict(type='AmpOptimWrapper') +_base_.optim_wrapper.type = 'AmpOptimWrapper' diff --git a/configs/faster_rcnn/faster-rcnn_x101-32x8d_fpn_ms-3x_coco.py b/configs/faster_rcnn/faster-rcnn_x101-32x8d_fpn_ms-3x_coco.py index 2ca1a16116b..28d6290be7a 100644 --- a/configs/faster_rcnn/faster-rcnn_x101-32x8d_fpn_ms-3x_coco.py +++ b/configs/faster_rcnn/faster-rcnn_x101-32x8d_fpn_ms-3x_coco.py @@ -1,5 +1,13 @@ _base_ = ['../common/ms_3x_coco.py', '../_base_/models/faster-rcnn_r50_fpn.py'] model = dict( + # ResNeXt-101-32x8d model trained with Caffe2 at FB, + # so the mean and std need to be changed. + data_preprocessor=dict( + type='DetDataPreprocessor', + mean=[103.530, 116.280, 123.675], + std=[57.375, 57.120, 58.395], + bgr_to_rgb=False, + pad_size_divisor=32), backbone=dict( type='ResNeXt', depth=101, @@ -13,48 +21,3 @@ init_cfg=dict( type='Pretrained', checkpoint='open-mmlab://detectron2/resnext101_32x8d'))) - -# ResNeXt-101-32x8d model trained with Caffe2 at FB, -# so the mean and std need to be changed. 
-img_norm_cfg = dict( - mean=[103.530, 116.280, 123.675], - std=[57.375, 57.120, 58.395], - to_rgb=False) - -# In mstrain 3x config, img_scale=[(1333, 640), (1333, 800)], -# multiscale_mode='range' -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='Resize', - img_scale=[(1333, 640), (1333, 800)], - multiscale_mode='range', - keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] - -# Use RepeatDataset to speed up training -data = dict( - train=dict(dataset=dict(pipeline=train_pipeline)), - val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) diff --git a/configs/fcos/fcos_r101-caffe_fpn_gn-head_ms-640-800-2x_coco.py b/configs/fcos/fcos_r101-caffe_fpn_gn-head_ms-640-800-2x_coco.py index 0b8039c1e71..859b45c94b2 100644 --- a/configs/fcos/fcos_r101-caffe_fpn_gn-head_ms-640-800-2x_coco.py +++ b/configs/fcos/fcos_r101-caffe_fpn_gn-head_ms-640-800-2x_coco.py @@ -10,9 +10,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomChoiceResize', diff --git a/configs/fcos/fcos_r50-caffe_fpn_gn-head_ms-640-800-2x_coco.py b/configs/fcos/fcos_r50-caffe_fpn_gn-head_ms-640-800-2x_coco.py index 9888dd8f25f..12e9160d812 100644 --- a/configs/fcos/fcos_r50-caffe_fpn_gn-head_ms-640-800-2x_coco.py +++ b/configs/fcos/fcos_r50-caffe_fpn_gn-head_ms-640-800-2x_coco.py @@ -2,9 +2,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomChoiceResize', diff --git a/configs/fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py b/configs/fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py index 3f58665dce5..aae1fceea58 100644 --- a/configs/fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py +++ b/configs/fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py @@ -24,9 +24,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomChoiceResize', diff --git a/configs/foveabox/fovea_r101_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py b/configs/foveabox/fovea_r101_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py index 1ab77bf7458..e1852d581fc 100644 --- a/configs/foveabox/fovea_r101_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py +++ b/configs/foveabox/fovea_r101_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py @@ -8,9 +8,7 @@ with_deform=True, norm_cfg=dict(type='GN', num_groups=32, requires_grad=True))) train_pipeline = [ - dict( - type='LoadImageFromFile', - 
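
These deletions all follow from a single 3.x design change: the per-image `Normalize` and `Pad` transforms (driven by `img_norm_cfg`) disappear from the pipelines because normalization, channel-order conversion, and batch padding now happen in the model's data preprocessor. A sketch of the mapping, using the Caffe2 ResNeXt statistics from the hunk above (the removed `img_norm_cfg` block continues below):

```python
# MMDetection 2.x -- normalization lived in the data pipeline:
# img_norm_cfg = dict(
#     mean=[103.530, 116.280, 123.675],
#     std=[57.375, 57.120, 58.395],
#     to_rgb=False)
# ...
# dict(type='Normalize', **img_norm_cfg),
# dict(type='Pad', size_divisor=32),

# MMDetection 3.x -- the same numbers move onto the model:
model = dict(
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[103.530, 116.280, 123.675],
        std=[57.375, 57.120, 58.395],
        bgr_to_rgb=False,       # replaces to_rgb=False
        pad_size_divisor=32))   # replaces dict(type='Pad', size_divisor=32)
```
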
file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomChoiceResize', diff --git a/configs/foveabox/fovea_r50_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py b/configs/foveabox/fovea_r50_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py index be240259f3a..5690bcae08c 100644 --- a/configs/foveabox/fovea_r50_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py +++ b/configs/foveabox/fovea_r50_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py @@ -4,9 +4,7 @@ with_deform=True, norm_cfg=dict(type='GN', num_groups=32, requires_grad=True))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomChoiceResize', diff --git a/configs/fpg/faster-rcnn_r50_fpn_crop640-50e_coco.py b/configs/fpg/faster-rcnn_r50_fpn_crop640-50e_coco.py index 019105dbfac..46211de03f3 100644 --- a/configs/fpg/faster-rcnn_r50_fpn_crop640-50e_coco.py +++ b/configs/fpg/faster-rcnn_r50_fpn_crop640-50e_coco.py @@ -16,9 +16,7 @@ data_root = 'data/coco/' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', @@ -35,9 +33,7 @@ ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=image_size, keep_ratio=True), dict( type='PackDetInputs', diff --git a/configs/fpg/mask-rcnn_r50_fpn_crop640-50e_coco.py b/configs/fpg/mask-rcnn_r50_fpn_crop640-50e_coco.py index baaf9a4b1e5..08ca5b6ffd8 100644 --- a/configs/fpg/mask-rcnn_r50_fpn_crop640-50e_coco.py +++ b/configs/fpg/mask-rcnn_r50_fpn_crop640-50e_coco.py @@ -22,9 +22,7 @@ data_root = 'data/coco/' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomResize', @@ -41,9 +39,7 @@ ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=image_size, keep_ratio=True), dict( type='PackDetInputs', diff --git a/configs/fsaf/metafile.yml b/configs/fsaf/metafile.yml index 2d524b6aea7..daaad0d3a86 100644 --- a/configs/fsaf/metafile.yml +++ b/configs/fsaf/metafile.yml @@ -56,7 +56,7 @@ Models: - Task: Object Detection Dataset: COCO Metrics: - box AP: 39.3 (37.9) + box AP: 39.3 Weights: https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_r101_fpn_1x_coco/fsaf_r101_fpn_1x_coco-9e71098f.pth - Name: fsaf_x101-64x4d_fpn_1x_coco @@ -76,5 +76,5 @@ Models: - Task: Object Detection Dataset: COCO Metrics: - box AP: 42.4 (41.0) + box AP: 42.4 Weights: https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_x101_64x4d_fpn_1x_coco/fsaf_x101_64x4d_fpn_1x_coco-e3f6e6fd.pth diff --git a/configs/gfl/gfl_r50_fpn_ms-2x_coco.py b/configs/gfl/gfl_r50_fpn_ms-2x_coco.py index cb1137e01df..22770eb1019 100644 --- a/configs/gfl/gfl_r50_fpn_ms-2x_coco.py +++ b/configs/gfl/gfl_r50_fpn_ms-2x_coco.py @@ -17,9 +17,7 @@ # multi-scale training train_pipeline = [ - 
dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', scale=[(1333, 480), (1333, 800)], diff --git a/configs/guided_anchoring/ga-retinanet_r101-caffe_fpn_ms-2x.py b/configs/guided_anchoring/ga-retinanet_r101-caffe_fpn_ms-2x.py index 459a1900241..012e89b8338 100644 --- a/configs/guided_anchoring/ga-retinanet_r101-caffe_fpn_ms-2x.py +++ b/configs/guided_anchoring/ga-retinanet_r101-caffe_fpn_ms-2x.py @@ -1,9 +1,7 @@ _base_ = './ga-retinanet_r101-caffe_fpn_1x_coco.py' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', scale=[(1333, 480), (1333, 960)], diff --git a/configs/hrnet/fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py b/configs/hrnet/fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py index 3c107c8f1b7..4c977bf31ed 100644 --- a/configs/hrnet/fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py +++ b/configs/hrnet/fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py @@ -7,9 +7,7 @@ bgr_to_rgb=False)) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomChoiceResize', diff --git a/configs/htc/htc_r50_fpn_1x_coco.py b/configs/htc/htc_r50_fpn_1x_coco.py index 03ddb61ab1d..3573f1f6980 100644 --- a/configs/htc/htc_r50_fpn_1x_coco.py +++ b/configs/htc/htc_r50_fpn_1x_coco.py @@ -20,9 +20,7 @@ type='CrossEntropyLoss', ignore_index=255, loss_weight=0.2)))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True), dict(type='Resize', scale=(1333, 800), keep_ratio=True), diff --git a/configs/instaboost/cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py b/configs/instaboost/cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py index 00165fb0342..f7736cf5756 100644 --- a/configs/instaboost/cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py +++ b/configs/instaboost/cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py @@ -1,9 +1,7 @@ _base_ = '../cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='InstaBoost', action_candidate=('normal', 'horizontal', 'skip'), diff --git a/configs/instaboost/mask-rcnn_r50_fpn_instaboost-4x_coco.py b/configs/instaboost/mask-rcnn_r50_fpn_instaboost-4x_coco.py index 4e90eda8387..0a8c9be81f0 100644 --- a/configs/instaboost/mask-rcnn_r50_fpn_instaboost-4x_coco.py +++ b/configs/instaboost/mask-rcnn_r50_fpn_instaboost-4x_coco.py @@ -1,9 +1,7 @@ _base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='InstaBoost', action_candidate=('normal', 'horizontal', 'skip'), diff --git a/configs/lad/lad_r101-paa-r50_fpn_2xb8_coco_1x.py b/configs/lad/lad_r101-paa-r50_fpn_2xb8_coco_1x.py index 
36681eb0f19..d61d08638a0 100644 --- a/configs/lad/lad_r101-paa-r50_fpn_2xb8_coco_1x.py +++ b/configs/lad/lad_r101-paa-r50_fpn_2xb8_coco_1x.py @@ -125,6 +125,3 @@ max_per_img=100)) train_dataloader = dict(batch_size=8, num_workers=4) optim_wrapper = dict(type='AmpOptimWrapper', optimizer=dict(lr=0.01)) - -# TODO: MMEngine does not support fp16 yet. -# fp16 = dict(loss_scale=512.) diff --git a/configs/lad/lad_r50-paa-r101_fpn_2xb8_coco_1x.py b/configs/lad/lad_r50-paa-r101_fpn_2xb8_coco_1x.py index 434bc77be77..f7eaf2bfba1 100644 --- a/configs/lad/lad_r50-paa-r101_fpn_2xb8_coco_1x.py +++ b/configs/lad/lad_r50-paa-r101_fpn_2xb8_coco_1x.py @@ -124,6 +124,3 @@ max_per_img=100)) train_dataloader = dict(batch_size=8, num_workers=4) optim_wrapper = dict(type='AmpOptimWrapper', optimizer=dict(lr=0.01)) - -# TODO: MMEngine does not support fp16 yet. -# fp16 = dict(loss_scale=512.) diff --git a/configs/ld/ld_r101-gflv1-r101-dcn_fpn_2x_coco.py b/configs/ld/ld_r101-gflv1-r101-dcn_fpn_2x_coco.py index 681c9e086c2..a7e928bdc23 100644 --- a/configs/ld/ld_r101-gflv1-r101-dcn_fpn_2x_coco.py +++ b/configs/ld/ld_r101-gflv1-r101-dcn_fpn_2x_coco.py @@ -38,9 +38,7 @@ # multi-scale training train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', scale=[(1333, 480), (1333, 800)], diff --git a/configs/legacy_1.x/cascade-mask-rcnn_r50_fpn_1x_coco_v1.py b/configs/legacy_1.x/cascade-mask-rcnn_r50_fpn_1x_coco_v1.py index 2aa3a757e15..f948a7a9c10 100644 --- a/configs/legacy_1.x/cascade-mask-rcnn_r50_fpn_1x_coco_v1.py +++ b/configs/legacy_1.x/cascade-mask-rcnn_r50_fpn_1x_coco_v1.py @@ -76,4 +76,3 @@ output_size=14, sampling_ratio=2, aligned=False)))) -dist_params = dict(backend='nccl', port=29515) diff --git a/configs/legacy_1.x/retinanet_r50-caffe_fpn_1x_coco_v1.py b/configs/legacy_1.x/retinanet_r50-caffe_fpn_1x_coco_v1.py index a63d248c435..49abc31a002 100644 --- a/configs/legacy_1.x/retinanet_r50-caffe_fpn_1x_coco_v1.py +++ b/configs/legacy_1.x/retinanet_r50-caffe_fpn_1x_coco_v1.py @@ -1,5 +1,12 @@ _base_ = './retinanet_r50_fpn_1x_coco_v1.py' model = dict( + data_preprocessor=dict( + type='DetDataPreprocessor', + # use caffe img_norm + mean=[102.9801, 115.9465, 122.7717], + std=[1.0, 1.0, 1.0], + bgr_to_rgb=False, + pad_size_divisor=32), backbone=dict( norm_cfg=dict(requires_grad=False), norm_eval=True, @@ -7,35 +14,3 @@ init_cfg=dict( type='Pretrained', checkpoint='open-mmlab://detectron/resnet50_caffe'))) -# use caffe img_norm -img_norm_cfg = dict( - mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - train=dict(pipeline=train_pipeline), 
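
The LAD hunks above can delete the stale `fp16` TODO because mixed-precision training is no longer a top-level `fp16` switch; it is a property of the optimizer wrapper. Both spellings in the sketch below appear elsewhere in this patch. `AmpOptimWrapper` uses dynamic loss scaling by default, which is why the fixed `loss_scale=512.` has no direct replacement:

```python
# MMDetection 2.x:
# fp16 = dict(loss_scale=512.)

# MMDetection 3.x -- declare the wrapper type directly...
optim_wrapper = dict(type='AmpOptimWrapper', optimizer=dict(lr=0.01))

# ...or flip the type on an inherited optim_wrapper:
# _base_.optim_wrapper.type = 'AmpOptimWrapper'
```
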
- val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) diff --git a/configs/legacy_1.x/ssd300_coco_v1.py b/configs/legacy_1.x/ssd300_coco_v1.py index 65ccc1e542c..e5ffc633a9b 100644 --- a/configs/legacy_1.x/ssd300_coco_v1.py +++ b/configs/legacy_1.x/ssd300_coco_v1.py @@ -18,67 +18,3 @@ type='LegacyDeltaXYWHBBoxCoder', target_means=[.0, .0, .0, .0], target_stds=[0.1, 0.1, 0.2, 0.2]))) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile', to_float32=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict( - type='Expand', - mean=img_norm_cfg['mean'], - to_rgb=img_norm_cfg['to_rgb'], - ratio_range=(1, 4)), - dict( - type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', img_scale=(300, 300), keep_ratio=False), - dict(type='Normalize', **img_norm_cfg), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(300, 300), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=False), - dict(type='Normalize', **img_norm_cfg), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - samples_per_gpu=8, - workers_per_gpu=3, - train=dict( - _delete_=True, - type='RepeatDataset', - times=5, - dataset=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline)), - val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=2e-3, momentum=0.9, weight_decay=5e-4) -optimizer_config = dict(_delete_=True) -dist_params = dict(backend='nccl', port=29555) - -# NOTE: `auto_scale_lr` is for automatically scaling LR, -# USER SHOULD NOT CHANGE ITS VALUES. 
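
The removed SSD optimizer block above, together with the earlier `faster-rcnn_r50-tnr-pre` hunk, illustrates the remaining schedule-side rule: the 2.x `optimizer`/`optimizer_config` pair collapses into a single `optim_wrapper`. The sketch below combines fields from those two hunks purely to show where each one lands; the exact values are illustrative, not a config from this patch:

```python
# MMDetection 2.x:
# optimizer = dict(type='SGD', lr=2e-3, momentum=0.9, weight_decay=5e-4)
# optimizer_config = dict(grad_clip=None)   # grad clipping lived here

# MMDetection 3.x -- one wrapper owns the optimizer and everything around it:
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=2e-3, momentum=0.9, weight_decay=5e-4),
    clip_grad=None,  # replaces optimizer_config's grad_clip
    paramwise_cfg=dict(norm_decay_mult=0., bypass_duplicate=True))
```
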
-# base_batch_size = (8 GPUs) x (8 samples per GPU) -auto_scale_lr = dict(base_batch_size=64) diff --git a/configs/libra_rcnn/libra-fast-rcnn_r50_fpn_1x_coco.py b/configs/libra_rcnn/libra-fast-rcnn_r50_fpn_1x_coco.py index 9d4a4e41ce0..2efe440ce36 100644 --- a/configs/libra_rcnn/libra-fast-rcnn_r50_fpn_1x_coco.py +++ b/configs/libra_rcnn/libra-fast-rcnn_r50_fpn_1x_coco.py @@ -38,13 +38,15 @@ floor_thr=-1, floor_fraction=0, num_bins=3))))) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -data = dict( - train=dict(proposal_file=data_root + - 'libra_proposals/rpn_r50_fpn_1x_train2017.pkl'), - val=dict(proposal_file=data_root + - 'libra_proposals/rpn_r50_fpn_1x_val2017.pkl'), - test=dict(proposal_file=data_root + - 'libra_proposals/rpn_r50_fpn_1x_val2017.pkl')) + +# MMEngine support the following two ways, users can choose +# according to convenience +# _base_.train_dataloader.dataset.proposal_file = 'libra_proposals/rpn_r50_fpn_1x_train2017.pkl' # noqa +train_dataloader = dict( + dataset=dict(proposal_file='libra_proposals/rpn_r50_fpn_1x_train2017.pkl')) + +# _base_.val_dataloader.dataset.proposal_file = 'libra_proposals/rpn_r50_fpn_1x_val2017.pkl' # noqa +# test_dataloader = _base_.val_dataloader +val_dataloader = dict( + dataset=dict(proposal_file='libra_proposals/rpn_r50_fpn_1x_val2017.pkl')) +test_dataloader = val_dataloader diff --git a/configs/lvis/metafile.yml b/configs/lvis/metafile.yml new file mode 100644 index 00000000000..f8def96c7e5 --- /dev/null +++ b/configs/lvis/metafile.yml @@ -0,0 +1,128 @@ +Models: + - Name: mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5 + In Collection: Mask R-CNN + Config: configs/lvis/mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py + Metadata: + Epochs: 24 + Results: + - Task: Object Detection + Dataset: LVIS v0.5 + Metrics: + box AP: 26.1 + - Task: Instance Segmentation + Dataset: LVIS v0.5 + Metrics: + mask AP: 25.9 + Weights: https://download.openmmlab.com/mmdetection/v2.0/lvis/mask_rcnn_r50_fpn_sample1e-3_mstrain_2x_lvis/mask_rcnn_r50_fpn_sample1e-3_mstrain_2x_lvis-dbd06831.pth + + - Name: mask-rcnn_r101_fpn_sample1e-3_ms-2x_lvis-v0.5 + In Collection: Mask R-CNN + Config: configs/lvis/mask-rcnn_r101_fpn_sample1e-3_ms-2x_lvis-v0.5.py + Metadata: + Epochs: 24 + Results: + - Task: Object Detection + Dataset: LVIS v0.5 + Metrics: + box AP: 27.1 + - Task: Instance Segmentation + Dataset: LVIS v0.5 + Metrics: + mask AP: 27.0 + Weights: https://download.openmmlab.com/mmdetection/v2.0/lvis/mask_rcnn_r101_fpn_sample1e-3_mstrain_2x_lvis/mask_rcnn_r101_fpn_sample1e-3_mstrain_2x_lvis-54582ee2.pth + + - Name: mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-2x_lvis-v0.5 + In Collection: Mask R-CNN + Config: configs/lvis/mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-2x_lvis-v0.5.py + Metadata: + Epochs: 24 + Results: + - Task: Object Detection + Dataset: LVIS v0.5 + Metrics: + box AP: 26.7 + - Task: Instance Segmentation + Dataset: LVIS v0.5 + Metrics: + mask AP: 26.9 + Weights: https://download.openmmlab.com/mmdetection/v2.0/lvis/mask_rcnn_x101_32x4d_fpn_sample1e-3_mstrain_2x_lvis/mask_rcnn_x101_32x4d_fpn_sample1e-3_mstrain_2x_lvis-3cf55ea2.pth + + - Name: mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-2x_lvis-v0.5 + In Collection: Mask R-CNN + Config: configs/lvis/mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-2x_lvis-v0.5.py + Metadata: + Epochs: 24 + Results: + - Task: Object Detection + Dataset: LVIS v0.5 + Metrics: + box AP: 26.4 + - Task: Instance Segmentation + Dataset: LVIS v0.5 + Metrics: + mask AP: 26.0 + Weights: 
https://download.openmmlab.com/mmdetection/v2.0/lvis/mask_rcnn_x101_64x4d_fpn_sample1e-3_mstrain_2x_lvis/mask_rcnn_x101_64x4d_fpn_sample1e-3_mstrain_2x_lvis-1c99a5ad.pth + + - Name: mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1 + In Collection: Mask R-CNN + Config: configs/lvis/mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py + Metadata: + Epochs: 12 + Results: + - Task: Object Detection + Dataset: LVIS v1 + Metrics: + box AP: 22.5 + - Task: Instance Segmentation + Dataset: LVIS v1 + Metrics: + mask AP: 21.7 + Weights: https://download.openmmlab.com/mmdetection/v2.0/lvis/mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1/mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1-aa78ac3d.pth + + - Name: mask-rcnn_r101_fpn_sample1e-3_ms-1x_lvis-v1 + In Collection: Mask R-CNN + Config: configs/lvis/mask-rcnn_r101_fpn_sample1e-3_ms-1x_lvis-v1.py + Metadata: + Epochs: 12 + Results: + - Task: Object Detection + Dataset: LVIS v1 + Metrics: + box AP: 24.6 + - Task: Instance Segmentation + Dataset: LVIS v1 + Metrics: + mask AP: 23.6 + Weights: https://download.openmmlab.com/mmdetection/v2.0/lvis/mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1/mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1-ec55ce32.pth + + - Name: mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-1x_lvis-v1 + In Collection: Mask R-CNN + Config: configs/lvis/mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-1x_lvis-v1.py + Metadata: + Epochs: 12 + Results: + - Task: Object Detection + Dataset: LVIS v1 + Metrics: + box AP: 26.7 + - Task: Instance Segmentation + Dataset: LVIS v1 + Metrics: + mask AP: 25.5 + Weights: https://download.openmmlab.com/mmdetection/v2.0/lvis/mask_rcnn_x101_32x4d_fpn_sample1e-3_mstrain_1x_lvis_v1/mask_rcnn_x101_32x4d_fpn_sample1e-3_mstrain_1x_lvis_v1-ebbc5c81.pth + + - Name: mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-1x_lvis-v1 + In Collection: Mask R-CNN + Config: configs/lvis/mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-1x_lvis-v1.py + Metadata: + Epochs: 12 + Results: + - Task: Object Detection + Dataset: LVIS v1 + Metrics: + box AP: 27.2 + - Task: Instance Segmentation + Dataset: LVIS v1 + Metrics: + mask AP: 25.8 + Weights: https://download.openmmlab.com/mmdetection/v2.0/lvis/mask_rcnn_x101_64x4d_fpn_sample1e-3_mstrain_1x_lvis_v1/mask_rcnn_x101_64x4d_fpn_sample1e-3_mstrain_1x_lvis_v1-43d9edfe.pth diff --git a/configs/mask2former/mask2former_r50_8xb2-lsj-50e_coco-panoptic.py b/configs/mask2former/mask2former_r50_8xb2-lsj-50e_coco-panoptic.py index 1a20244299b..c53e981bf0d 100644 --- a/configs/mask2former/mask2former_r50_8xb2-lsj-50e_coco-panoptic.py +++ b/configs/mask2former/mask2former_r50_8xb2-lsj-50e_coco-panoptic.py @@ -150,12 +150,16 @@ # dataset settings data_root = 'data/coco/' train_pipeline = [ - dict(type='LoadImageFromFile', to_float32=True), + dict( + type='LoadImageFromFile', + to_float32=True, + backend_args={{_base_.backend_args}}), dict( type='LoadPanopticAnnotations', with_bbox=True, with_mask=True, - with_seg=True), + with_seg=True, + backend_args={{_base_.backend_args}}), dict(type='RandomFlip', prob=0.5), # large scale jittering dict( @@ -179,12 +183,12 @@ type='CocoPanopticMetric', ann_file=data_root + 'annotations/panoptic_val2017.json', seg_prefix=data_root + 'annotations/panoptic_val2017/', - ), + backend_args={{_base_.backend_args}}), dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', metric=['bbox', 'segm'], - ) + backend_args={{_base_.backend_args}}) ] test_evaluator = val_evaluator diff --git a/configs/mask2former/mask2former_r50_8xb2-lsj-50e_coco.py 
b/configs/mask2former/mask2former_r50_8xb2-lsj-50e_coco.py index 6bc9fd7472a..24a17f58c54 100644 --- a/configs/mask2former/mask2former_r50_8xb2-lsj-50e_coco.py +++ b/configs/mask2former/mask2former_r50_8xb2-lsj-50e_coco.py @@ -36,7 +36,10 @@ # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', to_float32=True), + dict( + type='LoadImageFromFile', + to_float32=True, + backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', prob=0.5), # large scale jittering @@ -57,7 +60,10 @@ ] test_pipeline = [ - dict(type='LoadImageFromFile', to_float32=True), + dict( + type='LoadImageFromFile', + to_float32=True, + backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline dict(type='LoadAnnotations', with_bbox=True, with_mask=True), @@ -89,5 +95,6 @@ type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', metric=['bbox', 'segm'], - format_only=False) + format_only=False, + backend_args={{_base_.backend_args}}) test_evaluator = val_evaluator diff --git a/configs/mask2former/metafile.yml b/configs/mask2former/metafile.yml index 1de7a4e6821..3321239213f 100644 --- a/configs/mask2former/metafile.yml +++ b/configs/mask2former/metafile.yml @@ -36,7 +36,7 @@ Models: Dataset: COCO Metrics: PQ: 54.5 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco-panoptic/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco-panoptic_20220329_225200-c7b94355.pth + Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco-panoptic/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco-panoptic_20220329_225200-4a16ded7.pth - Name: mask2former_r101_8xb2-lsj-50e_coco In Collection: Mask2Former Config: configs/mask2former/mask2former_r101_8xb2-lsj-50e_coco.py @@ -52,7 +52,7 @@ Models: Dataset: COCO Metrics: mask AP: 44.0 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r101_lsj_8x2_50e_coco/mask2former_r101_lsj_8x2_50e_coco_20220426_100250-c50b6fa6.pth + Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_r101_8xb2-lsj-50e_coco/mask2former_r101_8xb2-lsj-50e_coco_20220426_100250-ecf181e2.pth - Name: mask2former_r101_8xb2-lsj-50e_coco-panoptic In Collection: Mask2Former Config: configs/mask2former/mask2former_r101_8xb2-lsj-50e_coco-panoptic.py @@ -72,7 +72,7 @@ Models: Dataset: COCO Metrics: PQ: 52.4 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r101_lsj_8x2_50e_coco-panoptic/mask2former_r101_lsj_8x2_50e_coco-panoptic_20220329_225104-c54e64c9.pth + Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_r101_8xb2-lsj-50e_coco-panoptic/mask2former_r101_8xb2-lsj-50e_coco-panoptic_20220329_225104-c74d4d71.pth - Name: mask2former_r50_8xb2-lsj-50e_coco-panoptic In Collection: Mask2Former Config: configs/mask2former/mask2former_r50_8xb2-lsj-50e_coco-panoptic.py @@ -83,16 +83,16 @@ Models: - Task: Object Detection Dataset: COCO Metrics: - box AP: 44.8 + box AP: 44.5 - Task: Instance Segmentation Dataset: COCO Metrics: - mask AP: 41.9 + mask AP: 41.8 - Task: Panoptic Segmentation Dataset: COCO Metrics: - PQ: 51.9 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco-panoptic/mask2former_r50_lsj_8x2_50e_coco-panoptic_20220326_224516-11a44721.pth + PQ: 52.0 + 
Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_r50_8xb2-lsj-50e_coco-panoptic/mask2former_r50_8xb2-lsj-50e_coco-panoptic_20230118_125535-54df384a.pth - Name: mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco-panoptic In Collection: Mask2Former Config: configs/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco-panoptic.py @@ -112,7 +112,7 @@ Models: Dataset: COCO Metrics: PQ: 53.4 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco-panoptic/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco-panoptic_20220326_224553-fc567107.pth + Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco-panoptic/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco-panoptic_20220326_224553-3ec9e0ae.pth - Name: mask2former_r50_8xb2-lsj-50e_coco In Collection: Mask2Former Config: configs/mask2former/mask2former_r50_8xb2-lsj-50e_coco.py @@ -128,7 +128,7 @@ Models: Dataset: COCO Metrics: mask AP: 42.9 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco/mask2former_r50_lsj_8x2_50e_coco_20220506_191028-8e96e88b.pth + Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_r50_8xb2-lsj-50e_coco/mask2former_r50_8xb2-lsj-50e_coco_20220506_191028-41b088b6.pth - Name: mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic In Collection: Mask2Former Config: configs/mask2former/mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic.py @@ -148,7 +148,7 @@ Models: Dataset: COCO Metrics: PQ: 57.6 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-l-p4-w12-384-in21k_lsj_16x1_100e_coco-panoptic/mask2former_swin-l-p4-w12-384-in21k_lsj_16x1_100e_coco-panoptic_20220407_104949-d4919c44.pth + Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic/mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic_20220407_104949-82f8d28d.pth - Name: mask2former_swin-b-p4-w12-384-in21k_8xb2-lsj-50e_coco-panoptic In Collection: Mask2Former Config: configs/mask2former/mask2former_swin-b-p4-w12-384-in21k_8xb2-lsj-50e_coco-panoptic.py @@ -168,7 +168,7 @@ Models: Dataset: COCO Metrics: PQ: 56.3 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-b-p4-w12-384-in21k_lsj_8x2_50e_coco-panoptic/mask2former_swin-b-p4-w12-384-in21k_lsj_8x2_50e_coco-panoptic_20220329_230021-3bb8b482.pth + Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_swin-b-p4-w12-384-in21k_8xb2-lsj-50e_coco-panoptic/mask2former_swin-b-p4-w12-384-in21k_8xb2-lsj-50e_coco-panoptic_20220329_230021-05ec7315.pth - Name: mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic In Collection: Mask2Former Config: configs/mask2former/mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic.py @@ -188,7 +188,7 @@ Models: Dataset: COCO Metrics: PQ: 55.1 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-b-p4-w12-384_lsj_8x2_50e_coco-panoptic/mask2former_swin-b-p4-w12-384_lsj_8x2_50e_coco-panoptic_20220331_002244-c149a9e9.pth + Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic/mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic_20220331_002244-8a651d82.pth - Name: mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco In Collection: Mask2Former Config: 
configs/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco.py @@ -204,7 +204,7 @@ Models: Dataset: COCO Metrics: mask AP: 44.7 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_20220508_091649-4a943037.pth + Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco_20220508_091649-01b0f990.pth - Name: mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco In Collection: Mask2Former Config: configs/mask2former/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco.py @@ -220,4 +220,4 @@ Models: Dataset: COCO Metrics: mask AP: 46.1 - Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco_20220504_001756-743b7d99.pth + Weights: https://download.openmmlab.com/mmdetection/v3.0/mask2former/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco_20220504_001756-c9d0c4f2.pth diff --git a/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-1x_coco.py b/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-1x_coco.py index 6c0f1bde7aa..7702ae14a9c 100644 --- a/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-1x_coco.py +++ b/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-1x_coco.py @@ -14,9 +14,7 @@ checkpoint='open-mmlab://detectron2/resnet50_caffe'))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', diff --git a/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py b/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py index dd57d035f08..94d94dd3613 100644 --- a/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py +++ b/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py @@ -13,9 +13,7 @@ type='Pretrained', checkpoint='open-mmlab://detectron2/resnet50_caffe'))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, diff --git a/configs/mask_rcnn/mask-rcnn_r50_fpn_1x-wandb_coco.py b/configs/mask_rcnn/mask-rcnn_r50_fpn_1x-wandb_coco.py index c5107210457..364e0aa42aa 100644 --- a/configs/mask_rcnn/mask-rcnn_r50_fpn_1x-wandb_coco.py +++ b/configs/mask_rcnn/mask-rcnn_r50_fpn_1x-wandb_coco.py @@ -1,27 +1,16 @@ -# TODO: Awaiting refactoring _base_ = [ '../_base_/models/mask-rcnn_r50_fpn.py', '../_base_/datasets/coco_instance.py', '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' ] -# Set evaluation interval -evaluation = dict(interval=2) -# Set checkpoint interval -checkpoint_config = dict(interval=4) +vis_backends = [dict(type='LocalVisBackend'), dict(type='WandBVisBackend')] +visualizer = dict(vis_backends=vis_backends) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - dict(type='MMDetWandbHook', - init_kwargs={ - 'project': 'mmdetection', - 'group': 'maskrcnn-r50-fpn-1x-coco' - }, - interval=50, - log_checkpoint=True, - log_checkpoint_metadata=True, - num_eval_images=100) - ]) +# MMEngine support the following two ways, users can choose +# according to convenience +# default_hooks = 
dict(checkpoint=dict(interval=4)) +_base_.default_hooks.checkpoint.interval = 4 + +# train_cfg = dict(val_interval=2) +_base_.train_cfg.val_interval = 2 diff --git a/configs/mask_rcnn/mask-rcnn_r50_fpn_poly-1x_coco.py b/configs/mask_rcnn/mask-rcnn_r50_fpn_poly-1x_coco.py index 193dcd1930f..826180ce0a8 100644 --- a/configs/mask_rcnn/mask-rcnn_r50_fpn_poly-1x_coco.py +++ b/configs/mask_rcnn/mask-rcnn_r50_fpn_poly-1x_coco.py @@ -5,9 +5,7 @@ ] train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, diff --git a/configs/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-1x_coco.py b/configs/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-1x_coco.py index a743aaea952..6ee204d9000 100644 --- a/configs/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-1x_coco.py +++ b/configs/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-1x_coco.py @@ -22,9 +22,7 @@ checkpoint='open-mmlab://detectron2/resnext101_32x8d'))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, diff --git a/configs/nas_fpn/retinanet_r50_fpn_crop640-50e_coco.py b/configs/nas_fpn/retinanet_r50_fpn_crop640-50e_coco.py index 6062a7601f4..11c34f6758a 100644 --- a/configs/nas_fpn/retinanet_r50_fpn_crop640-50e_coco.py +++ b/configs/nas_fpn/retinanet_r50_fpn_crop640-50e_coco.py @@ -24,9 +24,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', @@ -38,9 +36,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(640, 640), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( diff --git a/configs/objects365/README.md b/configs/objects365/README.md index e54928b649a..fca0dbfc945 100644 --- a/configs/objects365/README.md +++ b/configs/objects365/README.md @@ -87,16 +87,16 @@ Objects 365 includes 11 categories of people, clothing, living room, bathroom, k ### Objects365 V1 -| Architecture | Backbone | Style | Lr schd | Mem (GB) | box AP | Config | Download | -| :----------: | :------: | :-----: | :-----: | :------: | :----: | :------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| Faster R-CNN | R-50 | pytorch | 1x | - | 19.6 | [config](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v1.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_16x4_1x_obj365v1/faster_rcnn_r50_fpn_16x4_1x_obj365v1_20221219_181226-9ff10f95.pth) \| 
[log](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_16x4_1x_obj365v1/faster_rcnn_r50_fpn_16x4_1x_obj365v1_20221219_181226.log.json) | -| Faster R-CNN | R-50 | pytorch | 1350K | - | 22.3 | [config](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/objects365/faster-rcnn_r50-syncbn_fpn_1350k_objects365v1.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_syncbn_1350k_obj365v1/faster_rcnn_r50_fpn_syncbn_1350k_obj365v1_20220510_142457-337d8965.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_syncbn_1350k_obj365v1/faster_rcnn_r50_fpn_syncbn_1350k_obj365v1_20220510_142457.log.json) | -| Retinanet | R-50 | pytorch | 1x | - | 14.8 | [config](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/objects365/retinanet_r50_fpn_1x_objects365v1.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_1x_obj365v1/retinanet_r50_fpn_1x_obj365v1_20221219_181859-ba3e3dd5.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_1x_obj365v1/retinanet_r50_fpn_1x_obj365v1_20221219_181859.log.json) | -| Retinanet | R-50 | pytorch | 1350K | - | 18.0 | [config](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/objects365/retinanet_r50-syncbn_fpn_1350k_objects365v1.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_syncbn_1350k_obj365v1/retinanet_r50_fpn_syncbn_1350k_obj365v1_20220513_111237-7517c576.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_syncbn_1350k_obj365v1/retinanet_r50_fpn_syncbn_1350k_obj365v1_20220513_111237.log.json) | +| Architecture | Backbone | Style | Lr schd | Mem (GB) | box AP | Config | Download | +| :----------: | :------: | :-----: | :-----: | :------: | :----: | :-------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Faster R-CNN | R-50 | pytorch | 1x | - | 19.6 | [config](https://github.com/open-mmlab/mmdetection/tree/main/configs/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v1.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_16x4_1x_obj365v1/faster_rcnn_r50_fpn_16x4_1x_obj365v1_20221219_181226-9ff10f95.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_16x4_1x_obj365v1/faster_rcnn_r50_fpn_16x4_1x_obj365v1_20221219_181226.log.json) | +| Faster R-CNN | R-50 | pytorch | 1350K | - | 22.3 | [config](https://github.com/open-mmlab/mmdetection/tree/main/configs/objects365/faster-rcnn_r50-syncbn_fpn_1350k_objects365v1.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_syncbn_1350k_obj365v1/faster_rcnn_r50_fpn_syncbn_1350k_obj365v1_20220510_142457-337d8965.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_syncbn_1350k_obj365v1/faster_rcnn_r50_fpn_syncbn_1350k_obj365v1_20220510_142457.log.json) | +| Retinanet | R-50 | pytorch | 1x | - | 14.8 | 
[config](https://github.com/open-mmlab/mmdetection/tree/main/configs/objects365/retinanet_r50_fpn_1x_objects365v1.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_1x_obj365v1/retinanet_r50_fpn_1x_obj365v1_20221219_181859-ba3e3dd5.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_1x_obj365v1/retinanet_r50_fpn_1x_obj365v1_20221219_181859.log.json) | +| Retinanet | R-50 | pytorch | 1350K | - | 18.0 | [config](https://github.com/open-mmlab/mmdetection/tree/main/configs/objects365/retinanet_r50-syncbn_fpn_1350k_objects365v1.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_syncbn_1350k_obj365v1/retinanet_r50_fpn_syncbn_1350k_obj365v1_20220513_111237-7517c576.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_syncbn_1350k_obj365v1/retinanet_r50_fpn_syncbn_1350k_obj365v1_20220513_111237.log.json) | ### Objects365 V2 -| Architecture | Backbone | Style | Lr schd | Mem (GB) | box AP | Config | Download | -| :----------: | :------: | :-----: | :-----: | :------: | :----: | :--------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| Faster R-CNN | R-50 | pytorch | 1x | - | 19.8 | [config](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v2.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_16x4_1x_obj365v2/faster_rcnn_r50_fpn_16x4_1x_obj365v2_20221220_175040-5910b015.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_16x4_1x_obj365v2/faster_rcnn_r50_fpn_16x4_1x_obj365v2_20221220_175040.log.json) | -| Retinanet | R-50 | pytorch | 1x | - | 16.7 | [config](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/objects365/retinanet_r50_fpn_1x_objects365v2.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_1x_obj365v2/retinanet_r50_fpn_1x_obj365v2_20221223_122105-d9b191f1.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_1x_obj365v2/retinanet_r50_fpn_1x_obj365v2_20221223_122105.log.json) | +| Architecture | Backbone | Style | Lr schd | Mem (GB) | box AP | Config | Download | +| :----------: | :------: | :-----: | :-----: | :------: | :----: | :---------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Faster R-CNN | R-50 | pytorch | 1x | - | 19.8 | [config](https://github.com/open-mmlab/mmdetection/tree/main/configs/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v2.py) | 
[model](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_16x4_1x_obj365v2/faster_rcnn_r50_fpn_16x4_1x_obj365v2_20221220_175040-5910b015.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/faster_rcnn_r50_fpn_16x4_1x_obj365v2/faster_rcnn_r50_fpn_16x4_1x_obj365v2_20221220_175040.log.json) | +| Retinanet | R-50 | pytorch | 1x | - | 16.7 | [config](https://github.com/open-mmlab/mmdetection/tree/main/configs/objects365/retinanet_r50_fpn_1x_objects365v2.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_1x_obj365v2/retinanet_r50_fpn_1x_obj365v2_20221223_122105-d9b191f1.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/objects365/retinanet_r50_fpn_1x_obj365v2/retinanet_r50_fpn_1x_obj365v2_20221223_122105.log.json) | diff --git a/configs/openimages/ssd300_32xb8-36e_openimages.py b/configs/openimages/ssd300_32xb8-36e_openimages.py index 9847ef5302b..9cb51cae00a 100644 --- a/configs/openimages/ssd300_32xb8-36e_openimages.py +++ b/configs/openimages/ssd300_32xb8-36e_openimages.py @@ -11,9 +11,7 @@ data_root = 'data/OpenImages/' input_size = 300 train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='PhotoMetricDistortion', @@ -35,7 +33,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), # avoid bboxes being resized dict(type='LoadAnnotations', with_bbox=True), diff --git a/configs/paa/metafile.yml b/configs/paa/metafile.yml index a2a39ffd8ba..078b974971d 100644 --- a/configs/paa/metafile.yml +++ b/configs/paa/metafile.yml @@ -24,6 +24,7 @@ Models: Config: configs/paa/paa_r50_fpn_1x_coco.py Metadata: Training Memory (GB): 3.7 + Epochs: 12 Results: - Task: Object Detection Dataset: COCO @@ -36,6 +37,7 @@ Models: Config: configs/paa/paa_r50_fpn_1.5x_coco.py Metadata: Training Memory (GB): 3.7 + Epochs: 18 Results: - Task: Object Detection Dataset: COCO @@ -48,6 +50,7 @@ Models: Config: configs/paa/paa_r50_fpn_2x_coco.py Metadata: Training Memory (GB): 3.7 + Epochs: 24 Results: - Task: Object Detection Dataset: COCO @@ -60,6 +63,7 @@ Models: Config: configs/paa/paa_r50_fpn_ms-3x_coco.py Metadata: Training Memory (GB): 3.7 + Epochs: 36 Results: - Task: Object Detection Dataset: COCO @@ -72,6 +76,7 @@ Models: Config: configs/paa/paa_r101_fpn_1x_coco.py Metadata: Training Memory (GB): 6.2 + Epochs: 12 Results: - Task: Object Detection Dataset: COCO @@ -84,6 +89,7 @@ Models: Config: configs/paa/paa_r101_fpn_2x_coco.py Metadata: Training Memory (GB): 6.2 + Epochs: 24 Results: - Task: Object Detection Dataset: COCO @@ -96,6 +102,7 @@ Models: Config: configs/paa/paa_r101_fpn_ms-3x_coco.py Metadata: Training Memory (GB): 6.2 + Epochs: 36 Results: - Task: Object Detection Dataset: COCO diff --git a/configs/paa/paa_r50_fpn_ms-3x_coco.py b/configs/paa/paa_r50_fpn_ms-3x_coco.py index 803ceeca0ec..fed8b90a0fd 100644 --- a/configs/paa/paa_r50_fpn_ms-3x_coco.py +++ b/configs/paa/paa_r50_fpn_ms-3x_coco.py @@ -18,9 +18,7 @@ train_cfg = dict(max_epochs=max_epochs) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', 
with_bbox=True), dict( type='RandomResize', scale=[(1333, 640), (1333, 800)], diff --git a/configs/pascal_voc/faster-rcnn_r50-caffe-c4_ms-18k_voc0712.py b/configs/pascal_voc/faster-rcnn_r50-caffe-c4_ms-18k_voc0712.py index 7a3c34367d7..dddc0bbdf33 100644 --- a/configs/pascal_voc/faster-rcnn_r50-caffe-c4_ms-18k_voc0712.py +++ b/configs/pascal_voc/faster-rcnn_r50-caffe-c4_ms-18k_voc0712.py @@ -7,9 +7,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomChoiceResize', @@ -21,9 +19,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(1333, 800), keep_ratio=True), # avoid bboxes being resized dict(type='LoadAnnotations', with_bbox=True), @@ -45,14 +41,16 @@ ann_file='VOC2007/ImageSets/Main/trainval.txt', data_prefix=dict(sub_data_root='VOC2007/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline), + pipeline=train_pipeline, + backend_args={{_base_.backend_args}}), dict( type='VOCDataset', data_root={{_base_.data_root}}, ann_file='VOC2012/ImageSets/Main/trainval.txt', data_prefix=dict(sub_data_root='VOC2012/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline) + pipeline=train_pipeline, + backend_args={{_base_.backend_args}}) ])) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) diff --git a/configs/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712-cocofmt.py b/configs/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712-cocofmt.py index d8bfd043a2e..0b0aa41d67f 100644 --- a/configs/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712-cocofmt.py +++ b/configs/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712-cocofmt.py @@ -22,18 +22,14 @@ data_root = 'data/VOCdevkit/' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', scale=(1000, 600), keep_ratio=True), dict(type='RandomFlip', prob=0.5), dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(1000, 600), keep_ratio=True), # avoid bboxes being resized dict(type='LoadAnnotations', with_bbox=True), @@ -54,7 +50,8 @@ data_prefix=dict(img=''), metainfo=METAINFO, filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args={{_base_.backend_args}}))) val_dataloader = dict( dataset=dict( type=dataset_type, @@ -68,7 +65,8 @@ type='CocoMetric', ann_file=data_root + 'annotations/voc07_test.json', metric='bbox', - format_only=False) + format_only=False, + backend_args={{_base_.backend_args}}) test_evaluator = val_evaluator # training schedule, the dataset is repeated 3 times, so the diff --git a/configs/queryinst/metafile.yml b/configs/queryinst/metafile.yml index 07c3d035d59..3ea3b00a945 100644 --- a/configs/queryinst/metafile.yml +++ b/configs/queryinst/metafile.yml @@ -15,7 +15,7 @@ Collections: Title: 'Instances as Queries' README: configs/queryinst/README.md Code: - URL: 
https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/detectors/queryinst.py + URL: https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/detectors/queryinst.py Version: v2.18.0 Models: diff --git a/configs/queryinst/queryinst_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py b/configs/queryinst/queryinst_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py index 1f5ada6ead4..33ab061267b 100644 --- a/configs/queryinst/queryinst_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py +++ b/configs/queryinst/queryinst_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py @@ -9,9 +9,7 @@ # augmentation strategy originates from DETR. train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', prob=0.5), dict( diff --git a/configs/queryinst/queryinst_r50_fpn_ms-480-800-3x_coco.py b/configs/queryinst/queryinst_r50_fpn_ms-480-800-3x_coco.py index 4e4434982bc..6b99374ef43 100644 --- a/configs/queryinst/queryinst_r50_fpn_ms-480-800-3x_coco.py +++ b/configs/queryinst/queryinst_r50_fpn_ms-480-800-3x_coco.py @@ -1,9 +1,7 @@ _base_ = './queryinst_r50_fpn_1x_coco.py' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', diff --git a/configs/regnet/mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py b/configs/regnet/mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py index 3fc02ffbbdb..36482c939dc 100644 --- a/configs/regnet/mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py +++ b/configs/regnet/mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py @@ -27,9 +27,7 @@ num_outs=5)) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', diff --git a/configs/resnest/cascade-mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py b/configs/resnest/cascade-mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py index 25ddc7a1a60..c6ef41c05cd 100644 --- a/configs/resnest/cascade-mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py +++ b/configs/resnest/cascade-mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py @@ -83,9 +83,7 @@ mask_head=dict(norm_cfg=norm_cfg))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, diff --git a/configs/resnest/cascade-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py b/configs/resnest/cascade-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py index 97a3970e8b2..7ce7b56320a 100644 --- a/configs/resnest/cascade-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py +++ b/configs/resnest/cascade-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py @@ -81,9 +81,7 @@ ], )) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', scale=[(1333, 640), (1333, 800)], diff --git 
a/configs/resnest/faster-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py b/configs/resnest/faster-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py index f64dcdc2518..8f0ec6e07af 100644 --- a/configs/resnest/faster-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py +++ b/configs/resnest/faster-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py @@ -27,9 +27,7 @@ norm_cfg=norm_cfg))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', scale=[(1333, 640), (1333, 800)], diff --git a/configs/resnest/mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py b/configs/resnest/mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py index 309228fea62..c6f27000862 100644 --- a/configs/resnest/mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py +++ b/configs/resnest/mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py @@ -28,9 +28,7 @@ mask_head=dict(norm_cfg=norm_cfg))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, diff --git a/configs/retinanet/retinanet_r50-caffe_fpn_ms-1x_coco.py b/configs/retinanet/retinanet_r50-caffe_fpn_ms-1x_coco.py index e42a52746ad..24b6d60078f 100644 --- a/configs/retinanet/retinanet_r50-caffe_fpn_ms-1x_coco.py +++ b/configs/retinanet/retinanet_r50-caffe_fpn_ms-1x_coco.py @@ -4,7 +4,7 @@ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict( - type='RandomResize', + type='RandomChoiceResize', scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), diff --git a/configs/retinanet/retinanet_r50_fpn_amp-1x_coco.py b/configs/retinanet/retinanet_r50_fpn_amp-1x_coco.py index 6b6cebe48a1..acf5266337b 100644 --- a/configs/retinanet/retinanet_r50_fpn_amp-1x_coco.py +++ b/configs/retinanet/retinanet_r50_fpn_amp-1x_coco.py @@ -1,3 +1,6 @@ _base_ = './retinanet_r50_fpn_1x_coco.py' -# fp16 settings -fp16 = dict(loss_scale=512.) 
+
+# MMEngine supports the following two ways; users can choose
+# whichever is more convenient
+# optim_wrapper = dict(type='AmpOptimWrapper')
+_base_.optim_wrapper.type = 'AmpOptimWrapper'
diff --git a/configs/retinanet/retinanet_tta.py b/configs/retinanet/retinanet_tta.py
index d56563ea780..d0f37e0ab25 100644
--- a/configs/retinanet/retinanet_tta.py
+++ b/configs/retinanet/retinanet_tta.py
@@ -4,7 +4,7 @@
 img_scales = [(1333, 800), (666, 400), (2000, 1200)]
 tta_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=dict(backend='disk')),
+    dict(type='LoadImageFromFile', backend_args=None),
     dict(
         type='TestTimeAug',
         transforms=[[
diff --git a/configs/rpn/metafile.yml b/configs/rpn/metafile.yml
new file mode 100644
index 00000000000..9796ead6d2e
--- /dev/null
+++ b/configs/rpn/metafile.yml
@@ -0,0 +1,127 @@
+Collections:
+  - Name: RPN
+    Metadata:
+      Training Data: COCO
+      Training Techniques:
+        - SGD with Momentum
+        - Weight Decay
+      Training Resources: 8x V100 GPUs
+      Architecture:
+        - FPN
+        - ResNet
+    Paper:
+      URL: https://arxiv.org/abs/1506.01497
+      Title: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks"
+    README: configs/rpn/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmdetection/blob/v2.0.0/mmdet/models/detectors/rpn.py#L6
+      Version: v2.0.0
+
+Models:
+  - Name: rpn_r50-caffe_fpn_1x_coco
+    In Collection: RPN
+    Config: configs/rpn/rpn_r50-caffe_fpn_1x_coco.py
+    Metadata:
+      Training Memory (GB): 3.5
+      Training Resources: 8x V100 GPUs
+      Epochs: 12
+    Results:
+      - Task: Object Detection
+        Dataset: COCO
+        Metrics:
+          AR@1000: 58.7
+    Weights: https://download.openmmlab.com/mmdetection/v2.0/rpn/rpn_r50_caffe_fpn_1x_coco/rpn_r50_caffe_fpn_1x_coco_20200531-5b903a37.pth
+
+  - Name: rpn_r50_fpn_1x_coco
+    In Collection: RPN
+    Config: configs/rpn/rpn_r50_fpn_1x_coco.py
+    Metadata:
+      Training Memory (GB): 3.8
+      Training Resources: 8x V100 GPUs
+      Epochs: 12
+    Results:
+      - Task: Object Detection
+        Dataset: COCO
+        Metrics:
+          AR@1000: 58.2
+    Weights: https://download.openmmlab.com/mmdetection/v2.0/rpn/rpn_r50_fpn_1x_coco/rpn_r50_fpn_1x_coco_20200218-5525fa2e.pth
+
+  - Name: rpn_r50_fpn_2x_coco
+    In Collection: RPN
+    Config: configs/rpn/rpn_r50_fpn_2x_coco.py
+    Metadata:
+      Epochs: 24
+    Results:
+      - Task: Object Detection
+        Dataset: COCO
+        Metrics:
+          AR@1000: 58.6
+    Weights: https://download.openmmlab.com/mmdetection/v2.0/rpn/rpn_r50_fpn_2x_coco/rpn_r50_fpn_2x_coco_20200131-0728c9b3.pth
+
+  - Name: rpn_r101-caffe_fpn_1x_coco
+    In Collection: RPN
+    Config: configs/rpn/rpn_r101-caffe_fpn_1x_coco.py
+    Metadata:
+      Training Memory (GB): 5.4
+      Training Resources: 8x V100 GPUs
+      Epochs: 12
+    Results:
+      - Task: Object Detection
+        Dataset: COCO
+        Metrics:
+          AR@1000: 60.0
+    Weights: https://download.openmmlab.com/mmdetection/v2.0/rpn/rpn_r101_caffe_fpn_1x_coco/rpn_r101_caffe_fpn_1x_coco_20200531-0629a2e2.pth
+
+  - Name: rpn_x101-32x4d_fpn_1x_coco
+    In Collection: RPN
+    Config: configs/rpn/rpn_x101-32x4d_fpn_1x_coco.py
+    Metadata:
+      Training Memory (GB): 7.0
+      Training Resources: 8x V100 GPUs
+      Epochs: 12
+    Results:
+      - Task: Object Detection
+        Dataset: COCO
+        Metrics:
+          AR@1000: 60.6
+    Weights: https://download.openmmlab.com/mmdetection/v2.0/rpn/rpn_x101_32x4d_fpn_1x_coco/rpn_x101_32x4d_fpn_1x_coco_20200219-b02646c6.pth
+
+  - Name: rpn_x101-32x4d_fpn_2x_coco
+    In Collection: RPN
+    Config: configs/rpn/rpn_x101-32x4d_fpn_2x_coco.py
+    Metadata:
+      Training Resources: 8x V100 GPUs
+      Epochs: 24
+    Results:
+      - Task: Object Detection
+        Dataset: COCO
+        Metrics:
+          AR@1000: 61.1
+    Weights: https://download.openmmlab.com/mmdetection/v2.0/rpn/rpn_x101_32x4d_fpn_2x_coco/rpn_x101_32x4d_fpn_2x_coco_20200208-d22bd0bb.pth
+
+  - Name: rpn_x101-64x4d_fpn_1x_coco
+    In Collection: RPN
+    Config: configs/rpn/rpn_x101-64x4d_fpn_1x_coco.py
+    Metadata:
+      Training Memory (GB): 10.1
+      Training Resources: 8x V100 GPUs
+      Epochs: 12
+    Results:
+      - Task: Object Detection
+        Dataset: COCO
+        Metrics:
+          AR@1000: 61.0
+    Weights: https://download.openmmlab.com/mmdetection/v2.0/rpn/rpn_x101_64x4d_fpn_1x_coco/rpn_x101_64x4d_fpn_1x_coco_20200208-cde6f7dd.pth
+
+  - Name: rpn_x101-64x4d_fpn_2x_coco
+    In Collection: RPN
+    Config: configs/rpn/rpn_x101-64x4d_fpn_2x_coco.py
+    Metadata:
+      Training Resources: 8x V100 GPUs
+      Epochs: 24
+    Results:
+      - Task: Object Detection
+        Dataset: COCO
+        Metrics:
+          AR@1000: 61.5
+    Weights: https://download.openmmlab.com/mmdetection/v2.0/rpn/rpn_x101_64x4d_fpn_2x_coco/rpn_x101_64x4d_fpn_2x_coco_20200208-c65f524f.pth
diff --git a/configs/rpn/rpn_r50_fpn_1x_coco.py b/configs/rpn/rpn_r50_fpn_1x_coco.py
index 692ff9e6650..7fe88d395b8 100644
--- a/configs/rpn/rpn_r50_fpn_1x_coco.py
+++ b/configs/rpn/rpn_r50_fpn_1x_coco.py
@@ -17,7 +17,7 @@
 #         type='CocoMetric',
 #         ann_file=data_root + 'annotations/instances_val2017.json',
 #         metric='proposal_fast',
-#         file_client_args={{_base_.file_client_args}},
+#         backend_args={{_base_.backend_args}},
 #         format_only=False)
 # ]
diff --git a/configs/rtmdet/README.md b/configs/rtmdet/README.md
index b17a916b022..02c95466cc7 100644
--- a/configs/rtmdet/README.md
+++ b/configs/rtmdet/README.md
@@ -115,9 +115,9 @@ Here is a basic example of deploy RTMDet with [MMDeploy-1.x](https://github.com/
 
 ### Step1. Install MMDeploy
 
-Before starting the deployment, please make sure you install MMDetection-3.x and MMDeploy-1.x correctly.
+Before starting the deployment, please make sure you have installed MMDetection and MMDeploy-1.x correctly.
 
-- Install MMDetection-3.x, please refer to the [MMDetection-3.x installation guide](https://mmdetection.readthedocs.io/en/3.x/get_started.html).
+- Install MMDetection, please refer to the [MMDetection installation guide](https://mmdetection.readthedocs.io/en/latest/get_started.html).
 - Install MMDeploy-1.x, please refer to the [MMDeploy-1.x installation guide](https://mmdeploy.readthedocs.io/en/1.x/get_started.html#installation).
 
 If you want to deploy RTMDet with ONNXRuntime, TensorRT, or other inference engine,
@@ -378,3 +378,76 @@ result = inference_model(
     img='demo/resources/det.jpg',
     device='cuda:0')
 ```
+
+### Model Config
+
+In MMDetection's config, we use `model` to set up detection algorithm components. In addition to neural network components such as `backbone`, `neck`, etc., it also requires `data_preprocessor`, `train_cfg`, and `test_cfg`. `data_preprocessor` is responsible for processing a batch of data output by the dataloader. `train_cfg` and `test_cfg` in the model config hold the training and testing hyperparameters of the components. Taking RTMDet as an example, we will introduce each field in the config according to different function modules:
+
+```python
+model = dict(
+    type='RTMDet',  # The name of detector
+    data_preprocessor=dict(  # The config of data preprocessor, usually includes image normalization and padding
+        type='DetDataPreprocessor',  # The type of the data preprocessor. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.data_preprocessors.DetDataPreprocessor
+        mean=[103.53, 116.28, 123.675],  # Mean values used when pre-training the backbone model, given in B, G, R order since bgr_to_rgb=False
+        std=[57.375, 57.12, 58.395],  # Standard deviations used when pre-training the backbone model, given in B, G, R order since bgr_to_rgb=False
+        bgr_to_rgb=False,  # whether to convert image from BGR to RGB
+        batch_augments=None),  # Batch-level augmentations
+    backbone=dict(  # The config of backbone
+        type='CSPNeXt',  # The type of backbone network. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.backbones.CSPNeXt
+        arch='P5',  # Architecture of CSPNeXt, from {P5, P6}. Defaults to P5
+        expand_ratio=0.5,  # Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5
+        deepen_factor=1,  # Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0
+        widen_factor=1,  # Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0
+        channel_attention=True,  # Whether to add channel attention in each stage. Defaults to True
+        norm_cfg=dict(type='SyncBN'),  # Dictionary to construct and config norm layer. Defaults to dict(type='BN', requires_grad=True)
+        act_cfg=dict(type='SiLU', inplace=True)),  # Config dict for activation layer. Defaults to dict(type='SiLU')
+    neck=dict(
+        type='CSPNeXtPAFPN',  # The type of neck is CSPNeXtPAFPN. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.necks.CSPNeXtPAFPN
+        in_channels=[256, 512, 1024],  # Number of input channels per scale
+        out_channels=256,  # Number of output channels (used at each scale)
+        num_csp_blocks=3,  # Number of bottlenecks in CSPLayer. Defaults to 3
+        expand_ratio=0.5,  # Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5
+        norm_cfg=dict(type='SyncBN'),  # Config dict for normalization layer. Defaults to dict(type='BN')
+        act_cfg=dict(type='SiLU', inplace=True)),  # Config dict for activation layer. Defaults to dict(type='Swish')
+    bbox_head=dict(
+        type='RTMDetSepBNHead',  # The type of bbox_head is RTMDetSepBNHead, an RTMDetHead with separated BN layers and shared conv layers. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.dense_heads.RTMDetSepBNHead
+        num_classes=80,  # Number of categories excluding the background category
+        in_channels=256,  # Number of channels in the input feature map
+        stacked_convs=2,  # Number of stacked conv layers in the head
+        feat_channels=256,  # Feature channels of convolutional layers in the head
+        anchor_generator=dict(  # The config of anchor generator
+            type='MlvlPointGenerator',  # The method used is MlvlPointGenerator. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/prior_generators/point_generator.py#L92
+            offset=0,  # The offset of points, normalized by the corresponding stride. Defaults to 0.5
+            strides=[8, 16, 32]),  # Strides of anchors in multiple feature levels in order (w, h)
+        bbox_coder=dict(type='DistancePointBBoxCoder'),  # Distance Point BBox coder. This coder encodes gt bboxes (x1, y1, x2, y2) into (top, bottom, left, right) distances and decodes them back to the original box. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/coders/distance_point_bbox_coder.py#L9
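+        # Worked example for the coder (illustrative numbers, not from the repo):
+        # for a prior point at (100, 100) and a gt box (x1=80, y1=90, x2=140, y2=130),
+        # the encoded targets are the distances left=20, top=10, right=40, bottom=30;
+        # decoding subtracts/adds these from the point to recover the box corners.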
+        loss_cls=dict(  # Config of loss function for the classification branch
+            type='QualityFocalLoss',  # Type of loss for classification branch. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.losses.QualityFocalLoss
+            use_sigmoid=True,  # Whether sigmoid operation is conducted in QFL. Defaults to True
+            beta=2.0,  # The beta parameter for calculating the modulating factor. Defaults to 2.0
+            loss_weight=1.0),  # Loss weight of current loss
+        loss_bbox=dict(  # Config of loss function for the regression branch
+            type='GIoULoss',  # Type of loss. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.losses.GIoULoss
+            loss_weight=2.0),  # Loss weight of the regression branch
+        with_objectness=False,  # Whether to add an objectness branch. Defaults to True
+        exp_on_reg=True,  # Whether to use .exp() in regression
+        share_conv=True,  # Whether to share conv layers between stages. Defaults to True
+        pred_kernel_size=1,  # Kernel size of prediction layer. Defaults to 1
+        norm_cfg=dict(type='SyncBN'),  # Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001)
+        act_cfg=dict(type='SiLU', inplace=True)),  # Config dict for activation layer. Defaults to dict(type='SiLU')
+    train_cfg=dict(  # Config of training hyperparameters for RTMDet
+        assigner=dict(  # Config of assigner
+            type='DynamicSoftLabelAssigner',  # Type of assigner. DynamicSoftLabelAssigner computes matching between predictions and ground truth with dynamic soft label assignment. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/assigners/dynamic_soft_label_assigner.py#L40
+            topk=13),  # Select top-k predictions to calculate dynamic k best matches for each gt. Defaults to 13
+        allowed_border=-1,  # The border allowed after padding for valid anchors
+        pos_weight=-1,  # The weight of positive samples during training
+        debug=False),  # Whether to set the debug mode
+    test_cfg=dict(  # Config of testing hyperparameters for RTMDet
+        nms_pre=30000,  # The number of boxes before NMS
+        min_bbox_size=0,  # The allowed minimal box size
+        score_thr=0.001,  # Threshold to filter out boxes
+        nms=dict(  # Config of NMS post-processing
+            type='nms',  # Type of NMS
+            iou_threshold=0.65),  # NMS threshold
+        max_per_img=300),  # Max number of detections for each image
+)
+```
diff --git a/configs/rtmdet/classification/README.md b/configs/rtmdet/classification/README.md
index dbfef4c7249..6aee2c61794 100644
--- a/configs/rtmdet/classification/README.md
+++ b/configs/rtmdet/classification/README.md
@@ -43,7 +43,7 @@ bash ./tools/dist_train.sh \
     [optional arguments]
 ```
 
-More details can be found in [user guides](https://mmdetection.readthedocs.io/en/3.x/user_guides/train.html).
+More details can be found in [user guides](https://mmdetection.readthedocs.io/en/latest/user_guides/train.html).
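+
+As a concrete usage sketch (the config name and GPU count below are illustrative;
+pick any config from this folder and adapt the count to your machine):
+
+```shell
+bash ./tools/dist_train.sh \
+    configs/rtmdet/classification/cspnext-s_8xb256-rsb-a1-600e_in1k.py \
+    8
+```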
## Results and Models diff --git a/configs/rtmdet/rtmdet-ins_l_8xb32-300e_coco.py b/configs/rtmdet/rtmdet-ins_l_8xb32-300e_coco.py index 1ecacab8044..6b4b9240a64 100644 --- a/configs/rtmdet/rtmdet-ins_l_8xb32-300e_coco.py +++ b/configs/rtmdet/rtmdet-ins_l_8xb32-300e_coco.py @@ -32,9 +32,7 @@ ) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, @@ -67,9 +65,7 @@ train_dataloader = dict(pin_memory=True, dataset=dict(pipeline=train_pipeline)) train_pipeline_stage2 = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, diff --git a/configs/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py b/configs/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py index 7785f2ff208..28bc21cc93b 100644 --- a/configs/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py +++ b/configs/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py @@ -10,9 +10,7 @@ bbox_head=dict(in_channels=128, feat_channels=128)) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, @@ -43,9 +41,7 @@ ] train_pipeline_stage2 = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, diff --git a/configs/rtmdet/rtmdet-ins_tiny_8xb32-300e_coco.py b/configs/rtmdet/rtmdet-ins_tiny_8xb32-300e_coco.py index 33b62878027..954f911614e 100644 --- a/configs/rtmdet/rtmdet-ins_tiny_8xb32-300e_coco.py +++ b/configs/rtmdet/rtmdet-ins_tiny_8xb32-300e_coco.py @@ -12,9 +12,7 @@ bbox_head=dict(in_channels=96, feat_channels=96)) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict( type='LoadAnnotations', with_bbox=True, diff --git a/configs/rtmdet/rtmdet_l_8xb32-300e_coco.py b/configs/rtmdet/rtmdet_l_8xb32-300e_coco.py index fc623fcc635..e4c46aadbda 100644 --- a/configs/rtmdet/rtmdet_l_8xb32-300e_coco.py +++ b/configs/rtmdet/rtmdet_l_8xb32-300e_coco.py @@ -62,9 +62,7 @@ ) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0), dict( @@ -86,9 +84,7 @@ ] train_pipeline_stage2 = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', @@ -103,9 +99,7 @@ ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(640, 640), keep_ratio=True), dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), dict( diff --git a/configs/rtmdet/rtmdet_s_8xb32-300e_coco.py b/configs/rtmdet/rtmdet_s_8xb32-300e_coco.py index 355918147cb..cbf76247b74 100644 --- a/configs/rtmdet/rtmdet_s_8xb32-300e_coco.py +++ 
b/configs/rtmdet/rtmdet_s_8xb32-300e_coco.py @@ -10,9 +10,7 @@ bbox_head=dict(in_channels=128, feat_channels=128, exp_on_reg=False)) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0), dict( @@ -34,9 +32,7 @@ ] train_pipeline_stage2 = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', diff --git a/configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py b/configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py index e05c4b169c1..a686f4a7f0c 100644 --- a/configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py +++ b/configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py @@ -12,9 +12,7 @@ bbox_head=dict(in_channels=96, feat_channels=96, exp_on_reg=False)) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='CachedMosaic', diff --git a/configs/rtmdet/rtmdet_tta.py b/configs/rtmdet/rtmdet_tta.py index f4e003541e9..f7adcbc712a 100644 --- a/configs/rtmdet/rtmdet_tta.py +++ b/configs/rtmdet/rtmdet_tta.py @@ -4,7 +4,7 @@ img_scales = [(640, 640), (320, 320), (960, 960)] tta_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=dict(backend='disk')), + dict(type='LoadImageFromFile', backend_args=None), dict( type='TestTimeAug', transforms=[ diff --git a/configs/sabl/sabl-retinanet_r101-gn_fpn_ms-480-960-2x_coco.py b/configs/sabl/sabl-retinanet_r101-gn_fpn_ms-480-960-2x_coco.py index 6d6e932d177..dc7209aebad 100644 --- a/configs/sabl/sabl-retinanet_r101-gn_fpn_ms-480-960-2x_coco.py +++ b/configs/sabl/sabl-retinanet_r101-gn_fpn_ms-480-960-2x_coco.py @@ -54,9 +54,7 @@ debug=False)) # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', scale=[(1333, 480), (1333, 960)], diff --git a/configs/sabl/sabl-retinanet_r101-gn_fpn_ms-640-800-2x_coco.py b/configs/sabl/sabl-retinanet_r101-gn_fpn_ms-640-800-2x_coco.py index 083c0c129c6..ac5f6d9811d 100644 --- a/configs/sabl/sabl-retinanet_r101-gn_fpn_ms-640-800-2x_coco.py +++ b/configs/sabl/sabl-retinanet_r101-gn_fpn_ms-640-800-2x_coco.py @@ -54,9 +54,7 @@ debug=False)) # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', scale=[(1333, 480), (1333, 800)], diff --git a/configs/scnet/README.md b/configs/scnet/README.md index 090827d048a..08dbfa87f56 100644 --- a/configs/scnet/README.md +++ b/configs/scnet/README.md @@ -46,7 +46,7 @@ The results on COCO 2017val are shown in the below table. (results on test-dev a ### Notes -- Training hyper-parameters are identical to those of [HTC](https://github.com/open-mmlab/mmdetection/tree/master/configs/htc). +- Training hyper-parameters are identical to those of [HTC](https://github.com/open-mmlab/mmdetection/tree/main/configs/htc). 
- TTA means Test Time Augmentation, which applies horizontal flip and multi-scale testing. Refer to [config](./scnet_r50_fpn_1x_coco.py). ## Citation diff --git a/configs/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py b/configs/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py index 9bb8df4cfb3..2a1a87d4203 100644 --- a/configs/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py +++ b/configs/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py @@ -80,9 +80,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', diff --git a/configs/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py b/configs/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py index dd02b596675..0e7b4df9136 100644 --- a/configs/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py +++ b/configs/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py @@ -80,9 +80,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', diff --git a/configs/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py b/configs/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py index 6f103768235..25c646c9c75 100644 --- a/configs/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py +++ b/configs/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py @@ -23,9 +23,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', diff --git a/configs/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py b/configs/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py index 3106cc55bc7..d60320e0b78 100644 --- a/configs/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py +++ b/configs/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py @@ -23,9 +23,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', diff --git a/configs/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_ms-2x_coco.py b/configs/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_ms-2x_coco.py index c73bf9e1a17..ddaebf5558a 100644 --- a/configs/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_ms-2x_coco.py +++ b/configs/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_ms-2x_coco.py @@ -13,9 +13,7 @@ type='Pretrained', checkpoint='./mocov2_r50_800ep_pretrain.pth'))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), 
dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomResize', scale=[(1333, 640), (1333, 800)], diff --git a/configs/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_ms-2x_coco.py b/configs/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_ms-2x_coco.py index 8182cab1936..c393e0b3604 100644 --- a/configs/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_ms-2x_coco.py +++ b/configs/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_ms-2x_coco.py @@ -13,9 +13,7 @@ type='Pretrained', checkpoint='./swav_800ep_pretrain.pth.tar'))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomResize', scale=[(1333, 640), (1333, 800)], diff --git a/configs/soft_teacher/metafile.yml b/configs/soft_teacher/metafile.yml new file mode 100644 index 00000000000..a9fb3c2e312 --- /dev/null +++ b/configs/soft_teacher/metafile.yml @@ -0,0 +1,18 @@ +Collections: + - Name: SoftTeacher + Metadata: + Training Data: COCO + Training Techniques: + - SGD with Momentum + - Weight Decay + Training Resources: 8x V100 GPUs + Architecture: + - FPN + - ResNet + Paper: + URL: https://arxiv.org/abs/2106.09018 + Title: "End-to-End Semi-Supervised Object Detection with Soft Teacher" + README: configs/soft_teacher/README.md + Code: + URL: https://github.com/open-mmlab/mmdetection/blob/v3.0.0rc1/mmdet/models/detectors/soft_teacher.py#L20 + Version: v3.0.0rc1 diff --git a/configs/solo/decoupled-solo-light_r50_fpn_3x_coco.py b/configs/solo/decoupled-solo-light_r50_fpn_3x_coco.py index 47b0cc1f09c..fc35df3c3cb 100644 --- a/configs/solo/decoupled-solo-light_r50_fpn_3x_coco.py +++ b/configs/solo/decoupled-solo-light_r50_fpn_3x_coco.py @@ -25,9 +25,7 @@ norm_cfg=dict(type='GN', num_groups=32, requires_grad=True))) train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', @@ -38,9 +36,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(852, 512), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( diff --git a/configs/solo/solo_r50_fpn_3x_coco.py b/configs/solo/solo_r50_fpn_3x_coco.py index c30d41f6d92..98a9505538c 100644 --- a/configs/solo/solo_r50_fpn_3x_coco.py +++ b/configs/solo/solo_r50_fpn_3x_coco.py @@ -1,9 +1,7 @@ _base_ = './solo_r50_fpn_1x_coco.py' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', diff --git a/configs/solov2/solov2-light_r50_fpn_ms-3x_coco.py b/configs/solov2/solov2-light_r50_fpn_ms-3x_coco.py index eb1e854d5ae..cf0a7f779c0 100644 --- a/configs/solov2/solov2-light_r50_fpn_ms-3x_coco.py +++ b/configs/solov2/solov2-light_r50_fpn_ms-3x_coco.py @@ -10,9 +10,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), 
dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', @@ -23,9 +21,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(448, 768), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( diff --git a/configs/solov2/solov2_r50_fpn_ms-3x_coco.py b/configs/solov2/solov2_r50_fpn_ms-3x_coco.py index b51cff8e594..d6f09827efb 100644 --- a/configs/solov2/solov2_r50_fpn_ms-3x_coco.py +++ b/configs/solov2/solov2_r50_fpn_ms-3x_coco.py @@ -1,9 +1,7 @@ _base_ = './solov2_r50_fpn_1x_coco.py' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomChoiceResize', @@ -17,7 +15,7 @@ # training schedule for 3x max_epochs = 36 -train_cfg = dict(by_epoch=True, max_epochs=max_epochs) +train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ diff --git a/configs/sparse_rcnn/sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py b/configs/sparse_rcnn/sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py index 98a7398f969..93edc0314b5 100644 --- a/configs/sparse_rcnn/sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py +++ b/configs/sparse_rcnn/sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py @@ -7,9 +7,7 @@ # augmentation strategy originates from DETR. train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict(type='RandomFlip', prob=0.5), dict( diff --git a/configs/sparse_rcnn/sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py b/configs/sparse_rcnn/sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py index f7c7a4a4de5..156028d7cdd 100644 --- a/configs/sparse_rcnn/sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py +++ b/configs/sparse_rcnn/sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py @@ -1,9 +1,7 @@ _base_ = './sparse-rcnn_r50_fpn_1x_coco.py' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomChoiceResize', diff --git a/configs/ssd/ssd300_coco.py b/configs/ssd/ssd300_coco.py index 4ce1a3c314b..796d25c9053 100644 --- a/configs/ssd/ssd300_coco.py +++ b/configs/ssd/ssd300_coco.py @@ -6,7 +6,7 @@ # dataset settings input_size = 300 train_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='Expand', @@ -28,7 +28,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), dict(type='LoadAnnotations', with_bbox=True), dict( @@ -50,7 +50,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args={{_base_.backend_args}}))) val_dataloader = dict(batch_size=8, 
dataset=dict(pipeline=test_pipeline))
 test_dataloader = val_dataloader
diff --git a/configs/ssd/ssd512_coco.py b/configs/ssd/ssd512_coco.py
index 16140be2d24..7acd6144202 100644
--- a/configs/ssd/ssd512_coco.py
+++ b/configs/ssd/ssd512_coco.py
@@ -20,7 +20,7 @@
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile'),
+    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
     dict(type='LoadAnnotations', with_bbox=True),
     dict(
         type='Expand',
@@ -42,7 +42,7 @@
     dict(type='PackDetInputs')
 ]
 test_pipeline = [
-    dict(type='LoadImageFromFile'),
+    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
     dict(type='Resize', scale=(input_size, input_size), keep_ratio=False),
     dict(type='LoadAnnotations', with_bbox=True),
     dict(
diff --git a/configs/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py b/configs/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py
index 3f809cc5ad8..70e92a82e0c 100644
--- a/configs/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py
+++ b/configs/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py
@@ -37,9 +37,7 @@
     mask_head=dict(norm_cfg=head_norm_cfg)))
 
 train_pipeline = [
-    dict(
-        type='LoadImageFromFile',
-        file_client_args={{_base_.file_client_args}}),
+    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
     dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
     dict(
         type='RandomResize',
@@ -57,9 +55,7 @@
     dict(type='PackDetInputs')
 ]
 test_pipeline = [
-    dict(
-        type='LoadImageFromFile',
-        file_client_args={{_base_.file_client_args}}),
+    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
     dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
     dict(
diff --git a/configs/strong_baselines/metafile.yml b/configs/strong_baselines/metafile.yml
new file mode 100644
index 00000000000..f72c07e64b6
--- /dev/null
+++ b/configs/strong_baselines/metafile.yml
@@ -0,0 +1,24 @@
+Models:
+  - Name: mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco
+    In Collection: Mask R-CNN
+    Config: configs/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py
+    Metadata:
+      Epochs: 100
+      Training Data: COCO
+      Training Techniques:
+        - SGD with Momentum
+        - Weight Decay
+        - LSJ
+      Training Resources: 8x V100 GPUs
+      Architecture:
+        - ResNet
+        - FPN
+    Results:
+      - Task: Object Detection
+        Dataset: COCO
+        Metrics:
+          box AP: 44.7
+      - Task: Instance Segmentation
+        Dataset: COCO
+        Metrics:
+          mask AP: 40.4
diff --git a/configs/swin/mask-rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py b/configs/swin/mask-rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py
index 37448b0b77d..7024b73249c 100644
--- a/configs/swin/mask-rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py
+++ b/configs/swin/mask-rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py
@@ -30,9 +30,7 @@
 # augmentation strategy originates from DETR / Sparse RCNN
 train_pipeline = [
-    dict(
-        type='LoadImageFromFile',
-        file_client_args={{_base_.file_client_args}}),
+    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
     dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
     dict(type='RandomFlip', prob=0.5),
     dict(
diff --git a/configs/tood/tood_r50_fpn_ms-2x_coco.py b/configs/tood/tood_r50_fpn_ms-2x_coco.py
index 93d1d47521d..ffb296dccee 100644
--- a/configs/tood/tood_r50_fpn_ms-2x_coco.py
+++ b/configs/tood/tood_r50_fpn_ms-2x_coco.py
@@ 
-19,9 +19,7 @@ # multi-scale training train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', scale=[(1333, 480), (1333, 800)], diff --git a/configs/tridentnet/tridentnet_r50-caffe_ms-1x_coco.py b/configs/tridentnet/tridentnet_r50-caffe_ms-1x_coco.py index a3a88908b9e..806d20b90c9 100644 --- a/configs/tridentnet/tridentnet_r50-caffe_ms-1x_coco.py +++ b/configs/tridentnet/tridentnet_r50-caffe_ms-1x_coco.py @@ -1,9 +1,7 @@ _base_ = 'tridentnet_r50-caffe_1x_coco.py' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomChoiceResize', diff --git a/configs/vfnet/vfnet_r50_fpn_1x_coco.py b/configs/vfnet/vfnet_r50_fpn_1x_coco.py index d45e5824086..99bc3b5f4c7 100644 --- a/configs/vfnet/vfnet_r50_fpn_1x_coco.py +++ b/configs/vfnet/vfnet_r50_fpn_1x_coco.py @@ -64,18 +64,14 @@ # data setting train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', prob=0.5), dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( diff --git a/configs/vfnet/vfnet_r50_fpn_ms-2x_coco.py b/configs/vfnet/vfnet_r50_fpn_ms-2x_coco.py index 95ce40fa1ac..0f8eed298e8 100644 --- a/configs/vfnet/vfnet_r50_fpn_ms-2x_coco.py +++ b/configs/vfnet/vfnet_r50_fpn_ms-2x_coco.py @@ -1,8 +1,6 @@ _base_ = './vfnet_r50_fpn_1x_coco.py' train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( type='RandomResize', scale=[(1333, 480), (1333, 960)], @@ -11,9 +9,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( diff --git a/configs/wider_face/retinanet_r50_fpn_1x_widerface.py b/configs/wider_face/retinanet_r50_fpn_1x_widerface.py new file mode 100644 index 00000000000..78067255f8f --- /dev/null +++ b/configs/wider_face/retinanet_r50_fpn_1x_widerface.py @@ -0,0 +1,10 @@ +_base_ = [ + '../_base_/models/retinanet_r50_fpn.py', + '../_base_/datasets/wider_face.py', '../_base_/schedules/schedule_1x.py', + '../_base_/default_runtime.py' +] +# model settings +model = dict(bbox_head=dict(num_classes=1)) +# optimizer +optim_wrapper = dict( + optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/configs/wider_face/ssd300_24e_widerface.py b/configs/wider_face/ssd300_24e_widerface.py deleted file mode 100644 index cb16dae0ae3..00000000000 --- a/configs/wider_face/ssd300_24e_widerface.py +++ /dev/null @@ -1,20 +0,0 @@ -_base_ = [ - 
'../_base_/models/ssd300.py', '../_base_/datasets/wider_face.py', - '../_base_/default_runtime.py' -] -model = dict(bbox_head=dict(num_classes=1)) -# optimizer -optimizer = dict(type='SGD', lr=0.012, momentum=0.9, weight_decay=5e-4) -optimizer_config = dict() -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=1000, - warmup_ratio=0.001, - step=[16, 20]) -# runtime settings -runner = dict(type='EpochBasedRunner', max_epochs=24) -log_config = dict(interval=1) - -# TODO add auto-scale-lr after a series of experiments diff --git a/configs/wider_face/ssd300_8xb32-24e_widerface.py b/configs/wider_face/ssd300_8xb32-24e_widerface.py new file mode 100644 index 00000000000..02c3c927f78 --- /dev/null +++ b/configs/wider_face/ssd300_8xb32-24e_widerface.py @@ -0,0 +1,64 @@ +_base_ = [ + '../_base_/models/ssd300.py', '../_base_/datasets/wider_face.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_2x.py' +] +model = dict(bbox_head=dict(num_classes=1)) + +train_pipeline = [ + dict(type='LoadImageFromFile', backend_args=_base_.backend_args), + dict(type='LoadAnnotations', with_bbox=True), + dict( + type='PhotoMetricDistortion', + brightness_delta=32, + contrast_range=(0.5, 1.5), + saturation_range=(0.5, 1.5), + hue_delta=18), + dict( + type='Expand', + mean={{_base_.model.data_preprocessor.mean}}, + to_rgb={{_base_.model.data_preprocessor.bgr_to_rgb}}, + ratio_range=(1, 4)), + dict( + type='MinIoURandomCrop', + min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), + min_crop_size=0.3), + dict(type='Resize', scale=(300, 300), keep_ratio=False), + dict(type='RandomFlip', prob=0.5), + dict(type='PackDetInputs') +] + +test_pipeline = [ + dict(type='LoadImageFromFile', backend_args=_base_.backend_args), + dict(type='Resize', scale=(300, 300), keep_ratio=False), + dict(type='LoadAnnotations', with_bbox=True), + dict( + type='PackDetInputs', + meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', + 'scale_factor')) +] + +dataset_type = 'WIDERFaceDataset' +data_root = 'data/WIDERFace/' +train_dataloader = dict( + batch_size=32, num_workers=8, dataset=dict(pipeline=train_pipeline)) + +val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) +test_dataloader = val_dataloader + +# learning rate +param_scheduler = [ + dict( + type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, + end=1000), + dict(type='MultiStepLR', by_epoch=True, milestones=[16, 20], gamma=0.1) +] + +# optimizer +optim_wrapper = dict( + optimizer=dict(lr=0.012, momentum=0.9, weight_decay=5e-4), + clip_grad=dict(max_norm=35, norm_type=2)) + +# NOTE: `auto_scale_lr` is for automatically scaling LR, +# USER SHOULD NOT CHANGE ITS VALUES. 
+# base_batch_size = (8 GPUs) x (32 samples per GPU) +auto_scale_lr = dict(base_batch_size=256) diff --git a/configs/yolact/metafile.yml b/configs/yolact/metafile.yml index 6b01a94c9bd..9ca76b3d391 100644 --- a/configs/yolact/metafile.yml +++ b/configs/yolact/metafile.yml @@ -24,6 +24,7 @@ Models: Metadata: Training Resources: 1x V100 GPU Batch Size: 8 + Epochs: 55 inference time (ms/im): - value: 23.53 hardware: V100 @@ -43,6 +44,7 @@ Models: Config: configs/yolact/yolact_r50_8xb8-55e_coco.py Metadata: Batch Size: 64 + Epochs: 55 inference time (ms/im): - value: 23.53 hardware: V100 @@ -63,6 +65,7 @@ Models: Metadata: Training Resources: 1x V100 GPU Batch Size: 8 + Epochs: 55 inference time (ms/im): - value: 29.85 hardware: V100 diff --git a/configs/yolact/yolact_r50_1xb8-55e_coco.py b/configs/yolact/yolact_r50_1xb8-55e_coco.py index 4866f04ddf4..b7dabf1548a 100644 --- a/configs/yolact/yolact_r50_1xb8-55e_coco.py +++ b/configs/yolact/yolact_r50_1xb8-55e_coco.py @@ -95,9 +95,7 @@ mask_thr_binary=0.5)) # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='FilterAnnotations', min_gt_bbox_wh=(4.0, 4.0)), dict( @@ -120,7 +118,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( diff --git a/configs/yolo/yolov3_d53_8xb8-320-273e_coco.py b/configs/yolo/yolov3_d53_8xb8-320-273e_coco.py index f1ae4248a8d..a3d08dd7706 100644 --- a/configs/yolo/yolov3_d53_8xb8-320-273e_coco.py +++ b/configs/yolo/yolov3_d53_8xb8-320-273e_coco.py @@ -1,17 +1,8 @@ _base_ = './yolov3_d53_8xb8-ms-608-273e_coco.py' -# dataset settings -# file_client_args = dict( -# backend='petrel', -# path_mapping=dict({ -# './data/': 's3://openmmlab/datasets/detection/', -# 'data/': 's3://openmmlab/datasets/detection/' -# })) - -file_client_args = dict(backend='disk') input_size = (320, 320) train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), # `mean` and `to_rgb` should be the same with the `preprocess_cfg` dict(type='Expand', mean=[0, 0, 0], to_rgb=True, ratio_range=(1, 2)), @@ -25,7 +16,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=input_size, keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( diff --git a/configs/yolo/yolov3_d53_8xb8-ms-416-273e_coco.py b/configs/yolo/yolov3_d53_8xb8-ms-416-273e_coco.py index be098c8352d..ca0127e83ed 100644 --- a/configs/yolo/yolov3_d53_8xb8-ms-416-273e_coco.py +++ b/configs/yolo/yolov3_d53_8xb8-ms-416-273e_coco.py @@ -1,15 +1,7 @@ _base_ = './yolov3_d53_8xb8-ms-608-273e_coco.py' -# dataset settings -# file_client_args = dict( -# backend='petrel', -# path_mapping=dict({ -# './data/': 's3://openmmlab/datasets/detection/', -# 'data/': 's3://openmmlab/datasets/detection/' -# })) -file_client_args = dict(backend='disk') train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + 
dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
     dict(type='LoadAnnotations', with_bbox=True),
     # `mean` and `to_rgb` should be the same with the `preprocess_cfg`
     dict(type='Expand', mean=[0, 0, 0], to_rgb=True, ratio_range=(1, 2)),
@@ -23,7 +15,7 @@
     dict(type='PackDetInputs')
 ]
 test_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=file_client_args),
+    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
     dict(type='Resize', scale=(416, 416), keep_ratio=True),
     dict(type='LoadAnnotations', with_bbox=True),
     dict(
diff --git a/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py b/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py
index 287e09485cb..d4a36dfdaaf 100644
--- a/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py
+++ b/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py
@@ -66,16 +66,23 @@
 dataset_type = 'CocoDataset'
 data_root = 'data/coco/'
 
-# file_client_args = dict(
+# Example of using a different file backend
+# Method 1: simply set the data root and let the file I/O module
+# automatically infer the backend from the prefix (LMDB and Memcache are not supported yet)
+
+# data_root = 's3://openmmlab/datasets/detection/coco/'
+
+# Method 2: use `backend_args` (named `file_client_args` in versions before 3.0.0rc6)
+# backend_args = dict(
 #     backend='petrel',
 #     path_mapping=dict({
 #         './data/': 's3://openmmlab/datasets/detection/',
 #         'data/': 's3://openmmlab/datasets/detection/'
 #     }))
-file_client_args = dict(backend='disk')
+backend_args = None
 
 train_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=file_client_args),
+    dict(type='LoadImageFromFile', backend_args=backend_args),
     dict(type='LoadAnnotations', with_bbox=True),
     dict(
         type='Expand',
@@ -92,7 +99,7 @@
     dict(type='PackDetInputs')
 ]
 test_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=file_client_args),
+    dict(type='LoadImageFromFile', backend_args=backend_args),
     dict(type='Resize', scale=(608, 608), keep_ratio=True),
     dict(type='LoadAnnotations', with_bbox=True),
     dict(
@@ -113,7 +120,8 @@
         ann_file='annotations/instances_train2017.json',
         data_prefix=dict(img='train2017/'),
         filter_cfg=dict(filter_empty_gt=True, min_size=32),
-        pipeline=train_pipeline))
+        pipeline=train_pipeline,
+        backend_args=backend_args))
 val_dataloader = dict(
     batch_size=1,
     num_workers=2,
@@ -126,13 +134,15 @@
         ann_file='annotations/instances_val2017.json',
         data_prefix=dict(img='val2017/'),
         test_mode=True,
-        pipeline=test_pipeline))
+        pipeline=test_pipeline,
+        backend_args=backend_args))
 test_dataloader = val_dataloader
 
 val_evaluator = dict(
     type='CocoMetric',
     ann_file=data_root + 'annotations/instances_val2017.json',
-    metric='bbox')
+    metric='bbox',
+    backend_args=backend_args)
 test_evaluator = val_evaluator
 
 train_cfg = dict(max_epochs=273, val_interval=7)
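
The same migration applies to any downstream user config: replace the per-transform `file_client_args` with a single `backend_args` variable that is also passed to datasets and evaluators. A minimal before/after sketch (paths illustrative, not from this PR):

```python
# Before (MMDetection < 3.0.0rc6): per-transform file client settings.
# train_pipeline = [
#     dict(type='LoadImageFromFile', file_client_args=dict(backend='disk')),
# ]

# After: one shared `backend_args`; None selects the default local-disk backend.
backend_args = None
train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='LoadAnnotations', with_bbox=True),
]
```
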
dict(type='LoadAnnotations', with_bbox=True), # `mean` and `to_rgb` should be the same with the `preprocess_cfg` dict( @@ -37,7 +29,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=input_size, keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( diff --git a/configs/yolo/yolov3_mobilenetv2_8xb24-ms-416-300e_coco.py b/configs/yolo/yolov3_mobilenetv2_8xb24-ms-416-300e_coco.py index 67116f4000f..9a161b66fe9 100644 --- a/configs/yolo/yolov3_mobilenetv2_8xb24-ms-416-300e_coco.py +++ b/configs/yolo/yolov3_mobilenetv2_8xb24-ms-416-300e_coco.py @@ -67,16 +67,23 @@ dataset_type = 'CocoDataset' data_root = 'data/coco/' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True), dict( type='Expand', @@ -93,7 +100,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=(416, 416), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( @@ -117,7 +124,8 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + backend_args=backend_args))) val_dataloader = dict( batch_size=24, num_workers=4, @@ -130,13 +138,15 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox') + metric='bbox', + backend_args=backend_args) test_evaluator = val_evaluator train_cfg = dict(max_epochs=30) diff --git a/configs/yolof/yolof_r50-c5_8xb8-1x_coco.py b/configs/yolof/yolof_r50-c5_8xb8-1x_coco.py index b2637799712..5ea228e3e32 100644 --- a/configs/yolof/yolof_r50-c5_8xb8-1x_coco.py +++ b/configs/yolof/yolof_r50-c5_8xb8-1x_coco.py @@ -89,9 +89,7 @@ ] train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', prob=0.5), @@ -99,9 +97,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='LoadAnnotations', 
with_bbox=True), dict( diff --git a/configs/yolox/yolox_s_8xb8-300e_coco.py b/configs/yolox/yolox_s_8xb8-300e_coco.py index 0e6bb2d1dc8..3e324eb5b99 100644 --- a/configs/yolox/yolox_s_8xb8-300e_coco.py +++ b/configs/yolox/yolox_s_8xb8-300e_coco.py @@ -73,13 +73,20 @@ data_root = 'data/coco/' dataset_type = 'CocoDataset' -# file_client_args = dict( +# Example to use different file client +# Method 1: simply set the data root and let the file I/O module +# automatically infer from prefix (not support LMDB and Memcache yet) + +# data_root = 's3://openmmlab/datasets/detection/coco/' + +# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6 +# backend_args = dict( # backend='petrel', # path_mapping=dict({ # './data/': 's3://openmmlab/datasets/detection/', # 'data/': 's3://openmmlab/datasets/detection/' # })) -file_client_args = dict(backend='disk') +backend_args = None train_pipeline = [ dict(type='Mosaic', img_scale=img_scale, pad_val=114.0), @@ -120,14 +127,15 @@ ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), pipeline=[ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='LoadAnnotations', with_bbox=True) ], - filter_cfg=dict(filter_empty_gt=False, min_size=32)), + filter_cfg=dict(filter_empty_gt=False, min_size=32), + backend_args=backend_args), pipeline=train_pipeline) test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args=backend_args), dict(type='Resize', scale=img_scale, keep_ratio=True), dict( type='Pad', @@ -158,13 +166,15 @@ ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader val_evaluator = dict( type='CocoMetric', ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox') + metric='bbox', + backend_args=backend_args) test_evaluator = val_evaluator # training settings diff --git a/configs/yolox/yolox_tiny_8xb8-300e_coco.py b/configs/yolox/yolox_tiny_8xb8-300e_coco.py index b15480bed0a..86f7e9a6191 100644 --- a/configs/yolox/yolox_tiny_8xb8-300e_coco.py +++ b/configs/yolox/yolox_tiny_8xb8-300e_coco.py @@ -15,14 +15,6 @@ img_scale = (640, 640) # width, height -# file_client_args = dict( -# backend='petrel', -# path_mapping=dict({ -# './data/': 's3://openmmlab/datasets/detection/', -# 'data/': 's3://openmmlab/datasets/detection/' -# })) -file_client_args = dict(backend='disk') - train_pipeline = [ dict(type='Mosaic', img_scale=img_scale, pad_val=114.0), dict( @@ -44,7 +36,7 @@ ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(416, 416), keep_ratio=True), dict( type='Pad', diff --git a/configs/yolox/yolox_tta.py b/configs/yolox/yolox_tta.py index 8e86f26f5ac..e65244be6e1 100644 --- a/configs/yolox/yolox_tta.py +++ b/configs/yolox/yolox_tta.py @@ -4,7 +4,7 @@ img_scales = [(640, 640), (320, 320), (960, 960)] tta_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=dict(backend='disk')), + dict(type='LoadImageFromFile', backend_args=None), dict( type='TestTimeAug', transforms=[ diff --git a/docker/Dockerfile b/docker/Dockerfile index 4c804044c7a..2737ec0efce 100644 --- a/docker/Dockerfile +++ b/docker/Dockerfile @@ -29,11 +29,11 @@ RUN apt-get 
update \ # Install MMEngine and MMCV RUN pip install openmim && \ - mim install "mmengine>=0.6.0" "mmcv>=2.0.0rc4" + mim install "mmengine>=0.7.1" "mmcv>=2.0.0rc4" # Install MMDetection RUN conda clean --all \ - && git clone https://github.com/open-mmlab/mmdetection.git -b 3.x /mmdetection \ + && git clone https://github.com/open-mmlab/mmdetection.git /mmdetection \ && cd /mmdetection \ && pip install --no-cache-dir -e . diff --git a/docker/serve/Dockerfile b/docker/serve/Dockerfile index 7a215f935ab..9a6a7784a2f 100644 --- a/docker/serve/Dockerfile +++ b/docker/serve/Dockerfile @@ -4,7 +4,7 @@ ARG CUDNN="8" FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel ARG MMCV="2.0.0rc4" -ARG MMDET="3.0.0rc6" +ARG MMDET="3.0.0" ENV PYTHONUNBUFFERED TRUE diff --git a/docker/serve_cn/Dockerfile b/docker/serve_cn/Dockerfile index 7812d8b7198..b1dfb00b869 100644 --- a/docker/serve_cn/Dockerfile +++ b/docker/serve_cn/Dockerfile @@ -4,7 +4,7 @@ ARG CUDNN="8" FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel ARG MMCV="2.0.0rc4" -ARG MMDET="3.0.0rc6" +ARG MMDET="3.0.0" ENV PYTHONUNBUFFERED TRUE diff --git a/docs/en/advanced_guides/customize_transforms.md b/docs/en/advanced_guides/customize_transforms.md index 870861b7d74..5fe84e9f7c9 100644 --- a/docs/en/advanced_guides/customize_transforms.md +++ b/docs/en/advanced_guides/customize_transforms.md @@ -32,7 +32,7 @@ custom_imports = dict(imports=['path.to.my_pipeline'], allow_failed_imports=False) train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', prob=0.5), diff --git a/docs/en/advanced_guides/how_to.md b/docs/en/advanced_guides/how_to.md index f038f297445..8b19fc9db5b 100644 --- a/docs/en/advanced_guides/how_to.md +++ b/docs/en/advanced_guides/how_to.md @@ -37,7 +37,7 @@ model = dict( MMClassification also provides a wrapper for the PyTorch Image Models (timm) backbone network, users can directly use the backbone network in timm through MMClassification. Suppose you want to use [EfficientNet-B1](../../../configs/timm_example/retinanet_timm-efficientnet-b1_fpn_1x_coco.py) as the backbone network of RetinaNet, the example config is as the following. ```python -# https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/timm_example/retinanet_timm-efficientnet-b1_fpn_1x_coco.py +# https://github.com/open-mmlab/mmdetection/blob/main/configs/timm_example/retinanet_timm-efficientnet-b1_fpn_1x_coco.py _base_ = [ '../_base_/models/retinanet_r50_fpn.py', diff --git a/docs/en/advanced_guides/transforms.md b/docs/en/advanced_guides/transforms.md index 8820a3cf129..4db036ae5c2 100644 --- a/docs/en/advanced_guides/transforms.md +++ b/docs/en/advanced_guides/transforms.md @@ -1,4 +1,4 @@ -# Data Transforms +# Data Transforms (Need to update) ## Design of Data transforms pipeline @@ -17,7 +17,7 @@ Here is a pipeline example for Faster R-CNN. 
```python
train_pipeline = [ # Training data processing pipeline
-    dict(type='LoadImageFromFile'), # First pipeline to load images from file path
+    dict(type='LoadImageFromFile', backend_args=backend_args), # First pipeline to load images from file path
    dict(
        type='LoadAnnotations', # Second pipeline to load annotations for current image
        with_bbox=True), # Whether to use bounding box, True for detection
@@ -32,7 +32,7 @@ train_pipeline = [ # Training data processing pipeline
    dict(type='PackDetInputs') # Pipeline that formats the annotation data and decides which keys in the data should be packed into data_samples
]
test_pipeline = [ # Testing data processing pipeline
-    dict(type='LoadImageFromFile', file_client_args=file_client_args), # First pipeline to load images from file path
+    dict(type='LoadImageFromFile', backend_args=backend_args), # First pipeline to load images from file path
    dict(type='Resize', scale=(1333, 800), keep_ratio=True), # Pipeline that resizes the images
    dict(
        type='PackDetInputs', # Pipeline that formats the annotation data and decides which keys in the data should be packed into data_samples
diff --git a/docs/en/conf.py b/docs/en/conf.py
index e902e3fa8b1..d2beaf1e5c1 100644
--- a/docs/en/conf.py
+++ b/docs/en/conf.py
@@ -2,7 +2,7 @@
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
-# https://www.sphinx-doc.org/en/master/usage/configuration.html
+# https://www.sphinx-doc.org/en/main/usage/configuration.html

# -- Path setup --------------------------------------------------------------

@@ -67,7 +67,7 @@ def get_version():
    '.md': 'markdown',
}

-# The master toctree document.
+# The main toctree document.
master_doc = 'index'

# List of patterns, relative to source directory, that match files and
diff --git a/docs/en/get_started.md b/docs/en/get_started.md
index 303da496ae6..31260abf8f6 100644
--- a/docs/en/get_started.md
+++ b/docs/en/get_started.md
@@ -44,7 +44,7 @@ We recommend that users follow our best practices to install MMDetection. Howeve

```shell
pip install -U openmim
mim install mmengine
-mim install "mmcv>=2.0.0rc1"
+mim install "mmcv>=2.0.0"
```

**Note:** In MMCV-v2.x, `mmcv-full` is renamed to `mmcv`. If you want to install `mmcv` without CUDA ops, you can use `mim install "mmcv-lite>=2.0.0rc1"` to install the lite version.

@@ -54,8 +54,7 @@ mim install "mmcv>=2.0.0rc1"

**Step 1.** Install MMDetection.

Case a: If you develop and run mmdet directly, install it from source:

```shell
-git clone https://github.com/open-mmlab/mmdetection.git -b 3.x
-# "-b 3.x" means checkout to the `3.x` branch.
+git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -v -e .
# "-v" means verbose, or more output
@@ -66,7 +65,7 @@ pip install -v -e .

Case b: If you use mmdet as a dependency or third-party package, install it with MIM:

```shell
-mim install "mmdet>=3.0.0rc0"
+mim install mmdet
```

## Verify the installation

@@ -138,7 +137,7 @@ To install MMCV with pip instead of MIM, please follow [MMCV installation guides

For example, the following command installs MMCV built for PyTorch 1.12.x and CUDA 11.6.
```shell
-pip install "mmcv>=2.0.0rc1" -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html
+pip install "mmcv>=2.0.0" -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html
```

#### Install on CPU-only platforms

@@ -180,13 +179,13 @@ thus we only need to install MMEngine, MMCV, and MMDetection with the following

```shell
!pip3 install openmim
!mim install mmengine
-!mim install "mmcv>=2.0.0rc1,<2.1.0"
+!mim install "mmcv>=2.0.0,<2.1.0"
```

**Step 2.** Install MMDetection from the source.

```shell
-!git clone https://github.com/open-mmlab/mmdetection.git -b 3.x
+!git clone https://github.com/open-mmlab/mmdetection.git
%cd mmdetection
!pip install -e .
```

@@ -196,7 +195,7 @@ thus we only need to install MMEngine, MMCV, and MMDetection with the following

```python
import mmdet
print(mmdet.__version__)
-# Example output: 3.0.0rc0, or an another version.
+# Example output: 3.0.0, or another version.
```

```{note}
diff --git a/docs/en/index.rst b/docs/en/index.rst
index 285954487bb..32c5952a4ae 100644
--- a/docs/en/index.rst
+++ b/docs/en/index.rst
@@ -24,7 +24,7 @@ Welcome to MMDetection's documentation!
   :maxdepth: 1
   :caption: Migration

-   migration.md
+   migration/migration.md

.. toctree::
   :maxdepth: 1
diff --git a/docs/en/migration/api_and_registry_migration.md b/docs/en/migration/api_and_registry_migration.md
new file mode 100644
index 00000000000..72bfd3aec8e
--- /dev/null
+++ b/docs/en/migration/api_and_registry_migration.md
@@ -0,0 +1 @@
+# Migrate API and Registry from MMDetection 2.x to 3.x
diff --git a/docs/en/migration/config_migration.md b/docs/en/migration/config_migration.md
new file mode 100644
index 00000000000..20fe0bb7e0f
--- /dev/null
+++ b/docs/en/migration/config_migration.md
@@ -0,0 +1,819 @@
+# Migrate Configuration File from MMDetection 2.x to 3.x
+
+The configuration file of MMDetection 3.x has undergone significant changes in comparison to the 2.x version. This document explains how to migrate 2.x configuration files to 3.x.
+
+In the previous tutorial [Learn about Configs](../user_guides/config.md), we used Mask R-CNN as an example to introduce the configuration file structure of MMDetection 3.x. Here, we will follow the same structure to demonstrate how to migrate 2.x configuration files to 3.x.
+
+## Model Configuration
+
+There have been no major changes to the model configuration in 3.x compared to 2.x. For the model's backbone, neck, head, as well as train_cfg and test_cfg, the parameters remain the same as in version 2.x.
+
+On the other hand, we have added the `DataPreprocessor` module in MMDetection 3.x. The configuration for the `DataPreprocessor` module is located in `model.data_preprocessor`. It is used to preprocess the input data, such as normalizing input images, padding images of different sizes into batches, and loading images from memory to VRAM. This configuration replaces the `Normalize` and `Pad` modules in the `train_pipeline` and `test_pipeline` of the earlier version.
+
+
+
+
+
+
+
+
2.x Config + +```python +# Image normalization parameters +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + to_rgb=True) +pipeline=[ + ..., + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size_divisor=32), # Padding the image to multiples of 32 + ... +] +``` + +
3.x Config

+
+```python
+model = dict(
+    data_preprocessor=dict(
+        type='DetDataPreprocessor',
+        # Image normalization parameters
+        mean=[123.675, 116.28, 103.53],
+        std=[58.395, 57.12, 57.375],
+        bgr_to_rgb=True,
+        # Image padding parameters
+        pad_mask=True,  # In instance segmentation, the mask needs to be padded
+        pad_size_divisor=32)  # Padding the image to multiples of 32
+)
+
+```
+
+
+
+## Dataset and Evaluator Configuration
+
+The dataset and evaluator configurations have undergone major changes compared to version 2.x. Below, we explain how to migrate from version 2.x to version 3.x in three parts: the Dataloader and Dataset configuration, the data transform pipeline, and the Evaluator configuration.
+
+### Dataloader and Dataset Configuration
+
+In the new version, the data loading settings are consistent with PyTorch's official DataLoader,
+making them easier for users to understand and get started with.
+We put the data loading settings for training, validation, and testing separately in `train_dataloader`, `val_dataloader`, and `test_dataloader`.
+Users can set different parameters for these dataloaders.
+The input parameters are basically the same as those required by the [PyTorch DataLoader](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader).
+
+This way, parameters that were not configurable in version 2.x, such as `sampler`, `batch_sampler`, and `persistent_workers`, are exposed in the configuration file, so users can set the dataloader parameters much more flexibly.
+
+Users can set the dataset configuration through `train_dataloader.dataset`, `val_dataloader.dataset`, and `test_dataloader.dataset`, which correspond to `data.train`, `data.val`, and `data.test` in version 2.x.
+
+
+
+
+
+
+
+
+
2.x Config + +```python +data = dict( + samples_per_gpu=2, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=data_root + 'annotations/instances_train2017.json', + img_prefix=data_root + 'train2017/', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=data_root + 'annotations/instances_val2017.json', + img_prefix=data_root + 'val2017/', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + ann_file=data_root + 'annotations/instances_val2017.json', + img_prefix=data_root + 'val2017/', + pipeline=test_pipeline)) +``` + +
3.x Config + +```python +train_dataloader = dict( + batch_size=2, + num_workers=2, + persistent_workers=True, # Avoid recreating subprocesses after each iteration + sampler=dict(type='DefaultSampler', shuffle=True), # Default sampler, supports both distributed and non-distributed training + batch_sampler=dict(type='AspectRatioBatchSampler'), # Default batch_sampler, used to ensure that images in the batch have similar aspect ratios, so as to better utilize graphics memory + dataset=dict( + type=dataset_type, + data_root=data_root, + ann_file='annotations/instances_train2017.json', + data_prefix=dict(img='train2017/'), + filter_cfg=dict(filter_empty_gt=True, min_size=32), + pipeline=train_pipeline)) +# In version 3.x, validation and test dataloaders can be configured independently +val_dataloader = dict( + batch_size=1, + num_workers=2, + persistent_workers=True, + drop_last=False, + sampler=dict(type='DefaultSampler', shuffle=False), + dataset=dict( + type=dataset_type, + data_root=data_root, + ann_file='annotations/instances_val2017.json', + data_prefix=dict(img='val2017/'), + test_mode=True, + pipeline=test_pipeline)) +test_dataloader = val_dataloader # The configuration of the testing dataloader is the same as that of the validation dataloader, which is omitted here + +``` + +
+
+### Data Transform Pipeline Configuration
+
+As mentioned earlier, we have separated the normalization and padding configurations for images from the `train_pipeline` and `test_pipeline`, and have placed them in `model.data_preprocessor` instead. Hence, in the 3.x version of the pipeline, we no longer require the `Normalize` and `Pad` transforms.
+
+At the same time, we have also refactored the transform responsible for packing the data format, merging the `Collect` and `DefaultFormatBundle` transforms into `PackDetInputs`. This transform packs the data from the data pipeline into the input format of the model. For more details on the input format conversion, please refer to the [data flow documentation](../advanced_guides/data_flow.md).
+
+Below, we will use the `train_pipeline` of Mask R-CNN as an example to demonstrate how to migrate from the 2.x configuration to the 3.x configuration:
+
+
+
+
+
+
+
+
+
2.x Config + +```python +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations', with_bbox=True), + dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), + dict(type='RandomFlip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size_divisor=32), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), +] +``` + +
3.x Config + +```python +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations', with_bbox=True), + dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type='RandomFlip', prob=0.5), + dict(type='PackDetInputs') +] +``` + +
+
+For the `test_pipeline`, apart from removing the `Normalize` and `Pad` transforms, we have also separated test-time augmentation (TTA) from the normal testing process and removed `MultiScaleFlipAug`. For more information on how to use the new TTA version, please refer to the [TTA documentation](../advanced_guides/tta.md); a short sketch is also given after the example below.
+
+Below, we will again use the `test_pipeline` of Mask R-CNN as an example to demonstrate how to migrate from the 2.x configuration to the 3.x configuration:
+
+
+
+
+
+
+
+
+
2.x Config + +```python +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(1333, 800), + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size_divisor=32), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +``` + +
3.x Config + +```python +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict( + type='PackDetInputs', + meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', + 'scale_factor')) +] +``` + +
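+The new-style TTA is configured through a `tta_model` and a `tta_pipeline`. The snippet below is only a minimal sketch rather than a drop-in replacement; the scales, flip settings, and NMS parameters are illustrative:
+
+```python
+# Wrap the detector so that predictions from all augmented views
+# are fused with NMS at test time.
+tta_model = dict(
+    type='DetTTAModel',
+    tta_cfg=dict(nms=dict(type='nms', iou_threshold=0.5), max_per_img=100))
+
+tta_pipeline = [
+    dict(type='LoadImageFromFile', backend_args=None),
+    dict(
+        type='TestTimeAug',
+        transforms=[
+            # Each inner list is one augmentation axis; TestTimeAug
+            # enumerates their combinations (2 scales x 2 flips here).
+            [dict(type='Resize', scale=(1333, 800), keep_ratio=True),
+             dict(type='Resize', scale=(666, 400), keep_ratio=True)],
+            [dict(type='RandomFlip', prob=1.), dict(type='RandomFlip', prob=0.)],
+            [dict(
+                type='PackDetInputs',
+                meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                           'scale_factor', 'flip', 'flip_direction'))]
+        ])
+]
+```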
+ +In addition, we have also refactored some data augmentation transforms. The following table lists the mapping between the transforms used in the 2.x version and the 3.x version: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name2.x Config3.x Config
Resize + +```python +dict(type='Resize', + img_scale=(1333, 800), + keep_ratio=True) +``` + + + +```python +dict(type='Resize', + scale=(1333, 800), + keep_ratio=True) +``` + +
RandomResize + +```python +dict( + type='Resize', + img_scale=[ + (1333, 640), (1333, 800)], + multiscale_mode='range', + keep_ratio=True) +``` + + + +```python +dict( + type='RandomResize', + scale=[ + (1333, 640), (1333, 800)], + keep_ratio=True) +``` + +
RandomChoiceResize + +```python +dict( + type='Resize', + img_scale=[ + (1333, 640), (1333, 672), + (1333, 704), (1333, 736), + (1333, 768), (1333, 800)], + multiscale_mode='value', + keep_ratio=True) +``` + + + +```python +dict( + type='RandomChoiceResize', + scales=[ + (1333, 640), (1333, 672), + (1333, 704), (1333, 736), + (1333, 768), (1333, 800)], + keep_ratio=True) +``` + +
RandomFlip + +```python +dict(type='RandomFlip', flip_ratio=0.5) +``` + + + +```python +dict(type='RandomFlip', prob=0.5) +``` + +
+
+### Evaluator Configuration
+
+In version 3.x, model accuracy evaluation is no longer tied to the dataset, but is instead accomplished through the use of an Evaluator.
+The Evaluator configuration is divided into two parts: `val_evaluator` and `test_evaluator`. The `val_evaluator` is used for validation dataset evaluation, while the `test_evaluator` is used for testing dataset evaluation.
+This corresponds to the `evaluation` field in version 2.x.
+
+The following table shows the correspondence between Evaluators in version 2.x and 3.x.
+
+
+
+
+
+
+
+
+
Metric Name2.x Config3.x Config
COCO + +```python +data = dict( + val=dict( + type='CocoDataset', + ann_file=data_root + 'annotations/instances_val2017.json')) +evaluation = dict(metric=['bbox', 'segm']) +``` + + + +```python +val_evaluator = dict( + type='CocoMetric', + ann_file=data_root + 'annotations/instances_val2017.json', + metric=['bbox', 'segm'], + format_only=False) +``` + +
Pascal VOC + +```python +data = dict( + val=dict( + type=dataset_type, + ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt')) +evaluation = dict(metric='mAP') +``` + + + +```python +val_evaluator = dict( + type='VOCMetric', + metric='mAP', + eval_mode='11points') +``` + +
OpenImages + +```python +data = dict( + val=dict( + type='OpenImagesDataset', + ann_file=data_root + 'annotations/validation-annotations-bbox.csv', + img_prefix=data_root + 'OpenImages/validation/', + label_file=data_root + 'annotations/class-descriptions-boxable.csv', + hierarchy_file=data_root + + 'annotations/bbox_labels_600_hierarchy.json', + meta_file=data_root + 'annotations/validation-image-metas.pkl', + image_level_ann_file=data_root + + 'annotations/validation-annotations-human-imagelabels-boxable.csv')) +evaluation = dict(interval=1, metric='mAP') +``` + + + +```python +val_evaluator = dict( + type='OpenImagesMetric', + iou_thrs=0.5, + ioa_thrs=0.5, + use_group_of=True, + get_supercategory=True) +``` + +
CityScapes + +```python +data = dict( + val=dict( + type='CityScapesDataset', + ann_file=data_root + + 'annotations/instancesonly_filtered_gtFine_val.json', + img_prefix=data_root + 'leftImg8bit/val/', + pipeline=test_pipeline)) +evaluation = dict(metric=['bbox', 'segm']) +``` + + + +```python +val_evaluator = [ + dict( + type='CocoMetric', + ann_file=data_root + + 'annotations/instancesonly_filtered_gtFine_val.json', + metric=['bbox', 'segm']), + dict( + type='CityScapesMetric', + ann_file=data_root + + 'annotations/instancesonly_filtered_gtFine_val.json', + seg_prefix=data_root + '/gtFine/val', + outfile_prefix='./work_dirs/cityscapes_metric/instance') +] +``` + +
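+If the test set has no released annotations (e.g. COCO test-dev), the evaluator can dump the predictions for submission instead of computing metrics. A minimal sketch, assuming the standard COCO image-info file and a placeholder output prefix:
+
+```python
+test_evaluator = dict(
+    type='CocoMetric',
+    ann_file=data_root + 'annotations/image_info_test-dev2017.json',
+    metric=['bbox'],
+    format_only=True,  # Only format and save the results, no evaluation
+    outfile_prefix='./work_dirs/coco_detection/test')  # Hypothetical path
+```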
+ +## Configuration for Training and Testing + + + + + + + + + +
2.x Config + +```python +runner = dict( + type='EpochBasedRunner', # Type of training loop + max_epochs=12) # Maximum number of training epochs +evaluation = dict(interval=2) # Interval for evaluation, check the performance every 2 epochs +``` + +
3.x Config + +```python +train_cfg = dict( + type='EpochBasedTrainLoop', # Type of training loop, please refer to https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/loops.py + max_epochs=12, # Maximum number of training epochs + val_interval=2) # Interval for validation, check the performance every 2 epochs +val_cfg = dict(type='ValLoop') # Type of validation loop +test_cfg = dict(type='TestLoop') # Type of testing loop +``` + +
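+Iteration-based training is configured through the same field by switching the loop type. A short sketch with illustrative iteration counts:
+
+```python
+train_cfg = dict(
+    type='IterBasedTrainLoop',  # Iteration-based training loop from MMEngine
+    max_iters=90000,            # Maximum number of training iterations
+    val_interval=10000)         # Validate every 10000 iterations
+```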
+
+## Optimization Configuration
+
+The configuration for the optimizer and gradient clipping has been moved to the `optim_wrapper` field.
+The following table shows the correspondences for the optimizer configuration between the 2.x and 3.x versions:
+
+
+
+
+
+
+
+
+
2.x Config + +```python +optimizer = dict( + type='SGD', # Optimizer: Stochastic Gradient Descent + lr=0.02, # Base learning rate + momentum=0.9, # SGD with momentum + weight_decay=0.0001) # Weight decay +optimizer_config = dict(grad_clip=None) # Configuration for gradient clipping, set to None to disable +``` + +
3.x Config + +```python +optim_wrapper = dict( # Configuration for the optimizer wrapper + type='OptimWrapper', # Type of optimizer wrapper, you can switch to AmpOptimWrapper to enable mixed precision training + optimizer=dict( # Optimizer configuration, supports various PyTorch optimizers, please refer to https://pytorch.org/docs/stable/optim.html#algorithms + type='SGD', # SGD + lr=0.02, # Base learning rate + momentum=0.9, # SGD with momentum + weight_decay=0.0001), # Weight decay + clip_grad=None, # Configuration for gradient clipping, set to None to disable. For usage, please see https://mmengine.readthedocs.io/en/latest/tutorials/optimizer.html + ) +``` + +
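+For example, enabling mixed precision training only requires swapping the wrapper type. A sketch with the same optimizer settings as above:
+
+```python
+optim_wrapper = dict(
+    type='AmpOptimWrapper',  # Automatic mixed precision training
+    optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001))
+```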
+
+The learning rate configuration has also been moved from the `lr_config` field to the `param_scheduler` field. The `param_scheduler` configuration is closer to PyTorch's learning rate schedulers and is more flexible. The following table shows the correspondences for the learning rate configuration between the 2.x and 3.x versions:
+
+
+
+
+
+
+
+
+
2.x Config + +```python +lr_config = dict( + policy='step', # Use multi-step learning rate strategy during training + warmup='linear', # Use linear learning rate warmup + warmup_iters=500, # End warmup at iteration 500 + warmup_ratio=0.001, # Coefficient for learning rate warmup + step=[8, 11], # Learning rate decay at which epochs + gamma=0.1) # Learning rate decay coefficient + +``` + +
3.x Config + +```python +param_scheduler = [ + dict( + type='LinearLR', # Use linear learning rate warmup + start_factor=0.001, # Coefficient for learning rate warmup + by_epoch=False, # Update the learning rate during warmup at each iteration + begin=0, # Starting from the first iteration + end=500), # End at the 500th iteration + dict( + type='MultiStepLR', # Use multi-step learning rate strategy during training + by_epoch=True, # Update the learning rate at each epoch + begin=0, # Starting from the first epoch + end=12, # Ending at the 12th epoch + milestones=[8, 11], # Learning rate decay at which epochs + gamma=0.1) # Learning rate decay coefficient +] + +``` + +
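+Other schedules follow the same pattern. For instance, a cosine annealing schedule with the same linear warmup might look like the following sketch (the epoch counts are illustrative):
+
+```python
+param_scheduler = [
+    dict(type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(
+        type='CosineAnnealingLR',  # Cosine decay over the whole 12-epoch run
+        by_epoch=True,
+        begin=0,
+        end=12,
+        T_max=12)
+]
+```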
+
+For information on how to migrate other learning rate adjustment policies, please refer to the [learning rate migration document of MMEngine](https://mmengine.readthedocs.io/en/latest/migration/param_scheduler.html).
+
+## Migration of Other Configurations
+
+### Configuration for Saving Checkpoints
+
+
+
+
+
+
+
+
+
Function2.x Config3.x Config
Set Save Interval + +```python +checkpoint_config = dict( + interval=1) +``` + + + +```python +default_hooks = dict( + checkpoint=dict( + type='CheckpointHook', + interval=1)) +``` + +
Save Best Model + +```python +evaluation = dict( + save_best='auto') +``` + + + +```python +default_hooks = dict( + checkpoint=dict( + type='CheckpointHook', + save_best='auto')) +``` + +
Keep Latest Model + +```python +checkpoint_config = dict( + max_keep_ckpts=3) +``` + + + +```python +default_hooks = dict( + checkpoint=dict( + type='CheckpointHook', + max_keep_ckpts=3)) +``` + +
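+These options are all arguments of the same hook, so they can be combined into a single `CheckpointHook` configuration:
+
+```python
+default_hooks = dict(
+    checkpoint=dict(
+        type='CheckpointHook',
+        interval=1,          # Save a checkpoint every epoch
+        save_best='auto',    # Additionally keep the best-performing checkpoint
+        max_keep_ckpts=3))   # Retain at most the 3 latest checkpoints
+```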
+
+### Logging Configuration
+
+In MMDetection 3.x, log printing and log visualization are handled by the logger and visualizer of MMEngine, respectively. The following table compares the configurations for printing and visualizing logs in MMDetection 2.x and 3.x.
+
+
+
+
+
+
+
+
+
Function2.x Config3.x Config
Set Log Printing Interval + +```python +log_config = dict(interval=50) +``` + + + +```python +default_hooks = dict( + logger=dict(type='LoggerHook', interval=50)) +# Optional: set moving average window size +log_processor = dict( + type='LogProcessor', window_size=50) +``` + +
Use TensorBoard or WandB to visualize logs + +```python +log_config = dict( + interval=50, + hooks=[ + dict(type='TextLoggerHook'), + dict(type='TensorboardLoggerHook'), + dict(type='MMDetWandbHook', + init_kwargs={ + 'project': 'mmdetection', + 'group': 'maskrcnn-r50-fpn-1x-coco' + }, + interval=50, + log_checkpoint=True, + log_checkpoint_metadata=True, + num_eval_images=100) + ]) +``` + + + +```python +vis_backends = [ + dict(type='LocalVisBackend'), + dict(type='TensorboardVisBackend'), + dict(type='WandbVisBackend', + init_kwargs={ + 'project': 'mmdetection', + 'group': 'maskrcnn-r50-fpn-1x-coco' + }) +] +visualizer = dict( + type='DetLocalVisualizer', + vis_backends=vis_backends, + name='visualizer') +``` + +
+
+For visualization-related tutorials, please refer to MMDetection's [Visualization Tutorial](../user_guides/visualization.md).
+
+### Runtime Configuration
+
+The runtime configuration fields have been adjusted in version 3.x; the specific correspondences are as follows:
+
+
+
+
+
+
+
+
+
2.x Config3.x Config
+ +```python +cudnn_benchmark = False +opencv_num_threads = 0 +mp_start_method = 'fork' +dist_params = dict(backend='nccl') +log_level = 'INFO' +load_from = None +resume_from = None + + +``` + + + +```python +env_cfg = dict( + cudnn_benchmark=False, + mp_cfg=dict(mp_start_method='fork', + opencv_num_threads=0), + dist_cfg=dict(backend='nccl')) +log_level = 'INFO' +load_from = None +resume = False +``` + +
diff --git a/docs/en/migration/dataset_migration.md b/docs/en/migration/dataset_migration.md
new file mode 100644
index 00000000000..75d093298e0
--- /dev/null
+++ b/docs/en/migration/dataset_migration.md
@@ -0,0 +1 @@
+# Migrate dataset from MMDetection 2.x to 3.x
diff --git a/docs/en/migration/migration.md b/docs/en/migration/migration.md
new file mode 100644
index 00000000000..ec6a2f891b1
--- /dev/null
+++ b/docs/en/migration/migration.md
@@ -0,0 +1,12 @@
+# Migrating from MMDetection 2.x to 3.x
+
+MMDetection 3.x is a significant update that includes many changes to the API and configuration files. This document aims to help users migrate from MMDetection 2.x to 3.x.
+We have divided the migration guide into the following sections:
+
+- [Configuration file migration](./config_migration.md)
+- [API and Registry migration](./api_and_registry_migration.md)
+- [Dataset migration](./dataset_migration.md)
+- [Model migration](./model_migration.md)
+- [Frequently Asked Questions](./migration_faq.md)
+
+If you encounter any problems during the migration process, feel free to raise an issue. We also welcome contributions to this document.
diff --git a/docs/en/migration/migration_faq.md b/docs/en/migration/migration_faq.md
new file mode 100644
index 00000000000..a6e3c356c27
--- /dev/null
+++ b/docs/en/migration/migration_faq.md
@@ -0,0 +1 @@
+# Migration FAQ
diff --git a/docs/en/migration/model_migration.md b/docs/en/migration/model_migration.md
new file mode 100644
index 00000000000..04e280879fc
--- /dev/null
+++ b/docs/en/migration/model_migration.md
@@ -0,0 +1 @@
+# Migrate models from MMDetection 2.x to 3.x
diff --git a/docs/en/model_zoo.md b/docs/en/model_zoo.md
index fcacdb0f35a..15dd7b2fb5b 100644
--- a/docs/en/model_zoo.md
+++ b/docs/en/model_zoo.md
@@ -10,7 +10,7 @@ We only use aliyun to maintain the model zoo since MMDetection V2.0. The model z

- We use distributed training.
- All pytorch-style pretrained backbones on ImageNet are from PyTorch model zoo, caffe-style pretrained backbones are converted from the newly released models from detectron2.
- For fair comparison with other codebases, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` for all 8 GPUs. Note that this value is usually less than what `nvidia-smi` shows.
-- We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/analysis_tools/benchmark.py) which computes the average time on 2000 images.
+- We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/main/tools/analysis_tools/benchmark.py) which computes the average time on 2000 images.

## ImageNet Pretrained Models

@@ -37,244 +37,244 @@ The detailed table of the commonly used backbone models in MMDetection is listed

### RPN

-Please refer to [RPN](https://github.com/open-mmlab/mmdetection/blob/master/configs/rpn) for details.
+Please refer to [RPN](https://github.com/open-mmlab/mmdetection/blob/main/configs/rpn) for details.

### Faster R-CNN

-Please refer to [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/faster_rcnn) for details.
+Please refer to [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/faster_rcnn) for details.
### Mask R-CNN -Please refer to [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn) for details. +Please refer to [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/mask_rcnn) for details. ### Fast R-CNN (with pre-computed proposals) -Please refer to [Fast R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/fast_rcnn) for details. +Please refer to [Fast R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/fast_rcnn) for details. ### RetinaNet -Please refer to [RetinaNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/retinanet) for details. +Please refer to [RetinaNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/retinanet) for details. ### Cascade R-CNN and Cascade Mask R-CNN -Please refer to [Cascade R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/cascade_rcnn) for details. +Please refer to [Cascade R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/cascade_rcnn) for details. ### Hybrid Task Cascade (HTC) -Please refer to [HTC](https://github.com/open-mmlab/mmdetection/blob/master/configs/htc) for details. +Please refer to [HTC](https://github.com/open-mmlab/mmdetection/blob/main/configs/htc) for details. ### SSD -Please refer to [SSD](https://github.com/open-mmlab/mmdetection/blob/master/configs/ssd) for details. +Please refer to [SSD](https://github.com/open-mmlab/mmdetection/blob/main/configs/ssd) for details. ### Group Normalization (GN) -Please refer to [Group Normalization](https://github.com/open-mmlab/mmdetection/blob/master/configs/gn) for details. +Please refer to [Group Normalization](https://github.com/open-mmlab/mmdetection/blob/main/configs/gn) for details. ### Weight Standardization -Please refer to [Weight Standardization](https://github.com/open-mmlab/mmdetection/blob/master/configs/gn+ws) for details. +Please refer to [Weight Standardization](https://github.com/open-mmlab/mmdetection/blob/main/configs/gn+ws) for details. ### Deformable Convolution v2 -Please refer to [Deformable Convolutional Networks](https://github.com/open-mmlab/mmdetection/blob/master/configs/dcn) for details. +Please refer to [Deformable Convolutional Networks](https://github.com/open-mmlab/mmdetection/blob/main/configs/dcn) for details. ### CARAFE: Content-Aware ReAssembly of FEatures -Please refer to [CARAFE](https://github.com/open-mmlab/mmdetection/blob/master/configs/carafe) for details. +Please refer to [CARAFE](https://github.com/open-mmlab/mmdetection/blob/main/configs/carafe) for details. ### Instaboost -Please refer to [Instaboost](https://github.com/open-mmlab/mmdetection/blob/master/configs/instaboost) for details. +Please refer to [Instaboost](https://github.com/open-mmlab/mmdetection/blob/main/configs/instaboost) for details. ### Libra R-CNN -Please refer to [Libra R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/libra_rcnn) for details. +Please refer to [Libra R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/libra_rcnn) for details. ### Guided Anchoring -Please refer to [Guided Anchoring](https://github.com/open-mmlab/mmdetection/blob/master/configs/guided_anchoring) for details. +Please refer to [Guided Anchoring](https://github.com/open-mmlab/mmdetection/blob/main/configs/guided_anchoring) for details. ### FCOS -Please refer to [FCOS](https://github.com/open-mmlab/mmdetection/blob/master/configs/fcos) for details. 
+Please refer to [FCOS](https://github.com/open-mmlab/mmdetection/blob/main/configs/fcos) for details. ### FoveaBox -Please refer to [FoveaBox](https://github.com/open-mmlab/mmdetection/blob/master/configs/foveabox) for details. +Please refer to [FoveaBox](https://github.com/open-mmlab/mmdetection/blob/main/configs/foveabox) for details. ### RepPoints -Please refer to [RepPoints](https://github.com/open-mmlab/mmdetection/blob/master/configs/reppoints) for details. +Please refer to [RepPoints](https://github.com/open-mmlab/mmdetection/blob/main/configs/reppoints) for details. ### FreeAnchor -Please refer to [FreeAnchor](https://github.com/open-mmlab/mmdetection/blob/master/configs/free_anchor) for details. +Please refer to [FreeAnchor](https://github.com/open-mmlab/mmdetection/blob/main/configs/free_anchor) for details. ### Grid R-CNN (plus) -Please refer to [Grid R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/grid_rcnn) for details. +Please refer to [Grid R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/grid_rcnn) for details. ### GHM -Please refer to [GHM](https://github.com/open-mmlab/mmdetection/blob/master/configs/ghm) for details. +Please refer to [GHM](https://github.com/open-mmlab/mmdetection/blob/main/configs/ghm) for details. ### GCNet -Please refer to [GCNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/gcnet) for details. +Please refer to [GCNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/gcnet) for details. ### HRNet -Please refer to [HRNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/hrnet) for details. +Please refer to [HRNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/hrnet) for details. ### Mask Scoring R-CNN -Please refer to [Mask Scoring R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/ms_rcnn) for details. +Please refer to [Mask Scoring R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/ms_rcnn) for details. ### Train from Scratch -Please refer to [Rethinking ImageNet Pre-training](https://github.com/open-mmlab/mmdetection/blob/master/configs/scratch) for details. +Please refer to [Rethinking ImageNet Pre-training](https://github.com/open-mmlab/mmdetection/blob/main/configs/scratch) for details. ### NAS-FPN -Please refer to [NAS-FPN](https://github.com/open-mmlab/mmdetection/blob/master/configs/nas_fpn) for details. +Please refer to [NAS-FPN](https://github.com/open-mmlab/mmdetection/blob/main/configs/nas_fpn) for details. ### ATSS -Please refer to [ATSS](https://github.com/open-mmlab/mmdetection/blob/master/configs/atss) for details. +Please refer to [ATSS](https://github.com/open-mmlab/mmdetection/blob/main/configs/atss) for details. ### FSAF -Please refer to [FSAF](https://github.com/open-mmlab/mmdetection/blob/master/configs/fsaf) for details. +Please refer to [FSAF](https://github.com/open-mmlab/mmdetection/blob/main/configs/fsaf) for details. ### RegNetX -Please refer to [RegNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/regnet) for details. +Please refer to [RegNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/regnet) for details. ### Res2Net -Please refer to [Res2Net](https://github.com/open-mmlab/mmdetection/blob/master/configs/res2net) for details. +Please refer to [Res2Net](https://github.com/open-mmlab/mmdetection/blob/main/configs/res2net) for details. ### GRoIE -Please refer to [GRoIE](https://github.com/open-mmlab/mmdetection/blob/master/configs/groie) for details. 
+Please refer to [GRoIE](https://github.com/open-mmlab/mmdetection/blob/main/configs/groie) for details. ### Dynamic R-CNN -Please refer to [Dynamic R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/dynamic_rcnn) for details. +Please refer to [Dynamic R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/dynamic_rcnn) for details. ### PointRend -Please refer to [PointRend](https://github.com/open-mmlab/mmdetection/blob/master/configs/point_rend) for details. +Please refer to [PointRend](https://github.com/open-mmlab/mmdetection/blob/main/configs/point_rend) for details. ### DetectoRS -Please refer to [DetectoRS](https://github.com/open-mmlab/mmdetection/blob/master/configs/detectors) for details. +Please refer to [DetectoRS](https://github.com/open-mmlab/mmdetection/blob/main/configs/detectors) for details. ### Generalized Focal Loss -Please refer to [Generalized Focal Loss](https://github.com/open-mmlab/mmdetection/blob/master/configs/gfl) for details. +Please refer to [Generalized Focal Loss](https://github.com/open-mmlab/mmdetection/blob/main/configs/gfl) for details. ### CornerNet -Please refer to [CornerNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/cornernet) for details. +Please refer to [CornerNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/cornernet) for details. ### YOLOv3 -Please refer to [YOLOv3](https://github.com/open-mmlab/mmdetection/blob/master/configs/yolo) for details. +Please refer to [YOLOv3](https://github.com/open-mmlab/mmdetection/blob/main/configs/yolo) for details. ### PAA -Please refer to [PAA](https://github.com/open-mmlab/mmdetection/blob/master/configs/paa) for details. +Please refer to [PAA](https://github.com/open-mmlab/mmdetection/blob/main/configs/paa) for details. ### SABL -Please refer to [SABL](https://github.com/open-mmlab/mmdetection/blob/master/configs/sabl) for details. +Please refer to [SABL](https://github.com/open-mmlab/mmdetection/blob/main/configs/sabl) for details. ### CentripetalNet -Please refer to [CentripetalNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/centripetalnet) for details. +Please refer to [CentripetalNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/centripetalnet) for details. ### ResNeSt -Please refer to [ResNeSt](https://github.com/open-mmlab/mmdetection/blob/master/configs/resnest) for details. +Please refer to [ResNeSt](https://github.com/open-mmlab/mmdetection/blob/main/configs/resnest) for details. ### DETR -Please refer to [DETR](https://github.com/open-mmlab/mmdetection/blob/master/configs/detr) for details. +Please refer to [DETR](https://github.com/open-mmlab/mmdetection/blob/main/configs/detr) for details. ### Deformable DETR -Please refer to [Deformable DETR](https://github.com/open-mmlab/mmdetection/blob/master/configs/deformable_detr) for details. +Please refer to [Deformable DETR](https://github.com/open-mmlab/mmdetection/blob/main/configs/deformable_detr) for details. ### AutoAssign -Please refer to [AutoAssign](https://github.com/open-mmlab/mmdetection/blob/master/configs/autoassign) for details. +Please refer to [AutoAssign](https://github.com/open-mmlab/mmdetection/blob/main/configs/autoassign) for details. ### YOLOF -Please refer to [YOLOF](https://github.com/open-mmlab/mmdetection/blob/master/configs/yolof) for details. +Please refer to [YOLOF](https://github.com/open-mmlab/mmdetection/blob/main/configs/yolof) for details. 
### Seesaw Loss -Please refer to [Seesaw Loss](https://github.com/open-mmlab/mmdetection/blob/master/configs/seesaw_loss) for details. +Please refer to [Seesaw Loss](https://github.com/open-mmlab/mmdetection/blob/main/configs/seesaw_loss) for details. ### CenterNet -Please refer to [CenterNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/centernet) for details. +Please refer to [CenterNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/centernet) for details. ### YOLOX -Please refer to [YOLOX](https://github.com/open-mmlab/mmdetection/blob/master/configs/yolox) for details. +Please refer to [YOLOX](https://github.com/open-mmlab/mmdetection/blob/main/configs/yolox) for details. ### PVT -Please refer to [PVT](https://github.com/open-mmlab/mmdetection/blob/master/configs/pvt) for details. +Please refer to [PVT](https://github.com/open-mmlab/mmdetection/blob/main/configs/pvt) for details. ### SOLO -Please refer to [SOLO](https://github.com/open-mmlab/mmdetection/blob/master/configs/solo) for details. +Please refer to [SOLO](https://github.com/open-mmlab/mmdetection/blob/main/configs/solo) for details. ### QueryInst -Please refer to [QueryInst](https://github.com/open-mmlab/mmdetection/blob/master/configs/queryinst) for details. +Please refer to [QueryInst](https://github.com/open-mmlab/mmdetection/blob/main/configs/queryinst) for details. ### PanopticFPN -Please refer to [PanopticFPN](https://github.com/open-mmlab/mmdetection/blob/master/configs/panoptic_fpn) for details. +Please refer to [PanopticFPN](https://github.com/open-mmlab/mmdetection/blob/main/configs/panoptic_fpn) for details. ### MaskFormer -Please refer to [MaskFormer](https://github.com/open-mmlab/mmdetection/blob/master/configs/maskformer) for details. +Please refer to [MaskFormer](https://github.com/open-mmlab/mmdetection/blob/main/configs/maskformer) for details. ### DyHead -Please refer to [DyHead](https://github.com/open-mmlab/mmdetection/blob/master/configs/dyhead) for details. +Please refer to [DyHead](https://github.com/open-mmlab/mmdetection/blob/main/configs/dyhead) for details. ### Mask2Former -Please refer to [Mask2Former](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask2former) for details. +Please refer to [Mask2Former](https://github.com/open-mmlab/mmdetection/blob/main/configs/mask2former) for details. ### Efficientnet -Please refer to [Efficientnet](https://github.com/open-mmlab/mmdetection/blob/master/configs/efficientnet) for details. +Please refer to [Efficientnet](https://github.com/open-mmlab/mmdetection/blob/main/configs/efficientnet) for details. ### Other datasets -We also benchmark some methods on [PASCAL VOC](https://github.com/open-mmlab/mmdetection/blob/master/configs/pascal_voc), [Cityscapes](https://github.com/open-mmlab/mmdetection/blob/master/configs/cityscapes), [OpenImages](https://github.com/open-mmlab/mmdetection/blob/master/configs/openimages) and [WIDER FACE](https://github.com/open-mmlab/mmdetection/blob/master/configs/wider_face). +We also benchmark some methods on [PASCAL VOC](https://github.com/open-mmlab/mmdetection/blob/main/configs/pascal_voc), [Cityscapes](https://github.com/open-mmlab/mmdetection/blob/main/configs/cityscapes), [OpenImages](https://github.com/open-mmlab/mmdetection/blob/main/configs/openimages) and [WIDER FACE](https://github.com/open-mmlab/mmdetection/blob/main/configs/wider_face). 
### Pre-trained Models

-We also train [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/faster_rcnn) and [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn) using ResNet-50 and [RegNetX-3.2G](https://github.com/open-mmlab/mmdetection/blob/master/configs/regnet) with multi-scale training and longer schedules. These models serve as strong pre-trained models for downstream tasks for convenience.
+We also train [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/faster_rcnn) and [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/mask_rcnn) using ResNet-50 and [RegNetX-3.2G](https://github.com/open-mmlab/mmdetection/blob/main/configs/regnet) with multi-scale training and longer schedules. For convenience, these models can serve as strong pre-trained models for downstream tasks.

## Speed benchmark

### Training Speed benchmark

-We provide [analyze_logs.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/analysis_tools/analyze_logs.py) to get average time of iteration in training. You can find examples in [Log Analysis](https://mmdetection.readthedocs.io/en/latest/useful_tools.html#log-analysis).
+We provide [analyze_logs.py](https://github.com/open-mmlab/mmdetection/blob/main/tools/analysis_tools/analyze_logs.py) to get the average iteration time during training. You can find examples in [Log Analysis](https://mmdetection.readthedocs.io/en/latest/useful_tools.html#log-analysis).

-We compare the training speed of Mask R-CNN with some other popular frameworks (The data is copied from [detectron2](https://github.com/facebookresearch/detectron2/blob/master/docs/notes/benchmarks.md/)).
-For mmdetection, we benchmark with [mask_rcnn_r50_caffe_fpn_poly_1x_coco_v1.py](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_poly_1x_coco_v1.py), which should have the same setting with [mask_rcnn_R_50_FPN_noaug_1x.yaml](https://github.com/facebookresearch/detectron2/blob/master/configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml) of detectron2.
+We compare the training speed of Mask R-CNN with some other popular frameworks (the data is copied from [detectron2](https://github.com/facebookresearch/detectron2/blob/main/docs/notes/benchmarks.md/)).
+For mmdetection, we benchmark with [mask-rcnn_r50-caffe_fpn_poly-1x_coco_v1.py](https://github.com/open-mmlab/mmdetection/blob/main/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_poly-1x_coco_v1.py), which should have the same settings as [mask_rcnn_R_50_FPN_noaug_1x.yaml](https://github.com/facebookresearch/detectron2/blob/main/configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml) of detectron2.
We also provide the [checkpoint](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug_compare_20200518-10127928.pth) and [training log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug_20200518_105755.log.json) for reference. The throughput is computed as the average throughput in iterations 100-500 to skip GPU warmup time.
| Implementation | Throughput (img/s) | @@ -289,7 +289,7 @@ We also provide the [checkpoint](https://download.openmmlab.com/mmdetection/v2.0 ### Inference Speed Benchmark -We provide [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/analysis_tools/benchmark.py) to benchmark the inference latency. +We provide [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/main/tools/analysis_tools/benchmark.py) to benchmark the inference latency. The script benchmarkes the model with 2000 images and calculates the average time ignoring first 5 times. You can change the output log interval (defaults: 50) by setting `LOG-INTERVAL`. ```shell @@ -319,11 +319,11 @@ For fair comparison, we install and run both frameworks on the same machine. ### Performance -| Type | Lr schd | Detectron2 | mmdetection | Download | -| -------------------------------------------------------------------------------------------------------------------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------- | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco.py) | 1x | [37.9](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml) | 38.0 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-5324cff8.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco_20200429_234554.log.json) | -| [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco.py) | 1x | [38.6 & 35.2](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml) | 38.8 & 35.4 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco-dbecf295.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco_20200430_054239.log.json) | -| [Retinanet](https://github.com/open-mmlab/mmdetection/blob/master/configs/retinanet/retinanet_r50_caffe_fpn_mstrain_1x_coco.py) | 1x | [36.5](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml) | 37.0 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/retinanet_r50_caffe_fpn_mstrain_1x_coco/retinanet_r50_caffe_fpn_mstrain_1x_coco-586977a0.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/retinanet_r50_caffe_fpn_mstrain_1x_coco/retinanet_r50_caffe_fpn_mstrain_1x_coco_20200430_014748.log.json) | +| Type | Lr schd | Detectron2 | mmdetection | Download | +| ------------------------------------------------------------------------------------------------------------------------------- | 
+| [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco.py) | 1x | [37.9](https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml) | 38.0 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-5324cff8.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco_20200429_234554.log.json) |
+| [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py) | 1x | [38.6 & 35.2](https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml) | 38.8 & 35.4 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco-dbecf295.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco_20200430_054239.log.json) |
+| [Retinanet](https://github.com/open-mmlab/mmdetection/blob/main/configs/retinanet/retinanet_r50-caffe_fpn_ms-1x_coco.py) | 1x | [36.5](https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml) | 37.0 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/retinanet_r50_caffe_fpn_mstrain_1x_coco/retinanet_r50_caffe_fpn_mstrain_1x_coco-586977a0.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/retinanet_r50_caffe_fpn_mstrain_1x_coco/retinanet_r50_caffe_fpn_mstrain_1x_coco_20200430_014748.log.json) |

 ### Training Speed

diff --git a/docs/en/notes/changelog.md b/docs/en/notes/changelog.md
index 4e8bb27a742..ded9dc30189 100644
--- a/docs/en/notes/changelog.md
+++ b/docs/en/notes/changelog.md
@@ -1,5 +1,51 @@
 # Changelog of v3.x

+## v3.0.0 (6/4/2023)
+
+### Highlights
+
+- Support semi-automatic annotation based on [Label-Studio](../../../projects/LabelStudio) (#10039)
+- Support [EfficientDet](../../../projects/EfficientDet) in projects (#9810)
+
+### New Features
+
+- File I/O migration and reconstruction (#9709)
+- Release DINO Swin-L 36e model (#9927)
+
+### Bug Fixes
+
+- Fix benchmark script (#9865)
+- Fix the crop method of PolygonMasks (#9858)
+- Fix Albu augmentation with the mask shape (#9918)
+- Fix `RTMDetIns` prior generator device error (#9964)
+- Fix `img_shape` in data pipeline (#9966)
+- Fix cityscapes import error (#9984)
+- Fix `solov2_r50_fpn_ms-3x_coco.py` config error (#10030)
+- Fix Conditional DETR AP and log (#9889)
+- Fix the unexpected `local-rank` argument error in PyTorch 2.0 (#10050)
+- Fix `common/ms_3x_coco-instance.py` config error (#10056)
+- Fix FLOPs computation error (#10051)
+- Delete `data_root` in `CocoOccludedSeparatedMetric` to fix a bug (#9969)
+- Unify metafile.yml (#9849)
+
+### Improvements
+
+- Added BoxInst r101 config (#9967)
+- Added config migration guide (#9960)
+- Added more social networking links (#10021)
+- Added an introduction to the RTMDet configs (#10042)
+- Added visualization docs (#9938, #10058)
+- Refined data_prepare docs (#9935)
+- Added support for setting the cache_size_limit parameter of dynamo in PyTorch 2.0 (#10054)
+- Updated coco_metric.py (#10033)
+- Updated type hints (#10040)
+
+### Contributors
+
+A total of 19 developers contributed to this release.
+
+Thanks @IRONICBo, @vansin, @RangeKing, @Ghlerrix, @okotaku, @JosonChan1998, @zgzhengSE, @bobo0810, @yechenzh, @Zheng-LinXiao, @LYMDLUT, @yarkable, @xiejiajiannb, @chhluo, @BIGWangYuDong, @RangiLy, @zwhus, @hhaAndroid, @ZwwWayne
+
 ## v3.0.0rc6 (24/2/2023)

 ### Highlights

diff --git a/docs/en/notes/changelog_v2.x.md b/docs/en/notes/changelog_v2.x.md
index af2e048f7b2..2b3a230c0d9 100644
--- a/docs/en/notes/changelog_v2.x.md
+++ b/docs/en/notes/changelog_v2.x.md
@@ -133,7 +133,7 @@ Thanks @ZwwWayne, @DarthThomas, @solyaH, @LutingWang, @chenxinfeng4, @Czm369, @C

     data=dict(train_dataloader=dict(class_aware_sampler=dict(num_sample_class=1))))
 ```

-  in the config to use `ClassAwareSampler`. Examples can be found in [the configs of OpenImages Dataset](https://github.com/open-mmlab/mmdetection/tree/master/configs/openimages/faster_rcnn_r50_fpn_32x2_cas_1x_openimages.py). (#7436)
+  in the config to use `ClassAwareSampler`. Examples can be found in [the configs of OpenImages Dataset](https://github.com/open-mmlab/mmdetection/tree/main/configs/openimages/faster_rcnn_r50_fpn_32x2_cas_1x_openimages.py). (#7436)

 - Support automatically scaling LR according to GPU number and samples per GPU. (#7482)

   In each config, there is a corresponding config of auto-scaling LR as below,

diff --git a/docs/en/notes/compatibility.md b/docs/en/notes/compatibility.md
index a545a495fd3..26325e249dc 100644
--- a/docs/en/notes/compatibility.md
+++ b/docs/en/notes/compatibility.md
@@ -75,7 +75,7 @@ MMDetection v2.12.0 relies on the newest features in MMCV 1.3.3, including `Base

 ### Unified model initialization

-To unify the parameter initialization in OpenMMLab projects, MMCV supports `BaseModule` that accepts `init_cfg` to allow the modules' parameters initialized in a flexible and unified manner. Now the users need to explicitly call `model.init_weights()` in the training script to initialize the model (as in [here](https://github.com/open-mmlab/mmdetection/blob/master/tools/train.py#L162), previously this was handled by the detector. **The downstream projects must update their model initialization accordingly to use MMDetection v2.12.0**. Please refer to PR #4750 for details.
+To unify the parameter initialization in OpenMMLab projects, MMCV supports `BaseModule`, which accepts `init_cfg` to allow the modules' parameters to be initialized in a flexible and unified manner. Now users need to explicitly call `model.init_weights()` in the training script to initialize the model (as in [here](https://github.com/open-mmlab/mmdetection/blob/main/tools/train.py#L162)); previously this was handled by the detector. **The downstream projects must update their model initialization accordingly to use MMDetection v2.12.0**. Please refer to PR #4750 for details.
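In practice, the change described above amounts to one extra call after the model is built. A minimal sketch of the equivalent steps, written against the current 3.x-style registry API rather than the v2.12.0-era one, so treat the exact imports as illustrative:

```python
from mmengine.config import Config
from mmdet.registry import MODELS
from mmdet.utils import register_all_modules

register_all_modules()  # register MMDetection modules with the registry
cfg = Config.fromfile('configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py')

model = MODELS.build(cfg.model)  # build the detector from the config
model.init_weights()  # this explicit call replaces the old in-detector initialization
```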
 ### Unified model registry

diff --git a/docs/en/notes/faq.md b/docs/en/notes/faq.md
index f93b4a84f47..aa473c2f3da 100644
--- a/docs/en/notes/faq.md
+++ b/docs/en/notes/faq.md
@@ -1,28 +1,65 @@
 # Frequently Asked Questions

-We list some common troubles faced by many users and their corresponding solutions here. Feel free to enrich the list if you find any frequent issues and have ways to help others to solve them. If the contents here do not cover your issue, please create an issue using the [provided templates](https://github.com/open-mmlab/mmdetection/blob/master/.github/ISSUE_TEMPLATE/error-report.md/) and make sure you fill in all required information in the template.
+We list some common issues faced by many users and their corresponding solutions here. Feel free to enrich the list if you find any frequent issues and have ways to help others solve them. If the contents here do not cover your issue, please create an issue using the [provided templates](https://github.com/open-mmlab/mmdetection/blob/main/.github/ISSUE_TEMPLATE/error-report.md/) and make sure you fill in all required information in the template.
+
+## PyTorch 2.0 Support
+
+The vast majority of algorithms in MMDetection now support PyTorch 2.0 and its `torch.compile` function. Users only need to install MMDetection 3.0.0rc7 or later versions to use this feature. If any unsupported algorithms are found during use, please feel free to give us feedback. We also welcome contributions from the community to benchmark the speed improvement brought by using the `torch.compile` function.
+
+To enable the `torch.compile` function, simply add `--cfg-options compile=True` after `train.py` or `test.py`. For example, to enable `torch.compile` for RTMDet, you can use the following command:
+
+```shell
+# Single GPU
+python tools/train.py configs/rtmdet/rtmdet_s_8xb32-300e_coco.py --cfg-options compile=True
+
+# Single node multiple GPUs
+./tools/dist_train.sh configs/rtmdet/rtmdet_s_8xb32-300e_coco.py 8 --cfg-options compile=True
+
+# Single node multiple GPUs + AMP
+./tools/dist_train.sh configs/rtmdet/rtmdet_s_8xb32-300e_coco.py 8 --cfg-options compile=True --amp
+```
+
+It is important to note that PyTorch 2.0's support for dynamic shapes is not yet fully developed. In most object detection algorithms, not only are the input shapes dynamic, but the loss calculation and post-processing parts are also dynamic. This can lead to slower training speeds when using the `torch.compile` function. Therefore, if you wish to enable the `torch.compile` function, you should follow these principles:
+
+1. Feed the network fixed-shape inputs rather than multi-scale images.
+2. Set the `torch._dynamo.config.cache_size_limit` parameter. TorchDynamo converts and caches the Python bytecode, and the compiled functions are stored in the cache. When a later check finds that a function needs to be recompiled, it is recompiled and cached again. However, once the number of recompilations exceeds the configured maximum (64), the function is no longer cached or recompiled. As mentioned above, the loss calculation and post-processing parts of object detection algorithms are also computed dynamically, so these functions need to be recompiled every time. Setting the `torch._dynamo.config.cache_size_limit` parameter to a smaller value can therefore effectively reduce the compilation time.
+
+In MMDetection, you can set the `torch._dynamo.config.cache_size_limit` parameter through the environment variable `DYNAMO_CACHE_SIZE_LIMIT`.
For example, the command is as follows:
+
+```shell
+# Single GPU
+export DYNAMO_CACHE_SIZE_LIMIT=4
+python tools/train.py configs/rtmdet/rtmdet_s_8xb32-300e_coco.py --cfg-options compile=True
+
+# Single node multiple GPUs
+export DYNAMO_CACHE_SIZE_LIMIT=4
+./tools/dist_train.sh configs/rtmdet/rtmdet_s_8xb32-300e_coco.py 8 --cfg-options compile=True
+```
+
+For common questions about PyTorch 2.0's dynamo, please refer to the [PyTorch dynamo FAQ](https://pytorch.org/docs/stable/dynamo/faq.html).

 ## Installation

-- Compatibility issue between MMCV and MMDetection; "ConvWS is already registered in conv layer"; "AssertionError: MMCV==xxx is used but incompatible. Please install mmcv>=xxx, \<=xxx."
+Compatibility issue between MMCV and MMDetection; "ConvWS is already registered in conv layer"; "AssertionError: MMCV==xxx is used but incompatible. Please install mmcv>=xxx, \<=xxx."

-  Compatible MMDetection, MMEngine, and MMCV versions are shown as below. Please choose the correct version of MMCV to avoid installation issues.
+Compatible MMDetection, MMEngine, and MMCV versions are shown below. Please choose the correct version of MMCV to avoid installation issues.

-  | MMDetection version | MMCV version | MMEngine version |
-  | :-----------------: | :---------------------: | :----------------------: |
-  | 3.x | mmcv>=2.0.0rc4, \<2.1.0 | mmengine>=0.6.0, \<1.0.0 |
-  | 3.0.0rc6 | mmcv>=2.0.0rc4, \<2.1.0 | mmengine>=0.6.0, \<1.0.0 |
-  | 3.0.0rc5 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.3.0, \<1.0.0 |
-  | 3.0.0rc4 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.3.0, \<1.0.0 |
-  | 3.0.0rc3 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.3.0, \<1.0.0 |
-  | 3.0.0rc2 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.1.0, \<1.0.0 |
-  | 3.0.0rc1 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.1.0, \<1.0.0 |
-  | 3.0.0rc0 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.1.0, \<1.0.0 |
+| MMDetection version | MMCV version | MMEngine version |
+| :-----------------: | :---------------------: | :----------------------: |
+| main | mmcv>=2.0.0, \<2.1.0 | mmengine>=0.7.1, \<1.0.0 |
+| 3.x | mmcv>=2.0.0, \<2.1.0 | mmengine>=0.7.1, \<1.0.0 |
+| 3.0.0rc6 | mmcv>=2.0.0rc4, \<2.1.0 | mmengine>=0.6.0, \<1.0.0 |
+| 3.0.0rc5 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.3.0, \<1.0.0 |
+| 3.0.0rc4 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.3.0, \<1.0.0 |
+| 3.0.0rc3 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.3.0, \<1.0.0 |
+| 3.0.0rc2 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.1.0, \<1.0.0 |
+| 3.0.0rc1 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.1.0, \<1.0.0 |
+| 3.0.0rc0 | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.1.0, \<1.0.0 |

-  **Note:**
+**Note:**

-  1. If you want to install mmdet-v2.x, the compatible MMDetection and MMCV versions table can be found at [here](https://mmdetection.readthedocs.io/en/stable/faq.html#installation). Please choose the correct version of MMCV to avoid installation issues.
-  2. In MMCV-v2.x, `mmcv-full` is rename to `mmcv`, if you want to install `mmcv` without CUDA ops, you can install `mmcv-lite`.
+1. If you want to install mmdet-v2.x, the compatible MMDetection and MMCV versions table can be found [here](https://mmdetection.readthedocs.io/en/stable/faq.html#installation). Please choose the correct version of MMCV to avoid installation issues.
+2. In MMCV-v2.x, `mmcv-full` is renamed to `mmcv`. If you want to install `mmcv` without CUDA ops, you can install `mmcv-lite` instead.

 - "No module named 'mmcv.ops'"; "No module named 'mmcv.\_ext'".
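When debugging such version mismatches, it can help to print the installed versions and compare them against the table above. A quick diagnostic sketch (the `__version__` attributes are standard for these packages):

```python
import mmcv
import mmdet
import mmengine

# Compare these against the compatibility table above.
print(f'mmdet:    {mmdet.__version__}')
print(f'mmcv:     {mmcv.__version__}')
print(f'mmengine: {mmengine.__version__}')
```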
@@ -169,7 +206,7 @@ We list some common troubles faced by many users and their corresponding solutio

 - Save the best model

-  It can be turned on by configuring `default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=1, save_best='auto'),`. In the case of the `auto` parameter, the first key in the returned evaluation result will be used as the basis for selecting the best model. You can also directly set the key in the evaluation result to manually set it, for example, `save_best='mAP'`.
+  It can be turned on by configuring `default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=1, save_best='auto'))`. With the `auto` parameter, the first key in the returned evaluation results is used as the basis for selecting the best model. You can also manually set the key, for example, `save_best='coco/bbox_mAP'` (a complete config sketch is given below).

 ## Evaluation

diff --git a/docs/en/overview.md b/docs/en/overview.md
index f78b658f017..7c7d96b7087 100644
--- a/docs/en/overview.md
+++ b/docs/en/overview.md
@@ -42,11 +42,13 @@ Here is a detailed step-by-step guide to learn more about MMDetection:

 2. Refer to the below tutorials for the basic usage of MMDetection.

-   - [Train and Test](https://mmdetection.readthedocs.io/en/dev-3.x/user_guides/index.html#train-test)
+   - [Train and Test](https://mmdetection.readthedocs.io/en/latest/user_guides/index.html#train-test)

-   - [Useful Tools](https://mmdetection.readthedocs.io/en/dev-3.x/user_guides/index.html#useful-tools)
+   - [Useful Tools](https://mmdetection.readthedocs.io/en/latest/user_guides/index.html#useful-tools)

 3. Refer to the below tutorials to dive deeper:

-   - [Basic Concepts](https://mmdetection.readthedocs.io/en/dev-3.x/advanced_guides/index.html#basic-concepts)
-   - [Component Customization](https://mmdetection.readthedocs.io/en/dev-3.x/advanced_guides/index.html#component-customization)
+   - [Basic Concepts](https://mmdetection.readthedocs.io/en/latest/advanced_guides/index.html#basic-concepts)
+   - [Component Customization](https://mmdetection.readthedocs.io/en/latest/advanced_guides/index.html#component-customization)
+
+4. For users of the MMDetection 2.x version, we provide a guide to help you adapt to the new version; you can find it in the [migration guide](./migration/migration.md).
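To make the best-checkpoint FAQ above concrete, here is a minimal config sketch; the hook type and arguments follow the snippet quoted in the FAQ, and the `coco/bbox_mAP` key assumes a COCO-style evaluator:

```python
# Save a checkpoint every epoch and additionally keep the best one,
# selected by COCO bbox mAP instead of the first returned metric key.
default_hooks = dict(
    checkpoint=dict(
        type='CheckpointHook',
        interval=1,
        save_best='coco/bbox_mAP'))
```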
diff --git a/docs/en/stat.py b/docs/en/stat.py
index 44f03b6616c..f0589e337e0 100755
--- a/docs/en/stat.py
+++ b/docs/en/stat.py
@@ -6,7 +6,7 @@

 import numpy as np

-url_prefix = 'https://github.com/open-mmlab/mmdetection/blob/3.x/configs'
+url_prefix = 'https://github.com/open-mmlab/mmdetection/blob/main/configs'

 files = sorted(glob.glob('../../configs/*/README.md'))

diff --git a/docs/en/user_guides/config.md b/docs/en/user_guides/config.md
index d08b2a731eb..69bd91194e0 100644
--- a/docs/en/user_guides/config.md
+++ b/docs/en/user_guides/config.md
@@ -14,14 +14,14 @@ In MMDetection's config, we use `model` to set up detection algorithm components
 model = dict(
     type='MaskRCNN',  # The name of detector
     data_preprocessor=dict(  # The config of data preprocessor, usually includes image normalization and padding
-        type='DetDataPreprocessor',  # The type of the data preprocessor, refer to https://mmdetection.readthedocs.io/en/3.x/api.html#mmdet.models.data_preprocessors.DetDataPreprocessor
+        type='DetDataPreprocessor',  # The type of the data preprocessor, refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.data_preprocessors.DetDataPreprocessor
         mean=[123.675, 116.28, 103.53],  # Mean values used to pre-train the backbone models, ordered in R, G, B
         std=[58.395, 57.12, 57.375],  # Standard deviation used to pre-train the backbone models, ordered in R, G, B
         bgr_to_rgb=True,  # whether to convert image from BGR to RGB
         pad_mask=True,  # whether to pad instance masks
         pad_size_divisor=32),  # The size of padded image should be divisible by ``pad_size_divisor``
     backbone=dict(  # The config of backbone
-        type='ResNet',  # The type of backbone network. Refer to https://mmdetection.readthedocs.io/en/3.x/api.html#mmdet.models.backbones.ResNet
+        type='ResNet',  # The type of backbone network. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.backbones.ResNet
         depth=50,  # The depth of backbone, usually it is 50 or 101 for ResNet and ResNext backbones.
         num_stages=4,  # Number of stages of the backbone.
         out_indices=(0, 1, 2, 3),  # The index of output feature maps produced in each stage
@@ -33,34 +33,34 @@ model = dict(
     style='pytorch',  # The style of backbone, 'pytorch' means that stride 2 layers are in 3x3 Conv, 'caffe' means stride 2 layers are in 1x1 Convs.
     init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),  # The ImageNet pretrained backbone to be loaded
     neck=dict(
-        type='FPN',  # The neck of detector is FPN. We also support 'NASFPN', 'PAFPN', etc. Refer to https://mmdetection.readthedocs.io/en/3.x/api.html#mmdet.models.necks.FPN for more details.
+        type='FPN',  # The neck of detector is FPN. We also support 'NASFPN', 'PAFPN', etc. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.necks.FPN for more details.
         in_channels=[256, 512, 1024, 2048],  # The input channels, this is consistent with the output channels of backbone
         out_channels=256,  # The output channels of each level of the pyramid feature map
         num_outs=5),  # The number of output scales
     rpn_head=dict(
-        type='RPNHead',  # The type of RPN head is 'RPNHead', we also support 'GARPNHead', etc. Refer to https://mmdetection.readthedocs.io/en/3.x/api.html#mmdet.models.dense_heads.RPNHead for more details.
+        type='RPNHead',  # The type of RPN head is 'RPNHead', we also support 'GARPNHead', etc. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.dense_heads.RPNHead for more details.
        in_channels=256,  # The input channels of each input feature map, this is consistent with the output channels of neck
        feat_channels=256,  # Feature channels of convolutional layers in the head.
        anchor_generator=dict(  # The config of anchor generator
-            type='AnchorGenerator',  # Most of methods use AnchorGenerator, SSD Detectors uses `SSDAnchorGenerator`. Refer to https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/prior_generators/anchor_generator.py#L18 for more details
+            type='AnchorGenerator',  # Most methods use AnchorGenerator; SSD detectors use `SSDAnchorGenerator`. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/prior_generators/anchor_generator.py#L18 for more details
            scales=[8],  # Basic scale of the anchor, the area of the anchor in one position of a feature map will be scale * base_sizes
            ratios=[0.5, 1.0, 2.0],  # The ratio between height and width.
            strides=[4, 8, 16, 32, 64]),  # The strides of the anchor generator. This is consistent with the FPN feature strides. The strides will be taken as base_sizes if base_sizes is not set.
        bbox_coder=dict(  # Config of box coder to encode and decode the boxes during training and testing
-            type='DeltaXYWHBBoxCoder',  # Type of box coder. 'DeltaXYWHBBoxCoder' is applied for most of the methods. Refer to https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py#L13 for more details.
+            type='DeltaXYWHBBoxCoder',  # Type of box coder. 'DeltaXYWHBBoxCoder' is used by most methods. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py#L13 for more details.
            target_means=[0.0, 0.0, 0.0, 0.0],  # The target means used to encode and decode boxes
            target_stds=[1.0, 1.0, 1.0, 1.0]),  # The standard variance used to encode and decode boxes
        loss_cls=dict(  # Config of loss function for the classification branch
-            type='CrossEntropyLoss',  # Type of loss for classification branch, we also support FocalLoss etc. Refer to https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/losses/cross_entropy_loss.py#L201 for more details
+            type='CrossEntropyLoss',  # Type of loss for the classification branch; we also support FocalLoss, etc. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/losses/cross_entropy_loss.py#L201 for more details
            use_sigmoid=True,  # RPN usually performs two-class classification, so it usually uses the sigmoid function.
            loss_weight=1.0),  # Loss weight of the classification branch.
        loss_bbox=dict(  # Config of loss function for the regression branch.
-            type='L1Loss',  # Type of loss, we also support many IoU Losses and smooth L1-loss, etc. Refer to https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/losses/smooth_l1_loss.py#L56 for implementation.
+            type='L1Loss',  # Type of loss; we also support many IoU losses and smooth L1 loss, etc. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/losses/smooth_l1_loss.py#L56 for implementation.
            loss_weight=1.0)),  # Loss weight of the regression branch.
    roi_head=dict(  # RoIHead encapsulates the second stage of two-stage/cascade detectors.
        type='StandardRoIHead',
        bbox_roi_extractor=dict(  # RoI feature extractor for bbox regression.
-            type='SingleRoIExtractor',  # Type of the RoI feature extractor, most of methods uses SingleRoIExtractor.
Refer to https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py#L13 for details.
+            type='SingleRoIExtractor',  # Type of the RoI feature extractor, most methods use SingleRoIExtractor. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py#L13 for details.
            roi_layer=dict(  # Config of RoI Layer
                type='RoIAlign',  # Type of RoI Layer, DeformRoIPoolingPack and ModulatedDeformRoIPoolingPack are also supported. Refer to https://mmcv.readthedocs.io/en/latest/api.html#mmcv.ops.RoIAlign for details.
                output_size=7,  # The output size of feature maps.
@@ -68,7 +68,7 @@ model = dict(
            out_channels=256,  # output channels of the extracted feature.
            featmap_strides=[4, 8, 16, 32]),  # Strides of multi-scale feature maps. It should be consistent with the architecture of the backbone.
        bbox_head=dict(  # Config of box head in the RoIHead.
-            type='Shared2FCBBoxHead',  # Type of the bbox head, Refer to https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py#L220 for implementation details.
+            type='Shared2FCBBoxHead',  # Type of the bbox head. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py#L220 for implementation details.
            in_channels=256,  # Input channels for bbox head. This is consistent with the out_channels in roi_extractor
            fc_out_channels=1024,  # Output feature channels of FC layers.
            roi_feat_size=7,  # Size of RoI features
@@ -94,7 +94,7 @@ model = dict(
            out_channels=256,  # Output channels of the extracted feature.
            featmap_strides=[4, 8, 16, 32]),  # Strides of multi-scale feature maps.
        mask_head=dict(  # Mask prediction head
-            type='FCNMaskHead',  # Type of mask head, refer to https://mmdetection.readthedocs.io/en/3.x/api.html#mmdet.models.roi_heads.FCNMaskHead for implementation details.
+            type='FCNMaskHead',  # Type of mask head, refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.roi_heads.FCNMaskHead for implementation details.
            num_convs=4,  # Number of convolutional layers in mask head.
            in_channels=256,  # Input channels, should be consistent with the output channels of mask roi extractor.
            conv_out_channels=256,  # Output channels of the convolutional layer.
@@ -106,14 +106,14 @@ model = dict(
    train_cfg = dict(  # Config of training hyperparameters for rpn and rcnn
        rpn=dict(  # Training config of rpn
            assigner=dict(  # Config of assigner
-                type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for many common detectors. Refer to https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/assigners/max_iou_assigner.py#L14 for more details.
+                type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for many common detectors. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/assigners/max_iou_assigner.py#L14 for more details.
                pos_iou_thr=0.7,  # IoU >= threshold 0.7 will be taken as positive samples
                neg_iou_thr=0.3,  # IoU < threshold 0.3 will be taken as negative samples
                min_pos_iou=0.3,  # The minimal IoU threshold to take boxes as positive samples
                match_low_quality=True,  # Whether to match the boxes under low quality (see API doc for more details).
                ignore_iof_thr=-1),  # IoF threshold for ignoring bboxes
            sampler=dict(  # Config of positive/negative sampler
-                type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported.
Refer to https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/samplers/random_sampler.py#L14 for implementation details.
+                type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/samplers/random_sampler.py#L14 for implementation details.
                num=256,  # Number of samples
                pos_fraction=0.5,  # The ratio of positive samples in the total samples.
                neg_pos_ub=-1,  # The upper bound of negative samples based on the number of positive samples.
@@ -133,14 +133,14 @@ model = dict(
            min_bbox_size=0),  # The allowed minimal box size
        rcnn=dict(  # The config for the roi heads.
            assigner=dict(  # Config of assigner for second stage, this is different from that in rpn
-                type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for all roi_heads for now. Refer to https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/assigners/max_iou_assigner.py#L14 for more details.
+                type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for all roi_heads for now. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/assigners/max_iou_assigner.py#L14 for more details.
                pos_iou_thr=0.5,  # IoU >= threshold 0.5 will be taken as positive samples
                neg_iou_thr=0.5,  # IoU < threshold 0.5 will be taken as negative samples
                min_pos_iou=0.5,  # The minimal IoU threshold to take boxes as positive samples
                match_low_quality=False,  # Whether to match the boxes under low quality (see API doc for more details).
                ignore_iof_thr=-1),  # IoF threshold for ignoring bboxes
            sampler=dict(
-                type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported. Refer to https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/samplers/random_sampler.py#L14 for implementation details.
+                type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/samplers/random_sampler.py#L14 for implementation details.
                num=512,  # Number of samples
                pos_fraction=0.25,  # The ratio of positive samples in the total samples.
                neg_pos_ub=-1,  # The upper bound of negative samples based on the number of positive samples.
@@ -176,10 +176,10 @@ model = dict( ```python dataset_type = 'CocoDataset' # Dataset type, this will be used to define the dataset data_root = 'data/coco/' # Root path of data -file_client_args = dict(backend='disk') # file client arguments +backend_args = None # Arguments to instantiate the corresponding file backend train_pipeline = [ # Training data processing pipeline - dict(type='LoadImageFromFile', file_client_args=file_client_args), # First pipeline to load images from file path + dict(type='LoadImageFromFile', backend_args=backend_args), # First pipeline to load images from file path dict( type='LoadAnnotations', # Second pipeline to load annotations for current image with_bbox=True, # Whether to use bounding box, True for detection @@ -196,7 +196,7 @@ train_pipeline = [ # Training data processing pipeline dict(type='PackDetInputs') # Pipeline that formats the annotation data and decides which keys in the data should be packed into data_samples ] test_pipeline = [ # Testing data processing pipeline - dict(type='LoadImageFromFile', file_client_args=file_client_args), # First pipeline to load images from file path + dict(type='LoadImageFromFile', backend_args=backend_args), # First pipeline to load images from file path dict(type='Resize', scale=(1333, 800), keep_ratio=True), # Pipeline that resizes the images dict( type='PackDetInputs', # Pipeline that formats the annotation data and decides which keys in the data should be packed into data_samples @@ -217,7 +217,8 @@ train_dataloader = dict( # Train dataloader config ann_file='annotations/instances_train2017.json', # Path of annotation file data_prefix=dict(img='train2017/'), # Prefix of image path filter_cfg=dict(filter_empty_gt=True, min_size=32), # Config of filtering images and annotations - pipeline=train_pipeline)) + pipeline=train_pipeline, + backend_args=backend_args)) val_dataloader = dict( # Validation dataloader config batch_size=1, # Batch size of a single GPU. If batch-size > 1, the extra padding area may influence the performance. 
num_workers=2, # Worker to pre-fetch data for each single GPU @@ -232,7 +233,8 @@ val_dataloader = dict( # Validation dataloader config ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), test_mode=True, # Turn on the test mode of the dataset to avoid filtering annotations or images - pipeline=test_pipeline)) + pipeline=test_pipeline, + backend_args=backend_args)) test_dataloader = val_dataloader # Testing dataloader config ``` @@ -243,7 +245,8 @@ val_evaluator = dict( # Validation evaluator config type='CocoMetric', # The coco metric used to evaluate AR, AP, and mAP for detection and instance segmentation ann_file=data_root + 'annotations/instances_val2017.json', # Annotation file path metric=['bbox', 'segm'], # Metrics to be evaluated, `bbox` for detection and `segm` for instance segmentation - format_only=False) + format_only=False, + backend_args=backend_args) test_evaluator = val_evaluator # Testing evaluator config ``` @@ -529,7 +532,7 @@ train_pipeline = [ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile'), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict( type='PackDetInputs', diff --git a/docs/en/user_guides/deploy.md b/docs/en/user_guides/deploy.md index ab525c278fd..94c078882e3 100644 --- a/docs/en/user_guides/deploy.md +++ b/docs/en/user_guides/deploy.md @@ -15,7 +15,7 @@ This tutorial is organized as follows: ## Installation -Please follow the [guide](https://mmdetection.readthedocs.io/en/3.x/get_started.html) to install mmdet. And then install mmdeploy from source by following [this](https://mmdeploy.readthedocs.io/en/1.x/get_started.html#installation) guide. +Please follow the [guide](https://mmdetection.readthedocs.io/en/latest/get_started.html) to install mmdet. And then install mmdeploy from source by following [this](https://mmdeploy.readthedocs.io/en/1.x/get_started.html#installation) guide. ```{note} If you install mmdeploy prebuilt package, please also clone its repository by 'git clone https://github.com/open-mmlab/mmdeploy.git --depth=1' to get the deployment config files. @@ -25,7 +25,7 @@ If you install mmdeploy prebuilt package, please also clone its repository by 'g Suppose mmdetection and mmdeploy repositories are in the same directory, and the working directory is the root path of mmdetection. -Take [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/3.x/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py) model as an example. You can download its checkpoint from [here](https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth), and then convert it to onnx model as follows: +Take [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py) model as an example. 
You can download its checkpoint from [here](https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth), and then convert it to an ONNX model as follows:

 ```python
 from mmdeploy.apis import torch2onnx
diff --git a/docs/en/user_guides/index.rst b/docs/en/user_guides/index.rst
index 0a9582a4c7d..7986451893b 100644
--- a/docs/en/user_guides/index.rst
+++ b/docs/en/user_guides/index.rst
@@ -32,3 +32,4 @@ Useful Tools
    visualization.md
    robustness_benchmarking.md
    deploy.md
+   label_studio.md
diff --git a/docs/en/user_guides/inference.md b/docs/en/user_guides/inference.md
index 59a963d5d0f..33257ed5ed4 100644
--- a/docs/en/user_guides/inference.md
+++ b/docs/en/user_guides/inference.md
@@ -3,9 +3,9 @@
 MMDetection provides hundreds of pre-trained detection models in [Model Zoo](https://mmdetection.readthedocs.io/en/latest/model_zoo.html).
 This note will show how to run inference, which means using trained models to detect objects on images.

-In MMDetection, a model is defined by a [configuration file](https://mmdetection.readthedocs.io/en/3.x/user_guides/config.html) and existing model parameters are saved in a checkpoint file.
+In MMDetection, a model is defined by a [configuration file](https://mmdetection.readthedocs.io/en/latest/user_guides/config.html) and existing model parameters are saved in a checkpoint file.

-To start with, we recommend [RTMDet](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet) with this [configuration file](https://github.com/open-mmlab/mmdetection/blob/3.x/configs/rtmdet/rtmdet_l_8xb32-300e_coco.py) and this [checkpoint file](https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_l_8xb32-300e_coco/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth). It is recommended to download the checkpoint file to `checkpoints` directory.
+To start with, we recommend [RTMDet](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet) with this [configuration file](https://github.com/open-mmlab/mmdetection/blob/main/configs/rtmdet/rtmdet_l_8xb32-300e_coco.py) and this [checkpoint file](https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_l_8xb32-300e_coco/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth). It is recommended to download the checkpoint file to the `checkpoints` directory.

@@ -84,14 +84,14 @@ for frame in track_iter_progress(video_reader):
     cv2.destroyAllWindows()
 ```

-A notebook demo can be found in [demo/inference_demo.ipynb](https://github.com/open-mmlab/mmdetection/blob/3.x/demo/inference_demo.ipynb).
+A notebook demo can be found in [demo/inference_demo.ipynb](https://github.com/open-mmlab/mmdetection/blob/main/demo/inference_demo.ipynb).

 Note: `inference_detector` only supports single-image inference for now.

 ## Demos

 We also provide three demo scripts, implemented with high-level APIs and supporting functionality code.
-Source codes are available [here](https://github.com/open-mmlab/mmdetection/blob/3.x/demo).
+Source codes are available [here](https://github.com/open-mmlab/mmdetection/blob/main/demo).

 ### Image demo

diff --git a/docs/en/user_guides/label_studio.md b/docs/en/user_guides/label_studio.md
new file mode 100644
index 00000000000..d4b37447349
--- /dev/null
+++ b/docs/en/user_guides/label_studio.md
@@ -0,0 +1,256 @@
+# Semi-automatic Object Detection Annotation with MMDetection and Label-Studio
+
+Annotating data is a time-consuming and laborious task.
This article introduces how to perform semi-automatic annotation using the RTMDet algorithm in MMDetection in conjunction with Label-Studio software. Specifically, RTMDet is used to predict image annotations, which are then refined with Label-Studio. Community users can refer to this process and methodology and apply it to other fields.
+
+- RTMDet: RTMDet is a high-precision single-stage object detection algorithm developed by OpenMMLab, open-sourced in the MMDetection object detection toolbox. Its open-source license is Apache 2.0, and it can be used freely without restrictions by industrial users.
+
+- [Label Studio](https://github.com/heartexlabs/label-studio) is an excellent annotation software covering the functionality of dataset annotation in areas such as image classification, object detection, and segmentation.
+
+In this article, we will use [cat](https://download.openmmlab.com/mmyolo/data/cat_dataset.zip) images for semi-automatic annotation.
+
+## Environment Configuration
+
+To begin with, you need to create a virtual environment and then install PyTorch and MMCV. In this article, we will specify the versions of PyTorch and MMCV. Next, you can install MMDetection, Label-Studio, and label-studio-ml-backend using the following steps:
+
+Create a virtual environment:
+
+```shell
+conda create -n rtmdet python=3.9 -y
+conda activate rtmdet
+```
+
+Install PyTorch:
+
+```shell
+# Linux and Windows CPU only
+pip install torch==1.10.1+cpu torchvision==0.11.2+cpu torchaudio==0.10.1 -f https://download.pytorch.org/whl/cpu/torch_stable.html
+# Linux and Windows CUDA 11.3
+pip install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu113/torch_stable.html
+# OSX
+pip install torch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1
+```
+
+Install MMCV:
+
+```shell
+pip install -U openmim
+mim install "mmcv>=2.0.0"
+# Installing mmcv will automatically install mmengine
+```
+
+Install MMDetection:
+
+```shell
+git clone https://github.com/open-mmlab/mmdetection
+cd mmdetection
+pip install -v -e .
+```
+
+Install Label-Studio and label-studio-ml-backend:
+
+```shell
+# Installing Label-Studio may take some time; if the version is not found, please use the official source
+pip install label-studio==1.7.2
+pip install label-studio-ml==1.0.9
+```
+
+Download the RTMDet weights:
+
+```shell
+cd path/to/mmdetection
+mkdir work_dirs
+cd work_dirs
+wget https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_m_8xb32-300e_coco/rtmdet_m_8xb32-300e_coco_20220719_112220-229f527c.pth
+```
+
+## Start the Service
+
+Start the RTMDet backend inference service:
+
+```shell
+cd path/to/mmdetection
+
+label-studio-ml start projects/LabelStudio/backend_template --with \
+config_file=configs/rtmdet/rtmdet_m_8xb32-300e_coco.py \
+checkpoint_file=./work_dirs/rtmdet_m_8xb32-300e_coco_20220719_112220-229f527c.pth \
+device=cpu \
+--port 8003
+# Set device=cpu to use CPU inference, and replace cpu with cuda:0 to use GPU inference.
+```
+
+![](https://cdn.vansin.top/picgo20230330131601.png)
+
+The RTMDet backend inference service has now been started. To configure it in the Label-Studio web system, use http://localhost:8003 as the backend inference service.
+
+Now, start the Label-Studio web service:
+
+```shell
+label-studio start
+```
+
+![](https://cdn.vansin.top/picgo20230330132913.png)
+
+Open your web browser and go to http://localhost:8080/ to see the Label-Studio interface.
+
+![](https://cdn.vansin.top/picgo20230330133118.png)
+
+Register a user and then create an RTMDet-Semiautomatic-Label project.
+
+![](https://cdn.vansin.top/picgo20230330133333.png)
+
+Download the example cat images by running the following command and import them using the Data Import button:
+
+```shell
+cd path/to/mmdetection
+mkdir data && cd data
+
+wget https://download.openmmlab.com/mmyolo/data/cat_dataset.zip && unzip cat_dataset.zip
+```
+
+![](https://cdn.vansin.top/picgo20230330133628.png)
+
+![](https://cdn.vansin.top/picgo20230330133715.png)
+
+Then, select the Object Detection With Bounding Boxes template.
+
+![](https://cdn.vansin.top/picgo20230330133807.png)
+
+```shell
+airplane
+apple
+backpack
+banana
+baseball_bat
+baseball_glove
+bear
+bed
+bench
+bicycle
+bird
+boat
+book
+bottle
+bowl
+broccoli
+bus
+cake
+car
+carrot
+cat
+cell_phone
+chair
+clock
+couch
+cow
+cup
+dining_table
+dog
+donut
+elephant
+fire_hydrant
+fork
+frisbee
+giraffe
+hair_drier
+handbag
+horse
+hot_dog
+keyboard
+kite
+knife
+laptop
+microwave
+motorcycle
+mouse
+orange
+oven
+parking_meter
+person
+pizza
+potted_plant
+refrigerator
+remote
+sandwich
+scissors
+sheep
+sink
+skateboard
+skis
+snowboard
+spoon
+sports_ball
+stop_sign
+suitcase
+surfboard
+teddy_bear
+tennis_racket
+tie
+toaster
+toilet
+toothbrush
+traffic_light
+train
+truck
+tv
+umbrella
+vase
+wine_glass
+zebra
+```
+
+Then, copy and add the above categories to Label-Studio and click Save.
+
+![](https://cdn.vansin.top/picgo20230330134027.png)
+
+In the Settings, click Add Model to add the RTMDet backend inference service.
+
+![](https://cdn.vansin.top/picgo20230330134320.png)
+
+Click Validate and Save, and then click Start Labeling.
+
+![](https://cdn.vansin.top/picgo20230330134424.png)
+
+If you see Connected as shown below, the backend inference service has been successfully added.
+
+![](https://cdn.vansin.top/picgo20230330134554.png)
+
+## Start Semi-Automatic Labeling
+
+Click on Label to start labeling.
+
+![](https://cdn.vansin.top/picgo20230330134804.png)
+
+We can see that the RTMDet backend inference service has successfully returned the predicted results and displayed them on the image. However, we noticed that the predicted bounding boxes for the cats are a bit too large and not very accurate.
+
+![](https://cdn.vansin.top/picgo20230403104419.png)
+
+We manually adjust the position of the cat bounding box, and then click Submit to complete the annotation of this image.
+
+![](https://cdn.vansin.top/picgo/20230403105923.png)
+
+After submitting all images, click Export to export the labeled dataset in COCO format.
+
+![](https://cdn.vansin.top/picgo20230330135921.png)
+
+Use VS Code to open the unzipped folder to see the labeled dataset, which includes the images and the annotation files in JSON format.
+
+![](https://cdn.vansin.top/picgo20230330140321.png)
+
+At this point, the semi-automatic labeling is complete. We can use this dataset to train a more accurate model in MMDetection and then continue semi-automatic labeling on newly collected images with this model. This way, we can iteratively expand the high-quality dataset and improve the accuracy of the model.
+
+## Use MMYOLO as the Backend Inference Service
+
+If you want to use Label-Studio with MMYOLO, simply replace `config_file` and `checkpoint_file` with the MMYOLO configuration file and weight file when starting the backend inference service, as shown below.
+
+```shell
+cd path/to/mmdetection
+
+label-studio-ml start projects/LabelStudio/backend_template --with \
+config_file=path/to/mmyolo_config.py \
+checkpoint_file=path/to/mmyolo_weights.pth \
+device=cpu \
+--port 8003
+# device=cpu is for using CPU inference. If using GPU inference, replace cpu with cuda:0.
+```
+
+Rotated object detection and instance segmentation are still under development; please stay tuned.
diff --git a/docs/en/user_guides/semi_det.md b/docs/en/user_guides/semi_det.md
index 6cf5538e539..94ec3d670c8 100644
--- a/docs/en/user_guides/semi_det.md
+++ b/docs/en/user_guides/semi_det.md
@@ -117,7 +117,7 @@ We adopt a teacher-student joint training semi-supervised object detection frame
 # pipeline used to augment labeled data,
 # which will be sent to student model for supervised training.
 sup_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=file_client_args),
+    dict(type='LoadImageFromFile', backend_args=backend_args),
     dict(type='LoadAnnotations', with_bbox=True),
     dict(type='RandomResize', scale=scale, keep_ratio=True),
     dict(type='RandomFlip', prob=0.5),
@@ -164,7 +164,7 @@ strong_pipeline = [

 # pipeline used to augment unlabeled data into different views
 unsup_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=file_client_args),
+    dict(type='LoadImageFromFile', backend_args=backend_args),
     dict(type='LoadEmptyAnnotations'),
     dict(
         type='MultiBranch',
diff --git a/docs/en/user_guides/test.md b/docs/en/user_guides/test.md
index 333ccfbed1b..a7855e10ec7 100644
--- a/docs/en/user_guides/test.md
+++ b/docs/en/user_guides/test.md
@@ -55,7 +55,7 @@ Optional arguments:

 Assuming that you have already downloaded the checkpoints to the directory `checkpoints/`.

 1. Test RTMDet and visualize the results. Press any key for the next image.
-   Config and checkpoint files are available [here](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet).
+   Config and checkpoint files are available [here](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet).

   ```shell
   python tools/test.py \
@@ -65,7 +65,7 @@ Assuming that you have already downloaded the checkpoints to the directory `chec
   ```

 2. Test RTMDet and save the painted images for future visualization.
-   Config and checkpoint files are available [here](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet).
+   Config and checkpoint files are available [here](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet).
```shell python tools/test.py \ @@ -79,7 +79,7 @@ Assuming that you have already downloaded the checkpoints to the directory `chec ```shell python tools/test.py \ - configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc.py \ + configs/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712.py \ checkpoints/faster_rcnn_r50_fpn_1x_voc0712_20200624-c9895d40.pth ``` @@ -219,7 +219,7 @@ tta_model = dict( tta_pipeline = [ dict(type='LoadImageFromFile', - file_client_args=dict(backend='disk')), + backend_args=None), dict( type='TestTimeAug', transforms=[[ @@ -274,7 +274,7 @@ tta_model = dict( img_scales = [(1333, 800), (666, 400), (2000, 1200)] tta_pipeline = [ dict(type='LoadImageFromFile', - file_client_args=dict(backend='disk')), + backend_args=None), dict( type='TestTimeAug', transforms=[[ diff --git a/docs/en/user_guides/train.md b/docs/en/user_guides/train.md index ec8181e8617..071a0b99720 100644 --- a/docs/en/user_guides/train.md +++ b/docs/en/user_guides/train.md @@ -436,7 +436,7 @@ To train a model with the new config, you can simply run python tools/train.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py ``` -For more detailed usages, please refer to the [training guide](https://mmdetection.readthedocs.io/en/3.x/user_guides/train.html#train-predefined-models-on-standard-datasets). +For more detailed usages, please refer to the [training guide](https://mmdetection.readthedocs.io/en/latest/user_guides/train.html#train-predefined-models-on-standard-datasets). ## Test and inference @@ -446,4 +446,4 @@ To test the trained model, you can simply run python tools/test.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py work_dirs/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon/epoch_12.pth ``` -For more detailed usages, please refer to the [testing guide](https://mmdetection.readthedocs.io/en/3.x/user_guides/test.html). +For more detailed usages, please refer to the [testing guide](https://mmdetection.readthedocs.io/en/latest/user_guides/test.html). diff --git a/docs/en/user_guides/useful_hooks.md b/docs/en/user_guides/useful_hooks.md index 13b6bcf5846..4c30686d68a 100644 --- a/docs/en/user_guides/useful_hooks.md +++ b/docs/en/user_guides/useful_hooks.md @@ -8,7 +8,7 @@ MMDetection and MMEngine provide users with various useful hooks including log h ## MemoryProfilerHook -[Memory profiler hook](https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/engine/hooks/memory_profiler_hook.py) records memory information including virtual memory, swap memory, and the memory of the current process. This hook helps grasp the memory usage of the system and discover potential memory leak bugs. To use this hook, users should install `memory_profiler` and `psutil` by `pip install memory_profiler psutil` first. +[Memory profiler hook](https://github.com/open-mmlab/mmdetection/blob/main/mmdet/engine/hooks/memory_profiler_hook.py) records memory information including virtual memory, swap memory, and the memory of the current process. This hook helps grasp the memory usage of the system and discover potential memory leak bugs. To use this hook, users should install `memory_profiler` and `psutil` by `pip install memory_profiler psutil` first. ### Usage diff --git a/docs/en/user_guides/useful_tools.md b/docs/en/user_guides/useful_tools.md index 5cce0cb97e6..007d367ec8c 100644 --- a/docs/en/user_guides/useful_tools.md +++ b/docs/en/user_guides/useful_tools.md @@ -308,7 +308,7 @@ comparisons, but double check it before you adopt it in technical reports or pap 1. 
FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 1280, 800).

-2. Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/dev-3.x/mmcv/cnn/utils/flops_counter.py) for details.
+2. Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/2.x/mmcv/cnn/utils/flops_counter.py) for details.
 3. The FLOPs of two-stage detectors is dependent on the number of proposals.

 ## Model conversion

diff --git a/docs/en/user_guides/visualization.md b/docs/en/user_guides/visualization.md
index f0fa8b81498..dade26ed688 100644
--- a/docs/en/user_guides/visualization.md
+++ b/docs/en/user_guides/visualization.md
@@ -1 +1,91 @@
 # Visualization
+
+Before reading this tutorial, it is recommended to read MMEngine's [Visualization](https://github.com/open-mmlab/mmengine/blob/main/docs/en/advanced_tutorials/visualization.md) documentation to get a first glimpse of the `Visualizer` definition and usage.
+
+In brief, the [`Visualizer`](mmengine.visualization.Visualizer) is implemented in MMEngine to meet the daily visualization needs, and contains three main functions:
+
+- Implement common drawing APIs, such as [`draw_bboxes`](mmengine.visualization.Visualizer.draw_bboxes), which draws bounding boxes, and [`draw_lines`](mmengine.visualization.Visualizer.draw_lines), which draws lines.
+- Support writing visualization results, learning rate curves, loss function curves, and validation accuracy curves to various backends, including local disks and common deep learning training logging tools such as [TensorBoard](https://www.tensorflow.org/tensorboard) and [Wandb](https://wandb.ai/site).
+- Support being called anywhere in the code to visualize or record intermediate states of the model during training or testing, such as feature maps and validation results.
+
+Based on MMEngine's Visualizer, MMDet comes with a variety of pre-built visualization tools that can be used simply by modifying the configuration file.
+
+- The `tools/analysis_tools/browse_dataset.py` script provides a dataset visualization function that draws images and corresponding annotations after Data Transforms, as described in [`browse_dataset.py`](useful_tools.md#Visualization).
+- MMEngine implements `LoggerHook`, which uses `Visualizer` to write the learning rate, loss and evaluation results to the backend set by `Visualizer`. Therefore, by modifying the `Visualizer` backend in the configuration file, for example to `TensorboardVisBackend` or `WandbVisBackend`, you can log to common training logging tools such as TensorBoard or Wandb, making it easy to analyze and monitor the training process.
+- The `DetVisualizationHook` is implemented in MMDet, which uses the `Visualizer` to visualize or store the prediction results of the validation or prediction phase into the backend set by the `Visualizer`. By modifying the `Visualizer` backend in the configuration file, for example to `TensorboardVisBackend` or `WandbVisBackend`, you can store the predicted images to TensorBoard or Wandb.
+
+## Configuration
+
+Thanks to the use of the registration mechanism, in MMDet we can set the behavior of the `Visualizer` by modifying the configuration file.
Usually, we define the default configuration for the visualizer in `configs/_base_/default_runtime.py`; see the [configuration tutorial](config.md) for details.
+
+```Python
+vis_backends = [dict(type='LocalVisBackend')]
+visualizer = dict(
+    type='DetLocalVisualizer',
+    vis_backends=vis_backends,
+    name='visualizer')
+```
+
+Based on the above example, we can see that the configuration of `Visualizer` consists of two main parts, namely, the type of `Visualizer` and the visualization backends (`vis_backends`) it uses.
+
+- Users can directly use `DetLocalVisualizer` to visualize labels or predictions for supported tasks.
+- MMDet sets the visualization backend `vis_backend` to the local visualization backend `LocalVisBackend` by default, saving all visualization results and other training information in a local folder.
+
+## Storage
+
+MMDet uses the local visualization backend [`LocalVisBackend`](mmengine.visualization.LocalVisBackend) by default. The information recorded by `DetVisualizationHook` and `LoggerHook`, including loss, learning rate, evaluation accuracy, and visualization results, is saved to the `{work_dir}/{config_name}/{time}/{vis_data}` folder by default. In addition, MMDet also supports other common visualization backends, such as `TensorboardVisBackend` and `WandbVisBackend`; you only need to change the `vis_backends` type in the configuration file to the corresponding visualization backend. For example, you can store data to TensorBoard and Wandb by simply inserting the following code block into the configuration file.
+
+```Python
+# https://mmengine.readthedocs.io/en/latest/api/visualization.html
+_base_.visualizer.vis_backends = [
+    dict(type='LocalVisBackend'),
+    dict(type='TensorboardVisBackend'),
+    dict(type='WandbVisBackend'),
+]
+```
+
+## Plot
+
+### Plot the prediction results
+
+MMDet mainly uses [`DetVisualizationHook`](mmdet.engine.hooks.DetVisualizationHook) to plot the prediction results of validation and test. By default, `DetVisualizationHook` is off, and its default configuration is as follows.
+
+```Python
+visualization=dict(  # User visualization of validation and test results
+    type='DetVisualizationHook',
+    draw=False,
+    interval=1,
+    show=False)
+```
+
+The following table shows the parameters supported by `DetVisualizationHook`.
+
+| Parameters | Description |
+| :--------: | :-----------------------------------------------------------------------------------------------------------: |
+| draw | Turns `DetVisualizationHook` on or off; drawing is off by default. |
+| interval | Controls how often (in iterations) val or test results are stored or displayed when `DetVisualizationHook` is enabled. |
+| show | Controls whether to visualize the results of val or test. |
+
+If you want to enable the `DetVisualizationHook`-related functions and configurations during training or testing, you only need to modify the configuration. Taking `configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py` as an example, to draw annotations and predictions at the same time and display the images, the configuration can be modified as follows:
+
+```Python
+visualization = _base_.default_hooks.visualization
+visualization.update(dict(draw=True, show=True))
+```
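Besides the hook-based switches above, the visualizer can also be driven directly from user code, which is convenient in notebooks. A minimal sketch, assuming the RTMDet config from this page and an already-downloaded checkpoint (both paths are illustrative):

```python
import mmcv
from mmdet.apis import inference_detector, init_detector
from mmdet.visualization import DetLocalVisualizer

# Illustrative config/checkpoint paths -- substitute your own.
model = init_detector(
    'configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py',
    'checkpoints/rtmdet_tiny_8xb32-300e_coco.pth',
    device='cpu')
result = inference_detector(model, 'demo/demo.jpg')  # returns a DetDataSample

visualizer = DetLocalVisualizer()
visualizer.dataset_meta = model.dataset_meta  # class names and palette
image = mmcv.imread('demo/demo.jpg', channel_order='rgb')
visualizer.add_datasample(
    'result', image, data_sample=result, draw_gt=False, show=True)
```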
+ +
+
+The `test.py` script further simplifies this by providing the `--show` and `--show-dir` arguments, so that the annotation and prediction results can be visualized during testing without modifying the configuration.
+
+```Shell
+# Show test results
+python tools/test.py configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_tiny_8xb32-300e_coco/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth --show
+
+# Specify where to store the prediction results
+python tools/test.py configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_tiny_8xb32-300e_coco/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth --show-dir imgs/
+```
+
+ +
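+For one-off, programmatic checks outside the config/hook system, the prediction plotting can also be driven directly from Python. The following is a minimal, self-contained sketch rather than an official example: the image, class name, and detection values are placeholders, and it assumes MMDetection 3.x and PyTorch are installed.
+
+```Python
+import numpy as np
+import torch
+from mmengine.structures import InstanceData
+
+from mmdet.structures import DetDataSample
+from mmdet.visualization import DetLocalVisualizer
+
+# a dummy RGB image and a single fake detection, for illustration only
+image = np.zeros((224, 224, 3), dtype=np.uint8)
+pred_instances = InstanceData()
+pred_instances.bboxes = torch.tensor([[40.0, 40.0, 180.0, 160.0]])
+pred_instances.labels = torch.tensor([0])
+pred_instances.scores = torch.tensor([0.9])
+data_sample = DetDataSample()
+data_sample.pred_instances = pred_instances
+
+visualizer = DetLocalVisualizer()
+# class names and palette normally come from the dataset; set them by hand here
+visualizer.dataset_meta = dict(classes=('person', ), palette=[(220, 20, 60)])
+visualizer.add_datasample(
+    'demo', image, data_sample, draw_gt=False, show=False,
+    out_file='demo_pred.jpg')
+```
+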
diff --git a/docs/zh_cn/advanced_guides/conventions.md b/docs/zh_cn/advanced_guides/conventions.md index 261f5ed5eb7..9fb1f14c898 100644 --- a/docs/zh_cn/advanced_guides/conventions.md +++ b/docs/zh_cn/advanced_guides/conventions.md @@ -1,4 +1,4 @@ -# 默认约定(待更新) +# 默认约定 如果你想把 MMDetection 修改为自己的项目,请遵循下面的约定。 diff --git a/docs/zh_cn/advanced_guides/customize_dataset.md b/docs/zh_cn/advanced_guides/customize_dataset.md index e2fee435080..e845f37f2db 100644 --- a/docs/zh_cn/advanced_guides/customize_dataset.md +++ b/docs/zh_cn/advanced_guides/customize_dataset.md @@ -174,7 +174,7 @@ model = dict( ] ``` -我们使用这种方式来支持 CityScapes 数据集。脚本在 [cityscapes.py](https://github.com/open-mmlab/mmdetection/blob/3.x/tools/dataset_converters/cityscapes.py) 并且我们提供了微调的 [configs](https://github.com/open-mmlab/mmdetection/blob/3.x/configs/cityscapes). +我们使用这种方式来支持 CityScapes 数据集。脚本在 [cityscapes.py](https://github.com/open-mmlab/mmdetection/blob/main/tools/dataset_converters/cityscapes.py) 并且我们提供了微调的 [configs](https://github.com/open-mmlab/mmdetection/blob/main/configs/cityscapes). **注意** @@ -236,7 +236,7 @@ model = dict( 有些数据集可能会提供如:crowd/difficult/ignored bboxes 标注,那么我们使用 `ignore_flag`来包含它们。 -在得到上述标准的数据标注格式后,可以直接在配置中使用 MMDetection 的 [BaseDetDataset](https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/datasets/base_det_dataset.py#L13) ,而无需进行转换。 +在得到上述标准的数据标注格式后,可以直接在配置中使用 MMDetection 的 [BaseDetDataset](https://github.com/open-mmlab/mmdetection/blob/main/mmdet/datasets/base_det_dataset.py#L13) ,而无需进行转换。 ### 自定义数据集例子 @@ -351,7 +351,7 @@ test_dataloader = dict( - 在 MMDetection v2.5.0 之前,如果类别为集合时数据集将自动过滤掉不包含 GT 的图片,且没办法通过修改配置将其关闭。这是一种不可取的行为而且会引起混淆,因为当类别不是集合时数据集时,只有在 `filter_empty_gt=True` 以及 `test_mode=False` 的情况下才会过滤掉不包含 GT 的图片。在 MMDetection v2.5.0 之后,我们将图片的过滤以及类别的修改进行解耦,数据集只有在 `filter_cfg=dict(filter_empty_gt=True)` 和 `test_mode=False` 的情况下才会过滤掉不包含 GT 的图片,无论类别是否为集合。设置类别只会影响用于训练的标注类别,用户可以自行决定是否过滤不包含 GT 的图片。 - 直接使用 MMEngine 中的 `BaseDataset` 或者 MMDetection 中的 `BaseDetDataset` 时用户不能通过修改配置来过滤不含 GT 的图片,但是可以通过离线的方式来解决。 -- 当设置数据集中的 `classes` 时,记得修改 `num_classes`。从 v2.9.0 (PR#4508) 之后,我们实现了 [NumClassCheckHook](https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/engine/hooks/num_class_check_hook.py) 来检查类别数是否一致。 +- 当设置数据集中的 `classes` 时,记得修改 `num_classes`。从 v2.9.0 (PR#4508) 之后,我们实现了 [NumClassCheckHook](https://github.com/open-mmlab/mmdetection/blob/main/mmdet/engine/hooks/num_class_check_hook.py) 来检查类别数是否一致。 ## COCO 全景分割数据集 diff --git a/docs/zh_cn/advanced_guides/customize_losses.md b/docs/zh_cn/advanced_guides/customize_losses.md index e9f0ac83978..07ccccda128 100644 --- a/docs/zh_cn/advanced_guides/customize_losses.md +++ b/docs/zh_cn/advanced_guides/customize_losses.md @@ -39,7 +39,7 @@ train_cfg=dict( ## 微调损失 -微调一个损失主要与步骤 2,4,5 有关,大部分的修改可以在配置文件中指定。这里我们用 [Focal Loss (FL)](https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/losses/focal_loss.py) 作为例子。 +微调一个损失主要与步骤 2,4,5 有关,大部分的修改可以在配置文件中指定。这里我们用 [Focal Loss (FL)](https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/losses/focal_loss.py) 作为例子。 下面的代码分别是构建 FL 的方法和它的配置文件,他们是一一对应的。 ```python @@ -105,7 +105,7 @@ loss_cls=dict( ## 加权损失(步骤3) -加权损失就是我们逐元素修改损失权重。更具体来说,我们给损失张量乘以一个与他有相同形状的权重张量。所以,损失中不同的元素可以被赋予不同的比例,所以这里叫做逐元素。损失的权重在不同模型中变化很大,而且与上下文相关,但是总的来说主要有两种损失权重:分类损失的 `label_weights` 和边界框的 `bbox_weights`。你可以在相应的头中的 `get_target` 方法中找到他们。这里我们使用 [ATSSHead](https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/dense_heads/atss_head.py#L322) 作为一个例子。它继承了 
[AnchorHead](https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/dense_heads/anchor_head.py) ,但是我们重写它的
+加权损失就是我们逐元素修改损失权重。更具体来说,我们给损失张量乘以一个与它有相同形状的权重张量。因此,损失中不同的元素可以被赋予不同的比例,所以这里叫做逐元素。损失的权重在不同模型中变化很大,而且与上下文相关,但是总的来说主要有两种损失权重:分类损失的 `label_weights` 和边界框的 `bbox_weights`。你可以在相应的头中的 `get_targets` 方法中找到它们。这里我们使用 [ATSSHead](https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/dense_heads/atss_head.py#L322) 作为一个例子。它继承了 [AnchorHead](https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/dense_heads/anchor_head.py) ,但是我们重写它的
 `get_targets` 方法来产生不同的 `label_weights` 和 `bbox_weights`。

 ```

diff --git a/docs/zh_cn/advanced_guides/customize_runtime.md b/docs/zh_cn/advanced_guides/customize_runtime.md
index 1f953eb41ed..d4a19098789 100644
--- a/docs/zh_cn/advanced_guides/customize_runtime.md
+++ b/docs/zh_cn/advanced_guides/customize_runtime.md
@@ -330,9 +330,9 @@ custom_hooks = [

 #### 例子: `NumClassCheckHook`

-我们实现了一个名为 [NumClassCheckHook](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/engine/hooks/num_class_check_hook.py) 的自定义钩子来检查 `num_classes` 是否在 head 中和 `dataset` 中的 `classes` 的长度相匹配。
+我们实现了一个名为 [NumClassCheckHook](https://github.com/open-mmlab/mmdetection/blob/main/mmdet/engine/hooks/num_class_check_hook.py) 的自定义钩子,用来检查 head 中的 `num_classes` 是否与 `dataset` 中 `classes` 的长度相匹配。

-我们在 [default_runtime.py](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/_base_/default_runtime.py) 中设置它。
+我们在 [default_runtime.py](https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/default_runtime.py) 中设置它。

 ```python
 custom_hooks = [dict(type='NumClassCheckHook')]

diff --git a/docs/zh_cn/advanced_guides/customize_transforms.md b/docs/zh_cn/advanced_guides/customize_transforms.md
index b51f96b7f2d..aa40717904a 100644
--- a/docs/zh_cn/advanced_guides/customize_transforms.md
+++ b/docs/zh_cn/advanced_guides/customize_transforms.md
@@ -1,25 +1,26 @@
-# 自定义数据预处理流程(待更新)
+# 自定义数据预处理流程

 1. 在任意文件里写一个新的流程,例如在 `my_pipeline.py`,它以一个字典作为输入并且输出一个字典:

    ```python
    import random
-   from mmdet.datasets import PIPELINES
+   from mmcv.transforms import BaseTransform
+   from mmdet.registry import TRANSFORMS

-   @PIPELINES.register_module()
-   class MyTransform:
+   @TRANSFORMS.register_module()
+   class MyTransform(BaseTransform):
        """Add your transform

        Args:
-           p (float): Probability of shifts. Default 0.5.
+           prob (float): Probability of shifts. Default 0.5.
""" - def __init__(self, p=0.5): - self.p = p + def __init__(self, prob=0.5): + self.prob = prob - def __call__(self, results): - if random.random() > self.p: + def transform(self, results): + if random.random() > self.prob: results['dummy'] = True return results ``` @@ -29,18 +30,13 @@ ```python custom_imports = dict(imports=['path.to.my_pipeline'], allow_failed_imports=False) - img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='MyTransform', p=0.2), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), + dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type='RandomFlip', prob=0.5), + dict(type='MyTransform', prob=0.2), + dict(type='PackDetInputs') ] ``` @@ -48,4 +44,4 @@ 如果想要可视化数据增强处理流程的结果,可以使用 `tools/misc/browse_dataset.py` 直观 地浏览检测数据集(图像和标注信息),或将图像保存到指定目录。 - 使用方法请参考[日志分析](../useful_tools.md) + 使用方法请参考[可视化文档](../user_guides/visualization.md) diff --git a/docs/zh_cn/advanced_guides/how_to.md b/docs/zh_cn/advanced_guides/how_to.md index 64b03ffba17..8fede40cfd3 100644 --- a/docs/zh_cn/advanced_guides/how_to.md +++ b/docs/zh_cn/advanced_guides/how_to.md @@ -34,10 +34,10 @@ model = dict( ### 通过 MMClassification 使用 TIMM 中实现的骨干网络 -由于 MMClassification 提供了 Py**T**orch **Im**age **M**odels (`timm`) 骨干网络的封装,用户也可以通过 MMClassification 直接使用 `timm` 中的骨干网络。假设想将 [`EfficientNet-B1`](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/timm_example/retinanet_timm-efficientnet-b1_fpn_1x_coco.py) 作为 `RetinaNet` 的骨干网络,则配置文件如下。 +由于 MMClassification 提供了 Py**T**orch **Im**age **M**odels (`timm`) 骨干网络的封装,用户也可以通过 MMClassification 直接使用 `timm` 中的骨干网络。假设想将 [`EfficientNet-B1`](../../../configs/timm_example/retinanet_timm-efficientnet-b1_fpn_1x_coco.py) 作为 `RetinaNet` 的骨干网络,则配置文件如下。 ```python -# https://github.com/open-mmlab/mmdetection/blob/master/configs/timm_example/retinanet_timm_efficientnet_b1_fpn_1x_coco.py +# https://github.com/open-mmlab/mmdetection/blob/main/configs/timm_example/retinanet_timm_efficientnet_b1_fpn_1x_coco.py _base_ = [ '../_base_/models/retinanet_r50_fpn.py', '../_base_/datasets/coco_detection.py', diff --git a/docs/zh_cn/conf.py b/docs/zh_cn/conf.py index 1bb57a4a31b..e6878408971 100644 --- a/docs/zh_cn/conf.py +++ b/docs/zh_cn/conf.py @@ -67,7 +67,7 @@ def get_version(): '.md': 'markdown', } -# The master toctree document. +# The main toctree document. master_doc = 'index' # List of patterns, relative to source directory, that match files and diff --git a/docs/zh_cn/get_started.md b/docs/zh_cn/get_started.md index d1898749eed..72be5fc3441 100644 --- a/docs/zh_cn/get_started.md +++ b/docs/zh_cn/get_started.md @@ -44,7 +44,7 @@ conda install pytorch torchvision cpuonly -c pytorch ```shell pip install -U openmim mim install mmengine -mim install "mmcv>=2.0.0rc1" +mim install "mmcv>=2.0.0" ``` **注意:** 在 MMCV-v2.x 中,`mmcv-full` 改名为 `mmcv`,如果你想安装不包含 CUDA 算子精简版,可以通过 `mim install "mmcv-lite>=2.0.0rc1"` 来安装。 @@ -54,8 +54,7 @@ mim install "mmcv>=2.0.0rc1" 方案 a:如果你开发并直接运行 mmdet,从源码安装它: ```shell -git clone https://github.com/open-mmlab/mmdetection.git -b 3.x -# "-b 3.x" 表示切换到 `3.x` 分支。 +git clone https://github.com/open-mmlab/mmdetection.git cd mmdetection pip install -v -e . 
# "-v" 指详细说明,或更多的输出 @@ -65,7 +64,7 @@ pip install -v -e . 方案 b:如果你将 mmdet 作为依赖或第三方 Python 包,使用 MIM 安装: ```shell -mim install "mmdet>=3.0.0rc0" +mim install mmdet ``` ## 验证安装 @@ -137,7 +136,7 @@ MMCV 包含 C++ 和 CUDA 扩展,因此其对 PyTorch 的依赖比较复杂。M 例如,下述命令将会安装基于 PyTorch 1.12.x 和 CUDA 11.6 编译的 MMCV。 ```shell -pip install "mmcv>=2.0.0rc1" -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html +pip install "mmcv>=2.0.0" -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html ``` #### 在 CPU 环境中安装 @@ -177,13 +176,13 @@ MMDetection 可以在 CPU 环境中构建。在 CPU 模式下,可以进行模 ```shell !pip3 install openmim !mim install mmengine -!mim install "mmcv>=2.0.0rc1,<2.1.0" +!mim install "mmcv>=2.0.0,<2.1.0" ``` **步骤 2.** 使用源码安装 MMDetection。 ```shell -!git clone https://github.com/open-mmlab/mmdetection.git -b 3.x +!git clone https://github.com/open-mmlab/mmdetection.git %cd mmdetection !pip install -e . ``` @@ -193,7 +192,7 @@ MMDetection 可以在 CPU 环境中构建。在 CPU 模式下,可以进行模 ```python import mmdet print(mmdet.__version__) -# 预期输出:3.0.0rc0 或其他版本号 +# 预期输出:3.0.0 或其他版本号 ``` ```{note} diff --git a/docs/zh_cn/index.rst b/docs/zh_cn/index.rst index 280e1ecacf6..58a4d8a52d3 100644 --- a/docs/zh_cn/index.rst +++ b/docs/zh_cn/index.rst @@ -24,7 +24,7 @@ Welcome to MMDetection's documentation! :maxdepth: 1 :caption: 迁移版本 - migration.md + migration/migration.md .. toctree:: :maxdepth: 1 diff --git a/docs/zh_cn/migration/api_and_registry_migration.md b/docs/zh_cn/migration/api_and_registry_migration.md new file mode 100644 index 00000000000..66e1c340806 --- /dev/null +++ b/docs/zh_cn/migration/api_and_registry_migration.md @@ -0,0 +1 @@ +# 将 API 和注册器从 MMDetection 2.x 迁移至 3.x diff --git a/docs/zh_cn/migration/config_migration.md b/docs/zh_cn/migration/config_migration.md new file mode 100644 index 00000000000..c4f9c8e3d2d --- /dev/null +++ b/docs/zh_cn/migration/config_migration.md @@ -0,0 +1,814 @@ +# 将配置文件从 MMDetection 2.x 迁移至 3.x + +MMDetection 3.x 的配置文件与 2.x 相比有较大变化,这篇文档将介绍如何将 2.x 的配置文件迁移到 3.x。 + +在前面的[配置文件教程](../user_guides/config.md)中,我们以 Mask R-CNN 为例介绍了 MMDetection 3.x 的配置文件结构,这里我们将按同样的结构介绍如何将 2.x 的配置文件迁移至 3.x。 + +## 模型配置 + +模型的配置与 2.x 相比并没有太大变化,对于模型的 backbone,neck,head,以及 train_cfg 和 test_cfg,它们的参数与 2.x 版本的参数保持一致。 + +不同的是,我们在 3.x 版本的模型中新增了 `DataPreprocessor` 模块。 +`DataPreprocessor` 模块的配置位于 `model.data_preprocessor` 中,它用于对输入数据进行预处理,例如对输入图像进行归一化,将不同大小的图片进行 padding 从而组成 batch,将图像从内存中读取到显存中等。这部分配置取代了原本存在于 train_pipeline 和 test_pipeline 中的 `Normalize` 和 `Pad`。 + + + + + + + + + +
原配置 + +```python +# 图像归一化参数 +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + to_rgb=True) +pipeline=[ + ..., + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size_divisor=32), # 图像 padding 到 32 的倍数 + ... +] +``` + +
新配置 + +```python +model = dict( + data_preprocessor=dict( + type='DetDataPreprocessor', + # 图像归一化参数 + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + bgr_to_rgb=True, + # 图像 padding 参数 + pad_mask=True, # 在实例分割中,需要将 mask 也进行 padding + pad_size_divisor=32) # 图像 padding 到 32 的倍数 +) +``` + +
+ +## 数据集和评测器配置 + +数据集和评测部分的配置相比 2.x 版本有较大的变化。我们将从 Dataloader 和 Dataset,Data transform pipeline,以及评测器配置三个方面介绍如何将 2.x 版本的配置迁移到 3.x 版本。 + +### Dataloader 和 Dataset 配置 + +在新版本中,我们将数据加载的设置与 PyTorch 官方的 DataLoader 保持一致,这样可以使用户更容易理解和上手。 +我们将训练、验证和测试的数据加载设置分别放在 `train_dataloader`,`val_dataloader` 和 `test_dataloader` 中,用户可以分别对这些 dataloader 设置不同的参数,其输入参数与 [PyTorch 的 Dataloader](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader) 所需要的参数基本一致。 + +通过这种方式,我们将 2.x 版本中不可配置的 `sampler`,`batch_sampler`,`persistent_workers` 等参数都放到了配置文件中,使得用户可以更加灵活地设置数据加载的参数。 + +用户可以通过 `train_dataloader.dataset`,`val_dataloader.dataset` 和 `test_dataloader.dataset` 来设置数据集的配置,它们分别对应 2.x 版本中的 `data.train`,`data.val` 和 `data.test`。 + + + + + + + + + +
原配置 + +```python +data = dict( + samples_per_gpu=2, + workers_per_gpu=2, + train=dict( + type=dataset_type, + ann_file=data_root + 'annotations/instances_train2017.json', + img_prefix=data_root + 'train2017/', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + ann_file=data_root + 'annotations/instances_val2017.json', + img_prefix=data_root + 'val2017/', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + ann_file=data_root + 'annotations/instances_val2017.json', + img_prefix=data_root + 'val2017/', + pipeline=test_pipeline)) +``` + +
新配置 + +```python +train_dataloader = dict( + batch_size=2, + num_workers=2, + persistent_workers=True, # 避免每次迭代后 dataloader 重新创建子进程 + sampler=dict(type='DefaultSampler', shuffle=True), # 默认的 sampler,同时支持分布式训练和非分布式训练 + batch_sampler=dict(type='AspectRatioBatchSampler'), # 默认的 batch_sampler,用于保证 batch 中的图片具有相似的长宽比,从而可以更好地利用显存 + dataset=dict( + type=dataset_type, + data_root=data_root, + ann_file='annotations/instances_train2017.json', + data_prefix=dict(img='train2017/'), + filter_cfg=dict(filter_empty_gt=True, min_size=32), + pipeline=train_pipeline)) +# 在 3.x 版本中可以独立配置验证和测试的 dataloader +val_dataloader = dict( + batch_size=1, + num_workers=2, + persistent_workers=True, + drop_last=False, + sampler=dict(type='DefaultSampler', shuffle=False), + dataset=dict( + type=dataset_type, + data_root=data_root, + ann_file='annotations/instances_val2017.json', + data_prefix=dict(img='val2017/'), + test_mode=True, + pipeline=test_pipeline)) +test_dataloader = val_dataloader # 测试 dataloader 的配置与验证 dataloader 的配置相同,这里省略 +``` + +
+ +### Data transform pipeline 配置 + +上文中提到,我们将图像 normalize 和 padding 的配置从 `train_pipeline` 和 `test_pipeline` 中独立出来,放到了 `model.data_preprocessor` 中,因此在 3.x 版本的 pipeline 中,我们不再需要 `Normalize` 和 `Pad` 这两个 transform。 + +同时,我们也对负责数据格式打包的 transform 进行了重构,将 `Collect` 和 `DefaultFormatBundle` 这两个 transform 合并为了 `PackDetInputs`,它负责将 data pipeline 中的数据打包成模型的输入格式,关于输入格式的转换,详见[数据流文档](../advanced_guides/data_flow.md)。 + +下面以 Mask R-CNN 1x 的 train_pipeline 为例,介绍如何将 2.x 版本的配置迁移到 3.x 版本: + + + + + + + + + +
原配置 + +```python +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations', with_bbox=True), + dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), + dict(type='RandomFlip', flip_ratio=0.5), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size_divisor=32), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), +] +``` + +
新配置 + +```python +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations', with_bbox=True), + dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type='RandomFlip', prob=0.5), + dict(type='PackDetInputs') +] +``` + +
+ +对于 test_pipeline,除了将 `Normalize` 和 `Pad` 这两个 transform 去掉之外,我们也将测试时的数据增强(TTA)与普通的测试流程分开,移除了 `MultiScaleFlipAug`。关于新版的 TTA 如何使用,详见[TTA 文档](../advanced_guides/tta.md)。 + +下面同样以 Mask R-CNN 1x 的 test_pipeline 为例,介绍如何将 2.x 版本的配置迁移到 3.x 版本: + + + + + + + + + +
原配置 + +```python +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(1333, 800), + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size_divisor=32), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +``` + +
新配置 + +```python +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict( + type='PackDetInputs', + meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', + 'scale_factor')) +] +``` + +
+ +除此之外,我们还对一些数据增强进行了重构,下表列出了 2.x 版本中的 transform 与 3.x 版本中的 transform 的对应关系: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
名称原配置新配置
Resize + +```python +dict(type='Resize', + img_scale=(1333, 800), + keep_ratio=True) +``` + + + +```python +dict(type='Resize', + scale=(1333, 800), + keep_ratio=True) +``` + +
RandomResize + +```python +dict( + type='Resize', + img_scale=[ + (1333, 640), (1333, 800)], + multiscale_mode='range', + keep_ratio=True) +``` + + + +```python +dict( + type='RandomResize', + scale=[ + (1333, 640), (1333, 800)], + keep_ratio=True) +``` + +
RandomChoiceResize + +```python +dict( + type='Resize', + img_scale=[ + (1333, 640), (1333, 672), + (1333, 704), (1333, 736), + (1333, 768), (1333, 800)], + multiscale_mode='value', + keep_ratio=True) +``` + + + +```python +dict( + type='RandomChoiceResize', + scales=[ + (1333, 640), (1333, 672), + (1333, 704), (1333, 736), + (1333, 768), (1333, 800)], + keep_ratio=True) +``` + +
RandomFlip + +```python +dict(type='RandomFlip', + flip_ratio=0.5) +``` + + + +```python +dict(type='RandomFlip', + prob=0.5) +``` + +
+ +### 评测器配置 + +在 3.x 版本中,模型精度评测不再与数据集绑定,而是通过评测器(Evaluator)来完成。 +评测器配置分为 val_evaluator 和 test_evaluator 两部分,其中 val_evaluator 用于验证集评测,test_evaluator 用于测试集评测,对应 2.x 版本中的 evaluation 字段。 +下表列出了 2.x 版本与 3.x 版本中的评测器的对应关系: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
评测指标名称原配置新配置
COCO + +```python +data = dict( + val=dict( + type='CocoDataset', + ann_file=data_root + 'annotations/instances_val2017.json')) +evaluation = dict(metric=['bbox', 'segm']) +``` + + + +```python +val_evaluator = dict( + type='CocoMetric', + ann_file=data_root + 'annotations/instances_val2017.json', + metric=['bbox', 'segm'], + format_only=False) +``` + +
Pascal VOC + +```python +data = dict( + val=dict( + type=dataset_type, + ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt')) +evaluation = dict(metric='mAP') +``` + + + +```python +val_evaluator = dict( + type='VOCMetric', + metric='mAP', + eval_mode='11points') +``` + +
OpenImages + +```python +data = dict( + val=dict( + type='OpenImagesDataset', + ann_file=data_root + 'annotations/validation-annotations-bbox.csv', + img_prefix=data_root + 'OpenImages/validation/', + label_file=data_root + 'annotations/class-descriptions-boxable.csv', + hierarchy_file=data_root + + 'annotations/bbox_labels_600_hierarchy.json', + meta_file=data_root + 'annotations/validation-image-metas.pkl', + image_level_ann_file=data_root + + 'annotations/validation-annotations-human-imagelabels-boxable.csv')) +evaluation = dict(interval=1, metric='mAP') +``` + + + +```python +val_evaluator = dict( + type='OpenImagesMetric', + iou_thrs=0.5, + ioa_thrs=0.5, + use_group_of=True, + get_supercategory=True) +``` + +
CityScapes + +```python +data = dict( + val=dict( + type='CityScapesDataset', + ann_file=data_root + + 'annotations/instancesonly_filtered_gtFine_val.json', + img_prefix=data_root + 'leftImg8bit/val/', + pipeline=test_pipeline)) +evaluation = dict(metric=['bbox', 'segm']) +``` + + + +```python +val_evaluator = [ + dict( + type='CocoMetric', + ann_file=data_root + + 'annotations/instancesonly_filtered_gtFine_val.json', + metric=['bbox', 'segm']), + dict( + type='CityScapesMetric', + ann_file=data_root + + 'annotations/instancesonly_filtered_gtFine_val.json', + seg_prefix=data_root + '/gtFine/val', + outfile_prefix='./work_dirs/cityscapes_metric/instance') +] +``` + +
+ +## 训练和测试的配置 + + + + + + + + + +
原配置 + +```python +runner = dict( + type='EpochBasedRunner', # 训练循环的类型 + max_epochs=12) # 最大训练轮次 +evaluation = dict(interval=2) # 验证间隔。每 2 个 epoch 验证一次 +``` + +
新配置 + +```python +train_cfg = dict( + type='EpochBasedTrainLoop', # 训练循环的类型,请参考 https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/loops.py + max_epochs=12, # 最大训练轮次 + val_interval=2) # 验证间隔。每 2 个 epoch 验证一次 +val_cfg = dict(type='ValLoop') # 验证循环的类型 +test_cfg = dict(type='TestLoop') # 测试循环的类型 +``` + +
+ +## 优化相关配置 + +优化器以及梯度裁剪的配置都移至 optim_wrapper 字段中。下表列出了 2.x 版本与 3.x 版本中的优化器配置的对应关系: + + + + + + + + + +
原配置 + +```python +optimizer = dict( + type='SGD', # 随机梯度下降优化器 + lr=0.02, # 基础学习率 + momentum=0.9, # 带动量的随机梯度下降 + weight_decay=0.0001) # 权重衰减 +optimizer_config = dict(grad_clip=None) # 梯度裁剪的配置,设置为 None 关闭梯度裁剪 +``` + +
新配置 + +```python +optim_wrapper = dict( # 优化器封装的配置 + type='OptimWrapper', # 优化器封装的类型。可以切换至 AmpOptimWrapper 来启用混合精度训练 + optimizer=dict( # 优化器配置。支持 PyTorch 的各种优化器。请参考 https://pytorch.org/docs/stable/optim.html#algorithms + type='SGD', # 随机梯度下降优化器 + lr=0.02, # 基础学习率 + momentum=0.9, # 带动量的随机梯度下降 + weight_decay=0.0001), # 权重衰减 + clip_grad=None, # 梯度裁剪的配置,设置为 None 关闭梯度裁剪。使用方法请见 https://mmengine.readthedocs.io/en/latest/tutorials/optimizer.html + ) +``` + +
+ +学习率的配置也从 lr_config 字段中移至 param_scheduler 字段中。param_scheduler 的配置更贴近 PyTorch 的学习率调整策略,更加灵活。下表列出了 2.x 版本与 3.x 版本中的学习率配置的对应关系: + + + + + + + + + +
原配置 + +```python +lr_config = dict( + policy='step', # 在训练过程中使用 multi step 学习率策略 + warmup='linear', # 使用线性学习率预热 + warmup_iters=500, # 到第 500 个 iteration 结束预热 + warmup_ratio=0.001, # 学习率预热的系数 + step=[8, 11], # 在哪几个 epoch 进行学习率衰减 + gamma=0.1) # 学习率衰减系数 +``` + +
新配置 + +```python +param_scheduler = [ + dict( + type='LinearLR', # 使用线性学习率预热 + start_factor=0.001, # 学习率预热的系数 + by_epoch=False, # 按 iteration 更新预热学习率 + begin=0, # 从第一个 iteration 开始 + end=500), # 到第 500 个 iteration 结束 + dict( + type='MultiStepLR', # 在训练过程中使用 multi step 学习率策略 + by_epoch=True, # 按 epoch 更新学习率 + begin=0, # 从第一个 epoch 开始 + end=12, # 到第 12 个 epoch 结束 + milestones=[8, 11], # 在哪几个 epoch 进行学习率衰减 + gamma=0.1) # 学习率衰减系数 +] +``` + +
+ +关于其他的学习率调整策略的迁移,请参考 MMEngine 的[学习率迁移文档](https://mmengine.readthedocs.io/zh_CN/latest/migration/param_scheduler.html)。 + +## 其他配置的迁移 + +### 保存 checkpoint 的配置 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
功能原配置新配置
设置保存间隔 + +```python +checkpoint_config = dict( + interval=1) +``` + + + +```python +default_hooks = dict( + checkpoint=dict( + type='CheckpointHook', + interval=1)) +``` + +
保存最佳模型 + +```python +evaluation = dict( + save_best='auto') +``` + + + +```python +default_hooks = dict( + checkpoint=dict( + type='CheckpointHook', + save_best='auto')) +``` + +
只保留最新的几个模型 + +```python +checkpoint_config = dict( + max_keep_ckpts=3) +``` + + + +```python +default_hooks = dict( + checkpoint=dict( + type='CheckpointHook', + max_keep_ckpts=3)) +``` + +
+ +### 日志的配置 + +3.x 版本中,日志的打印和可视化由 MMEngine 中的 logger 和 visualizer 分别完成。下表列出了 2.x 版本与 3.x 版本中的日志配置的对应关系: + + + + + + + + + + + + + + + + + + + + + + + + +
功能原配置新配置
设置日志打印间隔 + +```python +log_config = dict( + interval=50) +``` + + + +```python +default_hooks = dict( + logger=dict( + type='LoggerHook', + interval=50)) +# 可选: 配置日志打印数值的平滑窗口大小 +log_processor = dict( + type='LogProcessor', + window_size=50) +``` + +
使用 TensorBoard 或 WandB 可视化日志 + +```python +log_config = dict( + interval=50, + hooks=[ + dict(type='TextLoggerHook'), + dict(type='TensorboardLoggerHook'), + dict(type='MMDetWandbHook', + init_kwargs={ + 'project': 'mmdetection', + 'group': 'maskrcnn-r50-fpn-1x-coco' + }, + interval=50, + log_checkpoint=True, + log_checkpoint_metadata=True, + num_eval_images=100) + ]) +``` + + + +```python +vis_backends = [ + dict(type='LocalVisBackend'), + dict(type='TensorboardVisBackend'), + dict(type='WandbVisBackend', + init_kwargs={ + 'project': 'mmdetection', + 'group': 'maskrcnn-r50-fpn-1x-coco' + }) +] +visualizer = dict( + type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer') +``` + +
+ +关于可视化相关的教程,请参考 MMDetection 的[可视化教程](../user_guides/visualization.md)。 + +### Runtime 的配置 + +3.x 版本中 runtime 的配置字段有所调整,具体的对应关系如下: + + + + + + + + + + + + + + + + +
原配置新配置
+ +```python +cudnn_benchmark = False +opencv_num_threads = 0 +mp_start_method = 'fork' +dist_params = dict(backend='nccl') +log_level = 'INFO' +load_from = None +resume_from = None + + +``` + + + +```python +env_cfg = dict( + cudnn_benchmark=False, + mp_cfg=dict(mp_start_method='fork', + opencv_num_threads=0), + dist_cfg=dict(backend='nccl')) +log_level = 'INFO' +load_from = None +resume = False +``` + +
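+迁移完成后,可以先检查新配置能否被正确解析。下面是一个简单的示意(并非官方迁移工具,配置路径 `configs/my_migrated_config.py` 仅为示例),利用 MMEngine 的 `Config` 加载迁移后的配置文件,并抽查几个新字段:
+
+```python
+from mmengine.config import Config
+
+# 示意:加载迁移后的配置文件(路径仅为示例)
+cfg = Config.fromfile('configs/my_migrated_config.py')
+print(cfg.train_dataloader.batch_size)  # 对应 2.x 中的 data.samples_per_gpu
+print(cfg.optim_wrapper.optimizer.lr)   # 对应 2.x 中的 optimizer.lr
+```
+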
diff --git a/docs/zh_cn/migration/dataset_migration.md b/docs/zh_cn/migration/dataset_migration.md new file mode 100644 index 00000000000..c379b9f1b7b --- /dev/null +++ b/docs/zh_cn/migration/dataset_migration.md @@ -0,0 +1 @@ +# 将数据集从 MMDetection 2.x 迁移至 3.x diff --git a/docs/zh_cn/migration/migration.md b/docs/zh_cn/migration/migration.md new file mode 100644 index 00000000000..d706856fa82 --- /dev/null +++ b/docs/zh_cn/migration/migration.md @@ -0,0 +1,12 @@ +# 从 MMDetection 2.x 迁移至 3.x + +MMDetection 3.x 版本是一个重大更新,包含了许多 API 和配置文件的变化。本文档旨在帮助用户从 MMDetection 2.x 版本迁移到 3.x 版本。 +我们将迁移指南分为以下几个部分: + +- [配置文件迁移](./config_migration.md) +- [API 和 Registry 迁移](./api_and_registry_migration.md) +- [数据集迁移](./dataset_migration.md) +- [模型迁移](./model_migration.md) +- [常见问题](./migration_faq.md) + +如果您在迁移过程中遇到任何问题,欢迎在 issue 中提出。我们也欢迎您为本文档做出贡献。 diff --git a/docs/zh_cn/migration/migration_faq.md b/docs/zh_cn/migration/migration_faq.md new file mode 100644 index 00000000000..208a138b25d --- /dev/null +++ b/docs/zh_cn/migration/migration_faq.md @@ -0,0 +1 @@ +# 迁移 FAQ diff --git a/docs/zh_cn/migration/model_migration.md b/docs/zh_cn/migration/model_migration.md new file mode 100644 index 00000000000..d7992440228 --- /dev/null +++ b/docs/zh_cn/migration/model_migration.md @@ -0,0 +1 @@ +# 将模型从 MMDetection 2.x 迁移至 3.x diff --git a/docs/zh_cn/model_zoo.md b/docs/zh_cn/model_zoo.md index afa74505861..b5376152d9c 100644 --- a/docs/zh_cn/model_zoo.md +++ b/docs/zh_cn/model_zoo.md @@ -10,7 +10,7 @@ - 我们使用分布式训练。 - 所有 pytorch-style 的 ImageNet 预训练主干网络来自 PyTorch 的模型库,caffe-style 的预训练主干网络来自 detectron2 最新开源的模型。 - 为了与其他代码库公平比较,文档中所写的 GPU 内存是8个 GPU 的 `torch.cuda.max_memory_allocated()` 的最大值,此值通常小于 nvidia-smi 显示的值。 -- 我们以网络 forward 和后处理的时间加和作为推理时间,不包含数据加载时间。所有结果通过 [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/analysis_tools/benchmark.py) 脚本计算所得。该脚本会计算推理 2000 张图像的平均时间。 +- 我们以网络 forward 和后处理的时间加和作为推理时间,不包含数据加载时间。所有结果通过 [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/main/tools/analysis_tools/benchmark.py) 脚本计算所得。该脚本会计算推理 2000 张图像的平均时间。 ## ImageNet 预训练模型 @@ -37,223 +37,223 @@ MMdetection 常用到的主干网络细节如下表所示: ### RPN -请参考 [RPN](https://github.com/open-mmlab/mmdetection/blob/master/configs/rpn)。 +请参考 [RPN](https://github.com/open-mmlab/mmdetection/blob/main/configs/rpn)。 ### Faster R-CNN -请参考 [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/faster_rcnn)。 +请参考 [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/faster_rcnn)。 ### Mask R-CNN -请参考 [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn)。 +请参考 [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/mask_rcnn)。 ### Fast R-CNN (使用提前计算的 proposals) -请参考 [Fast R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/fast_rcnn)。 +请参考 [Fast R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/fast_rcnn)。 ### RetinaNet -请参考 [RetinaNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/retinanet)。 +请参考 [RetinaNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/retinanet)。 ### Cascade R-CNN and Cascade Mask R-CNN -请参考 [Cascade R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/cascade_rcnn)。 +请参考 [Cascade R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/cascade_rcnn)。 ### Hybrid Task Cascade (HTC) -请参考 [HTC](https://github.com/open-mmlab/mmdetection/blob/master/configs/htc)。 +请参考 [HTC](https://github.com/open-mmlab/mmdetection/blob/main/configs/htc)。 ### SSD 
-请参考 [SSD](https://github.com/open-mmlab/mmdetection/blob/master/configs/ssd)。 +请参考 [SSD](https://github.com/open-mmlab/mmdetection/blob/main/configs/ssd)。 ### Group Normalization (GN) -请参考 [Group Normalization](https://github.com/open-mmlab/mmdetection/blob/master/configs/gn)。 +请参考 [Group Normalization](https://github.com/open-mmlab/mmdetection/blob/main/configs/gn)。 ### Weight Standardization -请参考 [Weight Standardization](https://github.com/open-mmlab/mmdetection/blob/master/configs/gn+ws)。 +请参考 [Weight Standardization](https://github.com/open-mmlab/mmdetection/blob/main/configs/gn+ws)。 ### Deformable Convolution v2 -请参考 [Deformable Convolutional Networks](https://github.com/open-mmlab/mmdetection/blob/master/configs/dcn)。 +请参考 [Deformable Convolutional Networks](https://github.com/open-mmlab/mmdetection/blob/main/configs/dcn)。 ### CARAFE: Content-Aware ReAssembly of FEatures -请参考 [CARAFE](https://github.com/open-mmlab/mmdetection/blob/master/configs/carafe)。 +请参考 [CARAFE](https://github.com/open-mmlab/mmdetection/blob/main/configs/carafe)。 ### Instaboost -请参考 [Instaboost](https://github.com/open-mmlab/mmdetection/blob/master/configs/instaboost)。 +请参考 [Instaboost](https://github.com/open-mmlab/mmdetection/blob/main/configs/instaboost)。 ### Libra R-CNN -请参考 [Libra R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/libra_rcnn)。 +请参考 [Libra R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/libra_rcnn)。 ### Guided Anchoring -请参考 [Guided Anchoring](https://github.com/open-mmlab/mmdetection/blob/master/configs/guided_anchoring)。 +请参考 [Guided Anchoring](https://github.com/open-mmlab/mmdetection/blob/main/configs/guided_anchoring)。 ### FCOS -请参考 [FCOS](https://github.com/open-mmlab/mmdetection/blob/master/configs/fcos)。 +请参考 [FCOS](https://github.com/open-mmlab/mmdetection/blob/main/configs/fcos)。 ### FoveaBox -请参考 [FoveaBox](https://github.com/open-mmlab/mmdetection/blob/master/configs/foveabox)。 +请参考 [FoveaBox](https://github.com/open-mmlab/mmdetection/blob/main/configs/foveabox)。 ### RepPoints -请参考 [RepPoints](https://github.com/open-mmlab/mmdetection/blob/master/configs/reppoints)。 +请参考 [RepPoints](https://github.com/open-mmlab/mmdetection/blob/main/configs/reppoints)。 ### FreeAnchor -请参考 [FreeAnchor](https://github.com/open-mmlab/mmdetection/blob/master/configs/free_anchor)。 +请参考 [FreeAnchor](https://github.com/open-mmlab/mmdetection/blob/main/configs/free_anchor)。 ### Grid R-CNN (plus) -请参考 [Grid R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/grid_rcnn)。 +请参考 [Grid R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/grid_rcnn)。 ### GHM -请参考 [GHM](https://github.com/open-mmlab/mmdetection/blob/master/configs/ghm)。 +请参考 [GHM](https://github.com/open-mmlab/mmdetection/blob/main/configs/ghm)。 ### GCNet -请参考 [GCNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/gcnet)。 +请参考 [GCNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/gcnet)。 ### HRNet -请参考 [HRNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/hrnet)。 +请参考 [HRNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/hrnet)。 ### Mask Scoring R-CNN -请参考 [Mask Scoring R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/ms_rcnn)。 +请参考 [Mask Scoring R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/ms_rcnn)。 ### Train from Scratch -请参考 [Rethinking ImageNet Pre-training](https://github.com/open-mmlab/mmdetection/blob/master/configs/scratch)。 +请参考 [Rethinking ImageNet 
Pre-training](https://github.com/open-mmlab/mmdetection/blob/main/configs/scratch)。 ### NAS-FPN -请参考 [NAS-FPN](https://github.com/open-mmlab/mmdetection/blob/master/configs/nas_fpn)。 +请参考 [NAS-FPN](https://github.com/open-mmlab/mmdetection/blob/main/configs/nas_fpn)。 ### ATSS -请参考 [ATSS](https://github.com/open-mmlab/mmdetection/blob/master/configs/atss)。 +请参考 [ATSS](https://github.com/open-mmlab/mmdetection/blob/main/configs/atss)。 ### FSAF -请参考 [FSAF](https://github.com/open-mmlab/mmdetection/blob/master/configs/fsaf)。 +请参考 [FSAF](https://github.com/open-mmlab/mmdetection/blob/main/configs/fsaf)。 ### RegNetX -请参考 [RegNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/regnet)。 +请参考 [RegNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/regnet)。 ### Res2Net -请参考 [Res2Net](https://github.com/open-mmlab/mmdetection/blob/master/configs/res2net)。 +请参考 [Res2Net](https://github.com/open-mmlab/mmdetection/blob/main/configs/res2net)。 ### GRoIE -请参考 [GRoIE](https://github.com/open-mmlab/mmdetection/blob/master/configs/groie)。 +请参考 [GRoIE](https://github.com/open-mmlab/mmdetection/blob/main/configs/groie)。 ### Dynamic R-CNN -请参考 [Dynamic R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/dynamic_rcnn)。 +请参考 [Dynamic R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/dynamic_rcnn)。 ### PointRend -请参考 [PointRend](https://github.com/open-mmlab/mmdetection/blob/master/configs/point_rend)。 +请参考 [PointRend](https://github.com/open-mmlab/mmdetection/blob/main/configs/point_rend)。 ### DetectoRS -请参考 [DetectoRS](https://github.com/open-mmlab/mmdetection/blob/master/configs/detectors)。 +请参考 [DetectoRS](https://github.com/open-mmlab/mmdetection/blob/main/configs/detectors)。 ### Generalized Focal Loss -请参考 [Generalized Focal Loss](https://github.com/open-mmlab/mmdetection/blob/master/configs/gfl)。 +请参考 [Generalized Focal Loss](https://github.com/open-mmlab/mmdetection/blob/main/configs/gfl)。 ### CornerNet -请参考 [CornerNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/cornernet)。 +请参考 [CornerNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/cornernet)。 ### YOLOv3 -请参考 [YOLOv3](https://github.com/open-mmlab/mmdetection/blob/master/configs/yolo)。 +请参考 [YOLOv3](https://github.com/open-mmlab/mmdetection/blob/main/configs/yolo)。 ### PAA -请参考 [PAA](https://github.com/open-mmlab/mmdetection/blob/master/configs/paa)。 +请参考 [PAA](https://github.com/open-mmlab/mmdetection/blob/main/configs/paa)。 ### SABL -请参考 [SABL](https://github.com/open-mmlab/mmdetection/blob/master/configs/sabl)。 +请参考 [SABL](https://github.com/open-mmlab/mmdetection/blob/main/configs/sabl)。 ### CentripetalNet -请参考 [CentripetalNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/centripetalnet)。 +请参考 [CentripetalNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/centripetalnet)。 ### ResNeSt -请参考 [ResNeSt](https://github.com/open-mmlab/mmdetection/blob/master/configs/resnest)。 +请参考 [ResNeSt](https://github.com/open-mmlab/mmdetection/blob/main/configs/resnest)。 ### DETR -请参考 [DETR](https://github.com/open-mmlab/mmdetection/blob/master/configs/detr)。 +请参考 [DETR](https://github.com/open-mmlab/mmdetection/blob/main/configs/detr)。 ### Deformable DETR -请参考 [Deformable DETR](https://github.com/open-mmlab/mmdetection/blob/master/configs/deformable_detr)。 +请参考 [Deformable DETR](https://github.com/open-mmlab/mmdetection/blob/main/configs/deformable_detr)。 ### AutoAssign -请参考 
[AutoAssign](https://github.com/open-mmlab/mmdetection/blob/master/configs/autoassign)。 +请参考 [AutoAssign](https://github.com/open-mmlab/mmdetection/blob/main/configs/autoassign)。 ### YOLOF -请参考 [YOLOF](https://github.com/open-mmlab/mmdetection/blob/master/configs/yolof)。 +请参考 [YOLOF](https://github.com/open-mmlab/mmdetection/blob/main/configs/yolof)。 ### Seesaw Loss -请参考 [Seesaw Loss](https://github.com/open-mmlab/mmdetection/blob/master/configs/seesaw_loss)。 +请参考 [Seesaw Loss](https://github.com/open-mmlab/mmdetection/blob/main/configs/seesaw_loss)。 ### CenterNet -请参考 [CenterNet](https://github.com/open-mmlab/mmdetection/blob/master/configs/centernet)。 +请参考 [CenterNet](https://github.com/open-mmlab/mmdetection/blob/main/configs/centernet)。 ### YOLOX -请参考 [YOLOX](https://github.com/open-mmlab/mmdetection/blob/master/configs/yolox)。 +请参考 [YOLOX](https://github.com/open-mmlab/mmdetection/blob/main/configs/yolox)。 ### PVT -请参考 [PVT](https://github.com/open-mmlab/mmdetection/blob/master/configs/pvt)。 +请参考 [PVT](https://github.com/open-mmlab/mmdetection/blob/main/configs/pvt)。 ### SOLO -请参考 [SOLO](https://github.com/open-mmlab/mmdetection/blob/master/configs/solo)。 +请参考 [SOLO](https://github.com/open-mmlab/mmdetection/blob/main/configs/solo)。 ### QueryInst -请参考 [QueryInst](https://github.com/open-mmlab/mmdetection/blob/master/configs/queryinst)。 +请参考 [QueryInst](https://github.com/open-mmlab/mmdetection/blob/main/configs/queryinst)。 ### Other datasets -我们还在 [PASCAL VOC](https://github.com/open-mmlab/mmdetection/blob/master/configs/pascal_voc),[Cityscapes](https://github.com/open-mmlab/mmdetection/blob/master/configs/cityscapes) 和 [WIDER FACE](https://github.com/open-mmlab/mmdetection/blob/master/configs/wider_face) 上对一些方法进行了基准测试。 +我们还在 [PASCAL VOC](https://github.com/open-mmlab/mmdetection/blob/main/configs/pascal_voc),[Cityscapes](https://github.com/open-mmlab/mmdetection/blob/main/configs/cityscapes) 和 [WIDER FACE](https://github.com/open-mmlab/mmdetection/blob/main/configs/wider_face) 上对一些方法进行了基准测试。 ### Pre-trained Models -我们还通过多尺度训练和更长的训练策略来训练用 ResNet-50 和 [RegNetX-3.2G](https://github.com/open-mmlab/mmdetection/blob/master/configs/regnet) 作为主干网络的 [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/faster_rcnn) 和 [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn)。这些模型可以作为下游任务的预训练模型。 +我们还通过多尺度训练和更长的训练策略来训练用 ResNet-50 和 [RegNetX-3.2G](https://github.com/open-mmlab/mmdetection/blob/main/configs/regnet) 作为主干网络的 [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/faster_rcnn) 和 [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/mask_rcnn)。这些模型可以作为下游任务的预训练模型。 ## 速度基准 ### 训练速度基准 -我们提供 [analyze_logs.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/analysis_tools/analyze_logs.py) 来得到训练中每一次迭代的平均时间。示例请参考 [Log Analysis](https://mmdetection.readthedocs.io/en/latest/useful_tools.html#log-analysis)。 +我们提供 [analyze_logs.py](https://github.com/open-mmlab/mmdetection/blob/main/tools/analysis_tools/analyze_logs.py) 来得到训练中每一次迭代的平均时间。示例请参考 [Log Analysis](https://mmdetection.readthedocs.io/en/latest/useful_tools.html#log-analysis)。 -我们与其他流行框架的 Mask R-CNN 训练速度进行比较(数据是从 [detectron2](https://github.com/facebookresearch/detectron2/blob/master/docs/notes/benchmarks.md/) 复制而来)。在 mmdetection 中,我们使用 [mask_rcnn_r50_caffe_fpn_poly_1x_coco_v1.py](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_poly_1x_coco_v1.py) 进行基准测试。它与 detectron2 的 
[mask_rcnn_R_50_FPN_noaug_1x.yaml](https://github.com/facebookresearch/detectron2/blob/master/configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml) 设置完全一样。同时,我们还提供了[模型权重](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug_compare_20200518-10127928.pth)和[训练 log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug_20200518_105755.log.json) 作为参考。为了跳过 GPU 预热时间,吞吐量按照100-500次迭代之间的平均吞吐量来计算。 +我们与其他流行框架的 Mask R-CNN 训练速度进行比较(数据是从 [detectron2](https://github.com/facebookresearch/detectron2/blob/main/docs/notes/benchmarks.md/) 复制而来)。在 mmdetection 中,我们使用 [mask-rcnn_r50-caffe_fpn_poly-1x_coco_v1.py](https://github.com/open-mmlab/mmdetection/blob/main/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_poly-1x_coco_v1.py) 进行基准测试。它与 detectron2 的 [mask_rcnn_R_50_FPN_noaug_1x.yaml](https://github.com/facebookresearch/detectron2/blob/main/configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml) 设置完全一样。同时,我们还提供了[模型权重](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug_compare_20200518-10127928.pth)和[训练 log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug/mask_rcnn_r50_caffe_fpn_poly_1x_coco_no_aug_20200518_105755.log.json) 作为参考。为了跳过 GPU 预热时间,吞吐量按照100-500次迭代之间的平均吞吐量来计算。 | 框架 | 吞吐量 (img/s) | | -------------------------------------------------------------------------------------- | -------------- | @@ -267,7 +267,7 @@ MMdetection 常用到的主干网络细节如下表所示: ### 推理时间基准 -我们提供 [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/analysis_tools/benchmark.py) 对推理时间进行基准测试。此脚本将推理 2000 张图片并计算忽略前 5 次推理的平均推理时间。可以通过设置 `LOG-INTERVAL` 来改变 log 输出间隔(默认为 50)。 +我们提供 [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/main/tools/analysis_tools/benchmark.py) 对推理时间进行基准测试。此脚本将推理 2000 张图片并计算忽略前 5 次推理的平均推理时间。可以通过设置 `LOG-INTERVAL` 来改变 log 输出间隔(默认为 50)。 ```shell python tools/benchmark.py ${CONFIG} ${CHECKPOINT} [--log-interval $[LOG-INTERVAL]] [--fuse-conv-bn] @@ -295,11 +295,11 @@ python tools/benchmark.py ${CONFIG} ${CHECKPOINT} [--log-interval $[LOG-INTERVAL ### 精度 -| 模型 | 训练策略 | Detectron2 | mmdetection | 下载 | -| -------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------- | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco.py) | 1x | [37.9](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml) | 38.0 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-5324cff8.pth) \| 
[log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco_20200429_234554.log.json) | -| [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco.py) | 1x | [38.6 & 35.2](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml) | 38.8 & 35.4 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco-dbecf295.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco_20200430_054239.log.json) | -| [Retinanet](https://github.com/open-mmlab/mmdetection/blob/master/configs/retinanet/retinanet_r50_caffe_fpn_mstrain_1x_coco.py) | 1x | [36.5](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml) | 37.0 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/retinanet_r50_caffe_fpn_mstrain_1x_coco/retinanet_r50_caffe_fpn_mstrain_1x_coco-586977a0.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/retinanet_r50_caffe_fpn_mstrain_1x_coco/retinanet_r50_caffe_fpn_mstrain_1x_coco_20200430_014748.log.json) | +| 模型 | 训练策略 | Detectron2 | mmdetection | 下载 | +| ------------------------------------------------------------------------------------------------------------------------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------- | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco.py) | 1x | [37.9](https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml) | 38.0 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-5324cff8.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco_20200429_234554.log.json) | +| [Mask R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py) | 1x | [38.6 & 35.2](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml) | 38.8 & 35.4 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco-dbecf295.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco_20200430_054239.log.json) | +| [Retinanet](https://github.com/open-mmlab/mmdetection/blob/main/configs/retinanet/retinanet_r50-caffe_fpn_ms-1x_coco.py) | 1x | 
[36.5](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml) | 37.0 | [model](https://download.openmmlab.com/mmdetection/v2.0/benchmark/retinanet_r50_caffe_fpn_mstrain_1x_coco/retinanet_r50_caffe_fpn_mstrain_1x_coco-586977a0.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/benchmark/retinanet_r50_caffe_fpn_mstrain_1x_coco/retinanet_r50_caffe_fpn_mstrain_1x_coco_20200430_014748.log.json) | ### 训练速度 diff --git a/docs/zh_cn/notes/faq.md b/docs/zh_cn/notes/faq.md index bca80ba18ba..7f1333fcd1d 100644 --- a/docs/zh_cn/notes/faq.md +++ b/docs/zh_cn/notes/faq.md @@ -1,6 +1,42 @@ # 常见问题解答 -我们在这里列出了使用时的一些常见问题及其相应的解决方案。 如果您发现有一些问题被遗漏,请随时提 PR 丰富这个列表。 如果您无法在此获得帮助,请使用 [issue模板](https://github.com/open-mmlab/mmdetection/blob/master/.github/ISSUE_TEMPLATE/error-report.md/)创建问题,但是请在模板中填写所有必填信息,这有助于我们更快定位问题。 +我们在这里列出了使用时的一些常见问题及其相应的解决方案。 如果您发现有一些问题被遗漏,请随时提 PR 丰富这个列表。 如果您无法在此获得帮助,请使用 [issue模板](https://github.com/open-mmlab/mmdetection/blob/main/.github/ISSUE_TEMPLATE/error-report.md/)创建问题,但是请在模板中填写所有必填信息,这有助于我们更快定位问题。 + +## PyTorch 2.0 支持 + +MMDetection 目前绝大部分算法已经支持了 PyTorch 2.0 及其 `torch.compile` 功能, 用户只需要安装 MMDetection 3.0.0rc7 及其以上版本即可。如果你在使用中发现有不支持的算法,欢迎给我们反馈。我们也非常欢迎社区贡献者来 benchmark 对比 `torch.compile` 功能所带来的速度提升。 + +如果你想启动 `torch.compile` 功能,只需要在 `train.py` 或者 `test.py` 后面加上 `--cfg-options compile=True`。 以 RTMDet 为例,你可以使用以下命令启动 `torch.compile` 功能: + +```shell +# 单卡 +python tools/train.py configs/rtmdet/rtmdet_s_8xb32-300e_coco.py --cfg-options compile=True + +# 单机 8 卡 +./tools/dist_train.sh configs/rtmdet/rtmdet_s_8xb32-300e_coco.py 8 --cfg-options compile=True + +# 单机 8 卡 + AMP 混合精度训练 +./tools/dist_train.sh configs/rtmdet/rtmdet_s_8xb32-300e_coco.py 8 --cfg-options compile=True --amp +``` + +需要特别注意的是,PyTorch 2.0 对于动态 shape 支持不是非常完善,目标检测算法中大部分不仅输入 shape 是动态的,而且 loss 计算和后处理过程中也是动态的,这会导致在开启 `torch.compile` 功能后训练速度会变慢。基于此,如果你想启动 `torch.compile` 功能,则应该遵循如下原则: + +1. 输入到网络的图片是固定 shape 的,而非多尺度的 +2. 
设置 `torch._dynamo.config.cache_size_limit` 参数。TorchDynamo 会将 Python 字节码转换并缓存,已编译的函数会被存入缓存中。当下一次检查发现需要重新编译时,该函数会被重新编译并缓存。但是如果重编译次数超过预设的最大值(64),则该函数将不再被缓存或重新编译。前面说过目标检测算法中的 loss 计算和后处理部分也是动态计算的,这些函数需要在每次迭代中重新编译。因此将 `torch._dynamo.config.cache_size_limit` 参数设置得更小一些可以有效减少编译时间。
+
+在 MMDetection 中可以通过环境变量 `DYNAMO_CACHE_SIZE_LIMIT` 设置 `torch._dynamo.config.cache_size_limit` 参数,以 RTMDet 为例,命令如下所示:
+
+```shell
+# 单卡
+export DYNAMO_CACHE_SIZE_LIMIT=4
+python tools/train.py configs/rtmdet/rtmdet_s_8xb32-300e_coco.py --cfg-options compile=True
+
+# 单机 8 卡
+export DYNAMO_CACHE_SIZE_LIMIT=4
+./tools/dist_train.sh configs/rtmdet/rtmdet_s_8xb32-300e_coco.py 8 --cfg-options compile=True
+```
+
+关于 PyTorch 2.0 的 dynamo 常见问题,可以参考 [这里](https://pytorch.org/docs/stable/dynamo/faq.html)。

 ## 安装

@@ -10,7 +46,8 @@

   | MMDetection 版本 |        MMCV 版本        |       MMEngine 版本      |
   | :--------------: | :---------------------: | :----------------------: |
-  |       3.x        | mmcv>=2.0.0rc4, \<2.1.0 | mmengine>=0.6.0, \<1.0.0 |
+  |       main       |  mmcv>=2.0.0, \<2.1.0   | mmengine>=0.7.1, \<1.0.0 |
+  |       3.x        |  mmcv>=2.0.0, \<2.1.0   | mmengine>=0.7.1, \<1.0.0 |
   |     3.0.0rc6     | mmcv>=2.0.0rc4, \<2.1.0 | mmengine>=0.6.0, \<1.0.0 |
   |     3.0.0rc5     | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.3.0, \<1.0.0 |
   |     3.0.0rc4     | mmcv>=2.0.0rc1, \<2.1.0 | mmengine>=0.3.0, \<1.0.0 |
@@ -172,7 +209,7 @@ PYTHONPATH="$(dirname $0)/..":$PYTHONPATH

 - 训练中保存最好模型

-  可以通过配置 `evaluation = dict(save_best=‘auto’)`开启。在 auto 参数情况下会根据返回的验证结果中的第一个 key 作为选择最优模型的依据,你也可以直接设置评估结果中的 key 来手动设置,例如 `evaluation = dict(save_best=‘mAP’)`。
+  可以通过配置 `default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=1, save_best='auto'))` 开启。在 `auto` 参数情况下会根据返回的验证结果中的第一个 key 作为选择最优模型的依据,你也可以直接设置评估结果中的 key 来手动设置,例如 `save_best='coco/bbox_mAP'`。

 - 在 Resume 训练中使用 `ExpMomentumEMAHook`

diff --git a/docs/zh_cn/overview.md b/docs/zh_cn/overview.md
index b27ead66357..5269aed896d 100644
--- a/docs/zh_cn/overview.md
+++ b/docs/zh_cn/overview.md
@@ -42,11 +42,13 @@ MMDetection 由 7 个主要部分组成,apis、structures、datasets、models

 2. MMDetection 的基本使用方法请参考以下教程。

-   - [训练和测试](https://mmdetection.readthedocs.io/zh_CN/dev-3.x/user_guides/index.html#train-test)
+   - [训练和测试](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/index.html#train-test)

-   - [实用工具](https://mmdetection.readthedocs.io/zh_CN/dev-3.x/user_guides/index.html#useful-tools)
+   - [实用工具](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/index.html#useful-tools)

 3. 参考以下教程深入了解:

-   - [基础概念](https://mmdetection.readthedocs.io/zh_CN/dev-3.x/advanced_guides/index.html#basic-concepts)
-   - [组件定制](https://mmdetection.readthedocs.io/zh_CN/dev-3.x/advanced_guides/index.html#component-customization)
+   - [基础概念](https://mmdetection.readthedocs.io/zh_CN/latest/advanced_guides/index.html#basic-concepts)
+   - [组件定制](https://mmdetection.readthedocs.io/zh_CN/latest/advanced_guides/index.html#component-customization)
+
+4. 
对于 MMDetection 2.x 版本的用户,我们提供了[迁移指南](./migration/migration.md),帮助您完成新版本的适配。 diff --git a/docs/zh_cn/stat.py b/docs/zh_cn/stat.py index aa2c9de7398..1ea5fbd25b8 100755 --- a/docs/zh_cn/stat.py +++ b/docs/zh_cn/stat.py @@ -6,7 +6,7 @@ import numpy as np -url_prefix = 'https://github.com/open-mmlab/mmdetection/blob/3.x/' +url_prefix = 'https://github.com/open-mmlab/mmdetection/blob/main/' files = sorted(glob.glob('../configs/*/README.md')) diff --git a/docs/zh_cn/user_guides/config.md b/docs/zh_cn/user_guides/config.md index a3dc0f26635..3a670bf8ada 100644 --- a/docs/zh_cn/user_guides/config.md +++ b/docs/zh_cn/user_guides/config.md @@ -14,14 +14,14 @@ MMDetection 采用模块化设计,所有功能的模块都可以通过配置 model = dict( type='MaskRCNN', # 检测器名 data_preprocessor=dict( # 数据预处理器的配置,通常包括图像归一化和 padding - type='DetDataPreprocessor', # 数据预处理器的类型,参考 https://mmdetection.readthedocs.io/en/3.x/api.html#mmdet.models.data_preprocessors.DetDataPreprocessor + type='DetDataPreprocessor', # 数据预处理器的类型,参考 https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.data_preprocessors.DetDataPreprocessor mean=[123.675, 116.28, 103.53], # 用于预训练骨干网络的图像归一化通道均值,按 R、G、B 排序 std=[58.395, 57.12, 57.375], # 用于预训练骨干网络的图像归一化通道标准差,按 R、G、B 排序 bgr_to_rgb=True, # 是否将图片通道从 BGR 转为 RGB pad_mask=True, # 是否填充实例分割掩码 pad_size_divisor=32), # padding 后的图像的大小应该可以被 ``pad_size_divisor`` 整除 backbone=dict( # 主干网络的配置文件 - type='ResNet', # 主干网络的类别,可用选项请参考 https://mmdetection.readthedocs.io/en/3.x/api.html#mmdet.models.backbones.ResNet + type='ResNet', # 主干网络的类别,可用选项请参考 https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.backbones.ResNet depth=50, # 主干网络的深度,对于 ResNet 和 ResNext 通常设置为 50 或 101 num_stages=4, # 主干网络状态(stages)的数目,这些状态产生的特征图作为后续的 head 的输入 out_indices=(0, 1, 2, 3), # 每个状态产生的特征图输出的索引 @@ -33,34 +33,34 @@ model = dict( style='pytorch', # 主干网络的风格,'pytorch' 意思是步长为2的层为 3x3 卷积, 'caffe' 意思是步长为2的层为 1x1 卷积 init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), # 加载通过 ImageNet 预训练的模型 neck=dict( - type='FPN', # 检测器的 neck 是 FPN,我们同样支持 'NASFPN', 'PAFPN' 等,更多细节可以参考 https://mmdetection.readthedocs.io/en/3.x/api.html#mmdet.models.necks.FPN + type='FPN', # 检测器的 neck 是 FPN,我们同样支持 'NASFPN', 'PAFPN' 等,更多细节可以参考 https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.necks.FPN in_channels=[256, 512, 1024, 2048], # 输入通道数,这与主干网络的输出通道一致 out_channels=256, # 金字塔特征图每一层的输出通道 num_outs=5), # 输出的范围(scales) rpn_head=dict( - type='RPNHead', # rpn_head 的类型是 'RPNHead', 我们也支持 'GARPNHead' 等,更多细节可以参考 https://mmdetection.readthedocs.io/en/3.x/api.html#mmdet.models.dense_heads.RPNHead + type='RPNHead', # rpn_head 的类型是 'RPNHead', 我们也支持 'GARPNHead' 等,更多细节可以参考 https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.dense_heads.RPNHead in_channels=256, # 每个输入特征图的输入通道,这与 neck 的输出通道一致 feat_channels=256, # head 卷积层的特征通道 anchor_generator=dict( # 锚点(Anchor)生成器的配置 - type='AnchorGenerator', # 大多数方法使用 AnchorGenerator 作为锚点生成器, SSD 检测器使用 `SSDAnchorGenerator`。更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/prior_generators/anchor_generator.py#L18 + type='AnchorGenerator', # 大多数方法使用 AnchorGenerator 作为锚点生成器, SSD 检测器使用 `SSDAnchorGenerator`。更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/prior_generators/anchor_generator.py#L18 scales=[8], # 锚点的基本比例,特征图某一位置的锚点面积为 scale * base_sizes ratios=[0.5, 1.0, 2.0], # 高度和宽度之间的比率 strides=[4, 8, 16, 32, 64]), # 锚生成器的步幅。这与 FPN 特征步幅一致。 如果未设置 base_sizes,则当前步幅值将被视为 base_sizes bbox_coder=dict( # 在训练和测试期间对框进行编码和解码 - type='DeltaXYWHBBoxCoder', # 
框编码器的类别,'DeltaXYWHBBoxCoder' 是最常用的,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py#L13
+            type='DeltaXYWHBBoxCoder', # 框编码器的类别,'DeltaXYWHBBoxCoder' 是最常用的,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py#L13
             target_means=[0.0, 0.0, 0.0, 0.0], # 用于编码和解码框的目标均值
             target_stds=[1.0, 1.0, 1.0, 1.0]), # 用于编码和解码框的标准差
         loss_cls=dict( # 分类分支的损失函数配置
-            type='CrossEntropyLoss', # 分类分支的损失类型,我们也支持 FocalLoss 等,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/losses/cross_entropy_loss.py#L201
+            type='CrossEntropyLoss', # 分类分支的损失类型,我们也支持 FocalLoss 等,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/losses/cross_entropy_loss.py#L201
             use_sigmoid=True, # RPN 通常进行二分类,所以通常使用 sigmoid 函数
-            los_weight=1.0), # 分类分支的损失权重
+            loss_weight=1.0), # 分类分支的损失权重
         loss_bbox=dict( # 回归分支的损失函数配置
-            type='L1Loss', # 损失类型,我们还支持许多 IoU Losses 和 Smooth L1-loss 等,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/losses/smooth_l1_loss.py#L56
+            type='L1Loss', # 损失类型,我们还支持许多 IoU Losses 和 Smooth L1-loss 等,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/losses/smooth_l1_loss.py#L56
             loss_weight=1.0)), # 回归分支的损失权重
     roi_head=dict( # RoIHead 封装了两步(two-stage)/级联(cascade)检测器的第二步
-        type='StandardRoIHead', # RoI head 的类型,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/roi_heads/standard_roi_head.py#L17
+        type='StandardRoIHead', # RoI head 的类型,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/roi_heads/standard_roi_head.py#L17
         bbox_roi_extractor=dict( # 用于 bbox 回归的 RoI 特征提取器
-            type='SingleRoIExtractor', # RoI 特征提取器的类型,大多数方法使用 SingleRoIExtractor,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py#L13
+            type='SingleRoIExtractor', # RoI 特征提取器的类型,大多数方法使用 SingleRoIExtractor,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py#L13
             roi_layer=dict( # RoI 层的配置
                 type='RoIAlign', # RoI 层的类别, 也支持 DeformRoIPoolingPack 和 ModulatedDeformRoIPoolingPack,更多细节请参考 https://mmcv.readthedocs.io/en/latest/api.html#mmcv.ops.RoIAlign
                 output_size=7, # 特征图的输出大小
@@ -68,7 +68,7 @@ model = dict(
         out_channels=256, # 提取特征的输出通道
         featmap_strides=[4, 8, 16, 32]), # 多尺度特征图的步幅,应该与主干的架构保持一致
     bbox_head=dict( # RoIHead 中 box head 的配置
-        type='Shared2FCBBoxHead', # bbox head 的类别,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py#L220
+        type='Shared2FCBBoxHead', # bbox head 的类别,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py#L220
         in_channels=256, # bbox head 的输入通道。 这与 roi_extractor 中的 out_channels 一致
         fc_out_channels=1024, # FC 层的输出特征通道
         roi_feat_size=7, # 候选区域(Region of Interest)特征的大小
@@ -94,7 +94,7 @@ model = dict(
         out_channels=256, # 提取特征的输出通道
         featmap_strides=[4, 8, 16, 32]), # 多尺度特征图的步幅
     mask_head=dict( # mask 预测 head 模型
-        type='FCNMaskHead', # mask head 的类型,更多细节请参考 https://mmdetection.readthedocs.io/en/3.x/api.html#mmdet.models.roi_heads.FCNMaskHead
+        type='FCNMaskHead', # mask head 的类型,更多细节请参考 https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.roi_heads.FCNMaskHead
         num_convs=4, # mask head 中的卷积层数
         in_channels=256, # 输入通道,应与 mask roi extractor 的输出通道一致
         conv_out_channels=256, # 卷积层的输出通道
@@ -106,14 +106,14 @@ model = dict(
 train_cfg = dict( # rpn 和 rcnn 训练超参数的配置
     rpn=dict( # rpn 
的训练配置 assigner=dict( # 分配器(assigner)的配置 - type='MaxIoUAssigner', # 分配器的类型,MaxIoUAssigner 用于许多常见的检测器,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/assigners/max_iou_assigner.py#L14 + type='MaxIoUAssigner', # 分配器的类型,MaxIoUAssigner 用于许多常见的检测器,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/assigners/max_iou_assigner.py#L14 pos_iou_thr=0.7, # IoU >= 0.7(阈值) 被视为正样本 neg_iou_thr=0.3, # IoU < 0.3(阈值) 被视为负样本 min_pos_iou=0.3, # 将框作为正样本的最小 IoU 阈值 match_low_quality=True, # 是否匹配低质量的框(更多细节见 API 文档) ignore_iof_thr=-1), # 忽略 bbox 的 IoF 阈值 sampler=dict( # 正/负采样器(sampler)的配置 - type='RandomSampler', # 采样器类型,还支持 PseudoSampler 和其他采样器,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/samplers/random_sampler.py#L14 + type='RandomSampler', # 采样器类型,还支持 PseudoSampler 和其他采样器,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/samplers/random_sampler.py#L14 num=256, # 样本数量。 pos_fraction=0.5, # 正样本占总样本的比例 neg_pos_ub=-1, # 基于正样本数量的负样本上限 @@ -133,14 +133,14 @@ model = dict( min_bbox_size=0), # 允许的最小 box 尺寸 rcnn=dict( # roi head 的配置。 assigner=dict( # 第二阶段分配器的配置,这与 rpn 中的不同 - type='MaxIoUAssigner', # 分配器的类型,MaxIoUAssigner 目前用于所有 roi_heads。更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/assigners/max_iou_assigner.py#L14 + type='MaxIoUAssigner', # 分配器的类型,MaxIoUAssigner 目前用于所有 roi_heads。更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/assigners/max_iou_assigner.py#L14 pos_iou_thr=0.5, # IoU >= 0.5(阈值)被认为是正样本 neg_iou_thr=0.5, # IoU < 0.5(阈值)被认为是负样本 min_pos_iou=0.5, # 将 box 作为正样本的最小 IoU 阈值 match_low_quality=False, # 是否匹配低质量下的 box(有关更多详细信息,请参阅 API 文档) ignore_iof_thr=-1), # 忽略 bbox 的 IoF 阈值 sampler=dict( - type='RandomSampler', # 采样器的类型,还支持 PseudoSampler 和其他采样器,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/task_modules/samplers/random_sampler.py#L14 + type='RandomSampler', # 采样器的类型,还支持 PseudoSampler 和其他采样器,更多细节请参考 https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/samplers/random_sampler.py#L14 num=512, # 样本数量 pos_fraction=0.25, # 正样本占总样本的比例 neg_pos_ub=-1, # 基于正样本数量的负样本上限 @@ -176,10 +176,9 @@ model = dict( ```python dataset_type = 'CocoDataset' # 数据集类型,这将被用来定义数据集。 data_root = 'data/coco/' # 数据的根路径。 -file_client_args = dict(backend='disk') # 文件读取后端的配置,默认从硬盘读取 train_pipeline = [ # 训练数据处理流程 - dict(type='LoadImageFromFile', file_client_args=file_client_args), # 第 1 个流程,从文件路径里加载图像。 + dict(type='LoadImageFromFile'), # 第 1 个流程,从文件路径里加载图像。 dict( type='LoadAnnotations', # 第 2 个流程,对于当前图像,加载它的注释信息。 with_bbox=True, # 是否使用标注框(bounding box), 目标检测需要设置为 True。 @@ -196,7 +195,7 @@ train_pipeline = [ # 训练数据处理流程 dict(type='PackDetInputs') # 将数据转换为检测器输入格式的流程 ] test_pipeline = [ # 测试数据处理流程 - dict(type='LoadImageFromFile', file_client_args=file_client_args), # 第 1 个流程,从文件路径里加载图像。 + dict(type='LoadImageFromFile'), # 第 1 个流程,从文件路径里加载图像。 dict(type='Resize', scale=(1333, 800), keep_ratio=True), # 变化图像大小的流程。 dict( type='PackDetInputs', # 将数据转换为检测器输入格式的流程 @@ -519,7 +518,7 @@ train_pipeline = [ dict(type='PackDetInputs') ] test_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=file_client_args), + dict(type='LoadImageFromFile'), dict(type='Resize', scale=(1333, 800), keep_ratio=True), dict( type='PackDetInputs', diff --git a/docs/zh_cn/user_guides/dataset_prepare.md b/docs/zh_cn/user_guides/dataset_prepare.md index e03127bdb68..b33ec3bd309 100644 --- 
a/docs/zh_cn/user_guides/dataset_prepare.md +++ b/docs/zh_cn/user_guides/dataset_prepare.md @@ -1,17 +1,14 @@ -## 数据集准备(待更新) +## 数据集准备 -为了测试一个模型的精度,我们通常会在标准数据集上对其进行测试。MMDetection 支持多个公共数据集,包括 [COCO](https://cocodataset.org/) , -[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC) ,[Cityscapes](https://www.cityscapes-dataset.com/) 等等。 -这一部分将会介绍如何在支持的数据集上测试现有模型。 +MMDetection 支持多个公共数据集,包括 [COCO](https://cocodataset.org/), [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC), [Cityscapes](https://www.cityscapes-dataset.com/) 和 [其他更多数据集](https://github.com/open-mmlab/mmdetection/tree/main/configs/_base_/datasets)。 -一些公共数据集,比如 Pascal VOC 及其镜像数据集,或者 COCO 等数据集都可以从官方网站或者镜像网站获取。 -注意:在检测任务中,Pascal VOC 2012 是 Pascal VOC 2007 的无交集扩展,我们通常将两者一起使用。 -我们建议将数据集下载,然后解压到项目外部的某个文件夹内,然后通过符号链接的方式,将数据集根目录链接到 `$MMDETECTION/data` 文件夹下,格式如下所示。 -如果你的文件夹结构和下方不同的话,你需要在配置文件中改变对应的路径。 -我们提供了下载 COCO 等数据集的脚本,你可以运行 `python tools/misc/download_dataset.py --dataset-name coco2017` 下载 COCO 数据集。 -对于中国境内的用户,我们也推荐通过开源数据平台 [OpenDataLab](https://opendatalab.com/?source=OpenMMLab%20GitHub) 来下载数据,以获得更好的下载体验。 +一些公共数据集,比如 Pascal VOC 及其镜像数据集,或者 COCO 等数据集都可以从官方网站或者镜像网站获取。注意:在检测任务中,Pascal VOC 2012 是 Pascal VOC 2007 的无交集扩展,我们通常将两者一起使用。 我们建议将数据集下载,然后解压到项目外部的某个文件夹内,然后通过符号链接的方式,将数据集根目录链接到 `$MMDETECTION/data` 文件夹下, 如果你的文件夹结构和下方不同的话,你需要在配置文件中改变对应的路径。 -```plain +我们提供了下载 COCO 等数据集的脚本,你可以运行 `python tools/misc/download_dataset.py --dataset-name coco2017` 下载 COCO 数据集。 对于中国境内的用户,我们也推荐通过开源数据平台 [OpenDataLab](https://opendatalab.com/?source=OpenMMLab%20GitHub) 来下载数据,以获得更好的下载体验。 + +更多用法请参考[数据集下载](./useful_tools.md#dataset-download) + +```text mmdetection ├── mmdet ├── tools @@ -37,7 +34,7 @@ mmdetection 有些模型需要额外的 [COCO-stuff](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip) 数据集,比如 HTC,DetectoRS 和 SCNet,你可以下载并解压它们到 `coco` 文件夹下。文件夹会是如下结构: -```plain +```text mmdetection ├── data │ ├── coco diff --git a/docs/zh_cn/user_guides/deploy.md b/docs/zh_cn/user_guides/deploy.md index 135aeb5b0af..da2e7f68241 100644 --- a/docs/zh_cn/user_guides/deploy.md +++ b/docs/zh_cn/user_guides/deploy.md @@ -16,7 +16,7 @@ ## 安装 -请参考[此处](https://mmdetection.readthedocs.io/en/3.x/get_started.html)安装 mmdet。然后,按照[说明](https://mmdeploy.readthedocs.io/zh_CN/1.x/get_started.html#mmdeploy)安装 mmdeploy。 +请参考[此处](https://mmdetection.readthedocs.io/en/latest/get_started.html)安装 mmdet。然后,按照[说明](https://mmdeploy.readthedocs.io/zh_CN/1.x/get_started.html#mmdeploy)安装 mmdeploy。 ```{note} 如果安装的是 mmdeploy 预编译包,那么也请通过 'git clone https://github.com/open-mmlab/mmdeploy.git --depth=1' 下载 mmdeploy 源码。因为它包含了部署时要用到的配置文件 @@ -24,7 +24,7 @@ ## 模型转换 -假设在安装步骤中,mmdetection 和 mmdeploy 代码库在同级目录下,并且当前的工作目录为 mmdetection 的根目录,那么以 [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/3.x/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py) 模型为例,你可以从[此处](https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth)下载对应的 checkpoint,并使用以下代码将之转换为 onnx 模型: +假设在安装步骤中,mmdetection 和 mmdeploy 代码库在同级目录下,并且当前的工作目录为 mmdetection 的根目录,那么以 [Faster R-CNN](https://github.com/open-mmlab/mmdetection/blob/main/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py) 模型为例,你可以从[此处](https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth)下载对应的 checkpoint,并使用以下代码将之转换为 onnx 模型: ```python from mmdeploy.apis import torch2onnx diff --git a/docs/zh_cn/user_guides/index.rst b/docs/zh_cn/user_guides/index.rst index 0c413db58f0..5abc50ad1cd 
100644
--- a/docs/zh_cn/user_guides/index.rst
+++ b/docs/zh_cn/user_guides/index.rst
@@ -31,3 +31,4 @@ MMDetection 在 `Model Zoo`_
+   label_studio.md
diff --git a/docs/zh_cn/user_guides/label_studio.md b/docs/zh_cn/user_guides/label_studio.md
new file mode 100644
+# 半自动化目标检测标注
+
+## 环境配置
+
+安装 MMEngine 和 MMCV
+
+```shell
+pip install -U openmim
+mim install "mmcv>=2.0.0"
+# 安装 mmcv 的过程中会自动安装 mmengine
+```
+
+安装 MMDetection
+
+```shell
+git clone https://github.com/open-mmlab/mmdetection
+cd mmdetection
+pip install -v -e .
+```
+
+安装 Label-Studio 和 label-studio-ml-backend
+
+```shell
+# 安装 label-studio 需要一段时间,如果找不到版本请使用官方源
+pip install label-studio==1.7.2
+pip install label-studio-ml==1.0.9
+```
+
+下载 RTMDet 权重
+
+```shell
+cd path/to/mmdetection
+mkdir work_dirs
+cd work_dirs
+wget https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_m_8xb32-300e_coco/rtmdet_m_8xb32-300e_coco_20220719_112220-229f527c.pth
+```
+
+## 启动服务
+
+启动 RTMDet 后端推理服务:
+
+```shell
+cd path/to/mmdetection
+
+label-studio-ml start projects/LabelStudio/backend_template --with \
+config_file=configs/rtmdet/rtmdet_m_8xb32-300e_coco.py \
+checkpoint_file=./work_dirs/rtmdet_m_8xb32-300e_coco_20220719_112220-229f527c.pth \
+device=cpu \
+--port 8003
+# device=cpu 为使用 CPU 推理,如果使用 GPU 推理,将 cpu 替换为 cuda:0
+```
+
+![](https://cdn.vansin.top/picgo20230330131601.png)
+
+此时,RTMDet 后端推理服务已经启动,后续在 Label-Studio Web 系统中配置 http://localhost:8003 后端推理服务即可。
+
+现在启动 Label-Studio 网页服务:
+
+```shell
+label-studio start
+```
+
+![](https://cdn.vansin.top/picgo20230330132913.png)
+
+打开浏览器访问 [http://localhost:8080/](http://localhost:8080/) 即可看到 Label-Studio 的界面。
+
+![](https://cdn.vansin.top/picgo20230330133118.png)
+
+我们注册一个用户,然后创建一个 RTMDet-Semiautomatic-Label 项目。
+
+![](https://cdn.vansin.top/picgo20230330133333.png)
+
+我们通过下面的方式下载好示例的喵喵图片,点击 Data Import 导入需要标注的猫图片。
+
+```shell
+cd path/to/mmdetection
+mkdir data && cd data
+
+wget https://download.openmmlab.com/mmyolo/data/cat_dataset.zip && unzip cat_dataset.zip
+```
+
+![](https://cdn.vansin.top/picgo20230330133628.png)
+
+![](https://cdn.vansin.top/picgo20230330133715.png)
+
+然后选择 Object Detection With Bounding Boxes 模板。
+
+![](https://cdn.vansin.top/picgo20230330133807.png)
+
+```shell
+airplane
+apple
+backpack
+banana
+baseball_bat
+baseball_glove
+bear
+bed
+bench
+bicycle
+bird
+boat
+book
+bottle
+bowl
+broccoli
+bus
+cake
+car
+carrot
+cat
+cell_phone
+chair
+clock
+couch
+cow
+cup
+dining_table
+dog
+donut
+elephant
+fire_hydrant
+fork
+frisbee
+giraffe
+hair_drier
+handbag
+horse
+hot_dog
+keyboard
+kite
+knife
+laptop
+microwave
+motorcycle
+mouse
+orange
+oven
+parking_meter
+person
+pizza
+potted_plant
+refrigerator
+remote
+sandwich
+scissors
+sheep
+sink
+skateboard
+skis
+snowboard
+spoon
+sports_ball
+stop_sign
+suitcase
+surfboard
+teddy_bear
+tennis_racket
+tie
+toaster
+toilet
+toothbrush
+traffic_light
+train
+truck
+tv
+umbrella
+vase
+wine_glass
+zebra
+```
+
+然后将上述类别复制添加到 Label-Studio,然后点击 Save。
+
+![](https://cdn.vansin.top/picgo20230330134027.png)
+
+然后在设置中点击 Add Model 添加 RTMDet 后端推理服务。
+
+![](https://cdn.vansin.top/picgo20230330134320.png)
+
+点击 Validate and Save,然后点击 Start Labeling。
+
+![](https://cdn.vansin.top/picgo20230330134424.png)
+
+看到如下 Connected 就说明后端推理服务添加成功。
+
+![](https://cdn.vansin.top/picgo20230330134554.png)
+
+## 开始半自动化标注
+
+点击 Label 开始标注
+
+![](https://cdn.vansin.top/picgo20230330134804.png)
+
+我们可以看到 RTMDet 后端推理服务已经成功返回了预测结果并显示在图片上,我们可以发现这个喵喵预测的框有点大。
+
+![](https://cdn.vansin.top/picgo20230403104419.png)
+
+我们手工拖动框,修正一下框的位置,得到以下修正过后的标注,然后点击 Submit,本张图片就标注完毕了。
+
+![](https://cdn.vansin.top/picgo/20230403105923.png)
+
+我们 Submit 完毕所有图片后,点击 Export 导出 COCO 格式的数据集,就能把标注好的数据集的压缩包导出来了。
+
+![](https://cdn.vansin.top/picgo20230330135921.png)
+
+用 vscode 打开解压后的文件夹,可以看到标注好的数据集,包含了图片和
json 格式的标注文件。
+
+![](https://cdn.vansin.top/picgo20230330140321.png)
+
+到此半自动化标注就完成了,我们可以用这个数据集在 MMDetection 训练精度更高的模型了,训练出更好的模型,然后再用这个模型继续半自动化标注新采集的图片,这样就可以不断迭代,扩充高质量数据集,提高模型的精度。
+
+## 使用 MMYOLO 作为后端推理服务
+
+如果想在 MMYOLO 中使用 Label-Studio,可以在启动后端推理服务时,将 config_file 和 checkpoint_file 替换为 MMYOLO 的配置文件和权重文件即可。
+
+```shell
+cd path/to/mmdetection
+
+label-studio-ml start projects/LabelStudio/backend_template --with \
+config_file=path/to/mmyolo_config.py \
+checkpoint_file=path/to/mmyolo_weights.pth \
+device=cpu \
+--port 8003
+# device=cpu 为使用 CPU 推理,如果使用 GPU 推理,将 cpu 替换为 cuda:0
+```
+
+旋转目标检测和实例分割还在支持中,敬请期待。
diff --git a/docs/zh_cn/user_guides/robustness_benchmarking.md b/docs/zh_cn/user_guides/robustness_benchmarking.md index d9c66a70f15..e95c79a91f1 100644
--- a/docs/zh_cn/user_guides/robustness_benchmarking.md
+++ b/docs/zh_cn/user_guides/robustness_benchmarking.md
@@ -1,4 +1,4 @@
-# 检测器鲁棒性检查 (待更新)
+# 检测器鲁棒性检查

 ## 介绍
diff --git a/docs/zh_cn/user_guides/semi_det.md b/docs/zh_cn/user_guides/semi_det.md index 4665e40260c..a223523705c 100644
--- a/docs/zh_cn/user_guides/semi_det.md
+++ b/docs/zh_cn/user_guides/semi_det.md
@@ -4,12 +4,13 @@

 按照以下流程进行半监督目标检测:

-- [准备和拆分数据集](#准备和拆分数据集)
-- [配置多分支数据流程](#配置多分支数据流程)
-- [配置加载半监督数据](#配置半监督数据加载)
-- [配置半监督模型](#配置半监督模型)
-- [配置 MeanTeacherHook](#配置MeanTeacherHook)
-- [配置 TeacherStudentValLoop](#配置TeacherStudentValLoop)
+- [半监督目标检测](#半监督目标检测)
+  - [准备和拆分数据集](#准备和拆分数据集)
+  - [配置多分支数据流程](#配置多分支数据流程)
+  - [配置半监督数据加载](#配置半监督数据加载)
+  - [配置半监督模型](#配置半监督模型)
+  - [配置MeanTeacherHook](#配置meanteacherhook)
+  - [配置TeacherStudentValLoop](#配置teacherstudentvalloop)

 ## 准备和拆分数据集
@@ -116,7 +117,7 @@ mmdetection
 # pipeline used to augment labeled data,
 # which will be sent to student model for supervised training.
 sup_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=file_client_args),
+    dict(type='LoadImageFromFile', backend_args=backend_args),
     dict(type='LoadAnnotations', with_bbox=True),
     dict(type='RandomResize', scale=scale, keep_ratio=True),
     dict(type='RandomFlip', prob=0.5),
@@ -163,7 +164,7 @@ strong_pipeline = [

 # pipeline used to augment unlabeled data into different views
 unsup_pipeline = [
-    dict(type='LoadImageFromFile', file_client_args=file_client_args),
+    dict(type='LoadImageFromFile', backend_args=backend_args),
     dict(type='LoadEmptyAnnotations'),
     dict(
         type='MultiBranch',
diff --git a/docs/zh_cn/user_guides/test.md b/docs/zh_cn/user_guides/test.md index 0cd70cfa9f8..1b165b049d9 100644
--- a/docs/zh_cn/user_guides/test.md
+++ b/docs/zh_cn/user_guides/test.md
@@ -1,4 +1,4 @@
-# 测试现有模型(待更新)
+# 测试现有模型

 我们提供了测试脚本,能够测试一个现有模型在所有数据集(COCO,Pascal VOC,Cityscapes 等)上的性能。我们支持在如下环境下测试:
@@ -15,7 +15,6 @@
 python tools/test.py \
     ${CONFIG_FILE} \
     ${CHECKPOINT_FILE} \
     [--out ${RESULT_FILE}] \
-    [--eval ${EVAL_METRICS}] \
     [--show]

 # CPU 测试:禁用 GPU 并运行单 GPU 测试脚本
@@ -24,7 +23,6 @@
 python tools/test.py \
     ${CONFIG_FILE} \
     ${CHECKPOINT_FILE} \
     [--out ${RESULT_FILE}] \
-    [--eval ${EVAL_METRICS}] \
     [--show]

 # 单节点多 GPU 测试
 bash tools/dist_test.sh \
     ${CONFIG_FILE} \
     ${CHECKPOINT_FILE} \
     ${GPU_NUM} \
-    [--out ${RESULT_FILE}] \
-    [--eval ${EVAL_METRICS}]
+    [--out ${RESULT_FILE}]
 ```

 `tools/dist_test.sh` 也支持多节点测试,不过需要依赖 PyTorch 的 [启动工具](https://pytorch.org/docs/stable/distributed.html#launch-utility) 。

 可选参数:

 - `RESULT_FILE`: 结果文件名称,需以 .pkl 形式存储。如果没有声明,则不将结果存储到文件。
-- `EVAL_METRICS`: 需要测试的度量指标。可选值是取决于数据集的,比如 `proposal_fast`,`proposal`,`bbox`,`segm` 是 COCO 数据集的可选值,`mAP`,`recall` 是 Pascal VOC 数据集的可选值。Cityscapes 数据集可以测试
`cityscapes` 和所有 COCO 数据集支持的度量指标。 - `--show`: 如果开启,检测结果将被绘制在图像上,以一个新窗口的形式展示。它只适用于单 GPU 的测试,是用于调试和可视化的。请确保使用此功能时,你的 GUI 可以在环境中打开。否则,你可能会遇到这么一个错误 `cannot connect to X server`。 - `--show-dir`: 如果指明,检测结果将会被绘制在图像上并保存到指定目录。它只适用于单 GPU 的测试,是用于调试和可视化的。即使你的环境中没有 GUI,这个选项也可使用。 -- `--show-score-thr`: 如果指明,得分低于此阈值的检测结果将会被移除。 - `--cfg-options`: 如果指明,这里的键值对将会被合并到配置文件中。 -- `--eval-options`: 如果指明,这里的键值对将会作为字典参数被传入 `dataset.evaluation()` 函数中,仅在测试阶段使用。 ### 样例 假设你已经下载了 checkpoint 文件到 `checkpoints/` 文件下了。 -1. 测试 RTMDet 并可视化其结果。按任意键继续下张图片的测试。配置文件和 checkpoint 文件 [在此](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet) 。 +1. 测试 RTMDet 并可视化其结果。按任意键继续下张图片的测试。配置文件和 checkpoint 文件 [在此](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet) 。 ```shell python tools/test.py \ @@ -61,7 +55,7 @@ bash tools/dist_test.sh \ --show ``` -2. 测试 RTMDet,并为了之后的可视化保存绘制的图像。配置文件和 checkpoint 文件 [在此](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet) 。 +2. 测试 RTMDet,并为了之后的可视化保存绘制的图像。配置文件和 checkpoint 文件 [在此](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet) 。 ```shell python tools/test.py \ @@ -70,67 +64,60 @@ bash tools/dist_test.sh \ --show-dir rtmdet_l_8xb32-300e_coco_results ``` -3. 在 Pascal VOC 数据集上测试 Faster R-CNN,不保存测试结果,测试 `mAP`。配置文件和 checkpoint 文件 [在此](https://github.com/open-mmlab/mmdetection/tree/master/configs/pascal_voc) 。 +3. 在 Pascal VOC 数据集上测试 Faster R-CNN,不保存测试结果,测试 `mAP`。配置文件和 checkpoint 文件 [在此](../../../configs/pascal_voc) 。 ```shell python tools/test.py \ - configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc.py \ - checkpoints/faster_rcnn_r50_fpn_1x_voc0712_20200624-c9895d40.pth \ - --eval mAP + configs/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712.py \ + checkpoints/faster_rcnn_r50_fpn_1x_voc0712_20200624-c9895d40.pth ``` -4. 使用 8 块 GPU 测试 Mask R-CNN,测试 `bbox` 和 `mAP` 。配置文件和 checkpoint 文件 [在此](https://github.com/open-mmlab/mmdetection/tree/master/configs/mask_rcnn) 。 +4. 使用 8 块 GPU 测试 Mask R-CNN,测试 `bbox` 和 `mAP` 。配置文件和 checkpoint 文件 [在此](../../../configs/mask_rcnn) 。 ```shell ./tools/dist_test.sh \ - configs/mask_rcnn_r50_fpn_1x_coco.py \ + configs/mask-rcnn_r50_fpn_1x_coco.py \ checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \ 8 \ - --out results.pkl \ - --eval bbox segm + --out results.pkl ``` -5. 使用 8 块 GPU 测试 Mask R-CNN,测试**每类**的 `bbox` 和 `mAP`。配置文件和 checkpoint 文件 [在此](https://github.com/open-mmlab/mmdetection/tree/master/configs/mask_rcnn) 。 +5. 使用 8 块 GPU 测试 Mask R-CNN,测试**每类**的 `bbox` 和 `mAP`。配置文件和 checkpoint 文件 [在此](../../../configs/mask_rcnn) 。 ```shell ./tools/dist_test.sh \ - configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py \ + configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py \ checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \ - 8 \ - --out results.pkl \ - --eval bbox segm \ - --options "classwise=True" + 8 ``` -6. 在 COCO test-dev 数据集上,使用 8 块 GPU 测试 Mask R-CNN,并生成 JSON 文件提交到官方评测服务器。配置文件和 checkpoint 文件 [在此](https://github.com/open-mmlab/mmdetection/tree/master/configs/mask_rcnn) 。 + 该命令生成两个JSON文件 `./work_dirs/coco_instance/test.bbox.json` 和 `./work_dirs/coco_instance/test.segm.json`。 + +6. 
在 COCO test-dev 数据集上,使用 8 块 GPU 测试 Mask R-CNN,并生成 JSON 文件提交到官方评测服务器。配置文件和 checkpoint 文件 [在此](../../../configs/mask_rcnn) 。你可以在 [config](../../../configs/_base_/datasets/coco_instance.py) 的注释中用 test_evaluator 和 test_dataloader 替换原来的 test_evaluator 和 test_dataloader,然后运行:

   ```shell
   ./tools/dist_test.sh \
-    configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py \
-    checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \
-    8 \
-    --format-only \
-    --options "jsonfile_prefix=./mask_rcnn_test-dev_results"
+    configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py \
+    checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \
+    8
   ```

-这行命令生成两个 JSON 文件 `mask_rcnn_test-dev_results.bbox.json` 和 `mask_rcnn_test-dev_results.segm.json`。
+   这行命令生成两个 JSON 文件 `./work_dirs/coco_instance/test.bbox.json` 和 `./work_dirs/coco_instance/test.segm.json`。

-7. 在 Cityscapes 数据集上,使用 8 块 GPU 测试 Mask R-CNN,生成 txt 和 png 文件,并上传到官方评测服务器。配置文件和 checkpoint 文件 [在此](https://github.com/open-mmlab/mmdetection/tree/master/configs/cityscapes) 。
+7. 在 Cityscapes 数据集上,使用 8 块 GPU 测试 Mask R-CNN,生成 txt 和 png 文件,并上传到官方评测服务器。配置文件和 checkpoint 文件 [在此](../../../configs/cityscapes) 。 你可以在 [config](../../../configs/_base_/datasets/cityscapes_instance.py) 的注释中用 test_evaluator 和 test_dataloader 替换原来的 test_evaluator 和 test_dataloader,然后运行:

   ```shell
   ./tools/dist_test.sh \
-    configs/cityscapes/mask_rcnn_r50_fpn_1x_cityscapes.py \
+    configs/cityscapes/mask-rcnn_r50_fpn_1x_cityscapes.py \
     checkpoints/mask_rcnn_r50_fpn_1x_cityscapes_20200227-afe51d5a.pth \
-    8 \
-    --format-only \
-    --options "txtfile_prefix=./mask_rcnn_cityscapes_test_results"
+    8
   ```

-生成的 png 和 txt 文件在 `./mask_rcnn_cityscapes_test_results` 文件夹下。
+   生成的 png 和 txt 文件在 `./work_dirs/cityscapes_metric` 文件夹下。

 ### 不使用 Ground Truth 标注进行测试

-MMDetection 支持在不使用 ground-truth 标注的情况下对模型进行测试,这需要用到 `CocoDataset`。如果你的数据集格式不是 COCO 格式的,请将其转化成 COCO 格式。如果你的数据集格式是 VOC 或者 Cityscapes,你可以使用 [tools/dataset_converters](https://github.com/open-mmlab/mmdetection/tree/master/tools/dataset_converters) 内的脚本直接将其转化成 COCO 格式。如果是其他格式,可以使用 [images2coco 脚本](https://github.com/open-mmlab/mmdetection/tree/master/tools/dataset_converters/images2coco.py) 进行转换。
+MMDetection 支持在不使用 ground-truth 标注的情况下对模型进行测试,这需要用到 `CocoDataset`。如果你的数据集格式不是 COCO 格式的,请将其转化成 COCO 格式。如果你的数据集格式是 VOC 或者 Cityscapes,你可以使用 [tools/dataset_converters](https://github.com/open-mmlab/mmdetection/tree/main/tools/dataset_converters) 内的脚本直接将其转化成 COCO 格式。如果是其他格式,可以使用 [images2coco 脚本](https://github.com/open-mmlab/mmdetection/tree/master/tools/dataset_converters/images2coco.py) 进行转换。

 ```shell
 python tools/dataset_converters/images2coco.py \
@@ -154,8 +141,6 @@
 python tools/test.py \
     ${CONFIG_FILE} \
     ${CHECKPOINT_FILE} \
-    --format-only \
-    --options ${JSONFILE_PREFIX} \
     [--show]

 # CPU 测试:禁用 GPU 并运行单 GPU 测试脚本
 export CUDA_VISIBLE_DEVICES=-1
 python tools/test.py \
     ${CONFIG_FILE} \
     ${CHECKPOINT_FILE} \
     [--out ${RESULT_FILE}] \
-    [--eval ${EVAL_METRICS}] \
     [--show]

 # 单节点多 GPU 测试
 bash tools/dist_test.sh \
     ${CONFIG_FILE} \
     ${CHECKPOINT_FILE} \
     ${GPU_NUM} \
-    --format-only \
-    --options ${JSONFILE_PREFIX} \
     [--show]
 ```

@@ -182,14 +164,12 @@
 ```sh
 ./tools/dist_test.sh \
-    configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py \
+    configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py \
     checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \
-    8 \
-    --format-only \
-    --options "jsonfile_prefix=./mask_rcnn_test-dev_results"
+    8
 ```

-这行命令生成两个 JSON 文件 `mask_rcnn_test-dev_results.bbox.json` 和
`mask_rcnn_test-dev_results.segm.json`。
+这行命令生成两个 JSON 文件 `./work_dirs/coco_instance/test.bbox.json` 和 `./work_dirs/coco_instance/test.segm.json`。

 ### 批量推理

@@ -197,47 +177,109 @@ MMDetection 在测试模式下,既支持单张图片的推理,也支持对

 开启批量推理的配置文件修改方法为:

 ```python
-data = dict(train=dict(...), val=dict(...), test=dict(samples_per_gpu=2, ...))
+data = dict(train_dataloader=dict(...), val_dataloader=dict(...), test_dataloader=dict(batch_size=2, ...))
 ```

-或者你可以通过将 `--cfg-options` 设置为 `--cfg-options data.test.samples_per_gpu=2` 来开启它。
-
-### 弃用 ImageToTensor
-
-在测试模式下,弃用 `ImageToTensor` 流程,取而代之的是 `DefaultFormatBundle`。建议在你的测试数据流程的配置文件中手动替换它,如:
-
-```python
-# (已弃用)使用 ImageToTensor
-pipelines = [
-   dict(type='LoadImageFromFile'),
-   dict(
-       type='MultiScaleFlipAug',
-       img_scale=(1333, 800),
-       flip=False,
-       transforms=[
-           dict(type='Resize', keep_ratio=True),
-           dict(type='RandomFlip'),
-           dict(type='Normalize', mean=[0, 0, 0], std=[1, 1, 1]),
-           dict(type='Pad', size_divisor=32),
-           dict(type='ImageToTensor', keys=['img']),
-           dict(type='Collect', keys=['img']),
-       ])
-   ]
-
-# (建议使用)手动将 ImageToTensor 替换为 DefaultFormatBundle
-pipelines = [
-   dict(type='LoadImageFromFile'),
-   dict(
-       type='MultiScaleFlipAug',
-       img_scale=(1333, 800),
-       flip=False,
-       transforms=[
-           dict(type='Resize', keep_ratio=True),
-           dict(type='RandomFlip'),
-           dict(type='Normalize', mean=[0, 0, 0], std=[1, 1, 1]),
-           dict(type='Pad', size_divisor=32),
-           dict(type='DefaultFormatBundle'),
-           dict(type='Collect', keys=['img']),
-       ])
-   ]
-```
+或者你可以通过将 `--cfg-options` 设置为 `--cfg-options test_dataloader.batch_size=2` 来开启它。
+
+## 测试时增强 (TTA)
+
+测试时增强 (TTA) 是一种在测试阶段使用的数据增强策略。它对同一张图片应用不同的增强,例如翻转和缩放,用于模型推理,然后将每个增强后的图像的预测结果合并,以获得更准确的预测结果。为了让用户更容易使用 TTA,MMEngine 提供了 [BaseTTAModel](https://mmengine.readthedocs.io/en/latest/api/generated/mmengine.model.BaseTTAModel.html#mmengine.model.BaseTTAModel) 类,允许用户根据自己的需求通过简单地扩展 BaseTTAModel 类来实现不同的 TTA 策略。
+
+在 MMDetection 中,我们提供了 [DetTTAModel](../../../mmdet/models/test_time_augs/det_tta.py) 类,它继承自 BaseTTAModel。
+
+### 使用案例
+
+使用 TTA 需要两个步骤。首先,你需要在配置文件中添加 `tta_model` 和 `tta_pipeline`:
+
+```python
+tta_model = dict(
+    type='DetTTAModel',
+    tta_cfg=dict(nms=dict(
+                     type='nms',
+                     iou_threshold=0.5),
+                     max_per_img=100))
+
+tta_pipeline = [
+    dict(type='LoadImageFromFile',
+         backend_args=None),
+    dict(
+        type='TestTimeAug',
+        transforms=[[
+            dict(type='Resize', scale=(1333, 800), keep_ratio=True)
+        ], [  # It uses 2 flipping transformations (flipping and not flipping).
+            dict(type='RandomFlip', prob=1.),
+            dict(type='RandomFlip', prob=0.)
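+            # Each inner list above is one group of candidate transforms:
+            # TestTimeAug enumerates their combinations, so the two RandomFlip
+            # entries yield a flipped and an unflipped view, and DetTTAModel
+            # merges the per-view boxes with the NMS settings from tta_cfg.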
+        ], [
+            dict(
+                type='PackDetInputs',
+                meta_keys=('img_id', 'img_path', 'ori_shape',
+                           'img_shape', 'scale_factor', 'flip',
+                           'flip_direction'))
+        ]])]
+```
+
+第二步,运行测试脚本时,设置 `--tta` 参数,如下所示:
+
+```shell
+# 单 GPU 测试
+python tools/test.py \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    [--tta]
+
+# CPU 测试:禁用 GPU 并运行单 GPU 测试脚本
+export CUDA_VISIBLE_DEVICES=-1
+python tools/test.py \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    [--out ${RESULT_FILE}] \
+    [--tta]
+
+# 多 GPU 测试
+bash tools/dist_test.sh \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    ${GPU_NUM} \
+    [--tta]
+```
+
+你也可以自己修改 TTA 配置,例如添加缩放增强:
+
+```python
+tta_model = dict(
+    type='DetTTAModel',
+    tta_cfg=dict(nms=dict(
+                     type='nms',
+                     iou_threshold=0.5),
+                     max_per_img=100))
+
+img_scales = [(1333, 800), (666, 400), (2000, 1200)]
+tta_pipeline = [
+    dict(type='LoadImageFromFile',
+         backend_args=None),
+    dict(
+        type='TestTimeAug',
+        transforms=[[
+            dict(type='Resize', scale=s, keep_ratio=True) for s in img_scales
+        ], [
+            dict(type='RandomFlip', prob=1.),
+            dict(type='RandomFlip', prob=0.)
+        ], [
+            dict(
+                type='PackDetInputs',
+                meta_keys=('img_id', 'img_path', 'ori_shape',
+                           'img_shape', 'scale_factor', 'flip',
+                           'flip_direction'))
+        ]])]
+```
+
+以上数据增强管道将首先对图像执行 3 个多尺度转换,然后执行 2 个翻转转换(翻转和不翻转),最后使用 PackDetInputs 将图像打包到最终结果中。
+这里有更多的 TTA 使用案例供您参考:
+
+- [RetinaNet](../../../configs/retinanet/retinanet_tta.py)
+- [CenterNet](../../../configs/centernet/centernet_tta.py)
+- [YOLOX](../../../configs/yolox/yolox_tta.py)
+- [RTMDet](../../../configs/rtmdet/rtmdet_tta.py)
+
+更多高级用法和 TTA 的数据流,请参考 [MMEngine](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/test_time_augmentation.html#data-flow)。我们将在后续支持实例分割 TTA。
diff --git a/docs/zh_cn/user_guides/train.md b/docs/zh_cn/user_guides/train.md index 0ae65b24366..428eb11d9b3 100644
--- a/docs/zh_cn/user_guides/train.md
+++ b/docs/zh_cn/user_guides/train.md
@@ -134,7 +134,7 @@ Slurm 是一个常见的计算集群调度系统。在 Slurm 管理的集群上
 GPUS=16 ./tools/slurm_train.sh dev mask_r50_1x configs/mask_rcnn_r50_fpn_1x_coco.py /nfs/xxxx/mask_rcnn_r50_fpn_1x
 ```

-你可以查看 [源码](https://github.com/open-mmlab/mmdetection/blob/master/tools/slurm_train.sh) 来检查全部的参数和环境变量.
+你可以查看 [源码](https://github.com/open-mmlab/mmdetection/blob/main/tools/slurm_train.sh) 来检查全部的参数和环境变量。
在使用 Slurm 时,端口需要用以下方法之一来设置。

@@ -438,7 +438,7 @@ load_from = 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn
 python tools/train.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py
 ```

-参考 [在标准数据集上训练预定义的模型](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/train.html#id1) 来获取更多详细的使用方法。
+参考 [在标准数据集上训练预定义的模型](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/train.html#id1) 来获取更多详细的使用方法。

 ## 测试以及推理

@@ -448,4 +448,4 @@ python tools/train.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon
 python tools/test.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py work_dirs/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon/epoch_12.pth
 ```

-参考 [测试现有模型](https://mmdetection.readthedocs.io/zh_CN/3.x/user_guides/test.html) 来获取更多详细的使用方法。
+参考 [测试现有模型](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/test.html) 来获取更多详细的使用方法。
diff --git a/docs/zh_cn/user_guides/useful_hooks.md b/docs/zh_cn/user_guides/useful_hooks.md index 7d24ec4608a..07a59df2a8b 100644
--- a/docs/zh_cn/user_guides/useful_hooks.md
+++ b/docs/zh_cn/user_guides/useful_hooks.md
@@ -9,7 +9,7 @@ MMDetection 和 MMEngine 为用户提供了多种多样实用的钩子(Hook)

 ## MemoryProfilerHook

-[内存分析钩子](https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/engine/hooks/memory_profiler_hook.py)
+[内存分析钩子](https://github.com/open-mmlab/mmdetection/blob/main/mmdet/engine/hooks/memory_profiler_hook.py)
 记录了包括虚拟内存、交换内存、当前进程在内的所有内存信息,它能够帮助捕捉系统的使用状况与发现隐藏的内存泄露问题。为了使用这个钩子,你需要先通过 `pip install memory_profiler psutil` 命令安装 `memory_profiler` 和 `psutil`。

 ### 使用
diff --git a/docs/zh_cn/user_guides/useful_tools.md b/docs/zh_cn/user_guides/useful_tools.md index e2b2d626d70..e53ffdfc60a 100644
--- a/docs/zh_cn/user_guides/useful_tools.md
+++ b/docs/zh_cn/user_guides/useful_tools.md
@@ -296,7 +296,7 @@ Params: 37.74 M

 **注意**:这个工具还只是实验性质,我们不保证这个数值是绝对正确的。你可以将它用于简单的比较,但如果用于科技论文报告需要再三检查确认。

 1. FLOPs 与输入的形状大小相关,参数量没有这个关系,默认的输入形状大小为 (1, 3, 1280, 800) 。
-2. 一些算子并不计入 FLOPs,比如 GN 或其他自定义的算子。你可以参考 [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/dev-3.x/mmcv/cnn/utils/flops_counter.py) 查看更详细的说明。
+2. 一些算子并不计入 FLOPs,比如 GN 或其他自定义的算子。你可以参考 [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/2.x/mmcv/cnn/utils/flops_counter.py) 查看更详细的说明。
3.
两阶段检测的 FLOPs 大小取决于 proposal 的数量。

 ## 模型转换
diff --git a/docs/zh_cn/user_guides/visualization.md b/docs/zh_cn/user_guides/visualization.md index 04aa43c3ed6..f90ab6d49fd 100644
--- a/docs/zh_cn/user_guides/visualization.md
+++ b/docs/zh_cn/user_guides/visualization.md
@@ -1 +1,93 @@
-# 可视化(待更新)
+# 可视化
+
+在阅读本教程之前,建议先阅读 MMEngine 的 [Visualization](https://github.com/open-mmlab/mmengine/blob/main/docs/en/advanced_tutorials/visualization.md) 文档,以对 `Visualizer` 的定义和用法有一个初步的了解。
+
+简而言之,`Visualizer` 在 MMEngine 中实现以满足日常可视化需求,并包含以下三个主要功能:
+
+- 实现通用的绘图 API,例如 [`draw_bboxes`](mmengine.visualization.Visualizer.draw_bboxes) 实现了绘制边界框的功能,[`draw_lines`](mmengine.visualization.Visualizer.draw_lines) 实现了绘制线条的功能。
+- 支持将可视化结果、学习率曲线、损失函数曲线以及验证精度曲线写入到各种后端中,包括本地磁盘以及常见的深度学习训练日志工具,例如 [TensorBoard](https://www.tensorflow.org/tensorboard) 和 [Wandb](https://wandb.ai/site)。
+- 支持在代码的任何位置调用以可视化或记录模型在训练或测试期间的中间状态,例如特征图和验证结果。
+
+基于 MMEngine 的 `Visualizer`,MMDet 提供了各种预构建的可视化工具,用户可以通过简单地修改以下配置文件来使用它们。
+
+- `tools/analysis_tools/browse_dataset.py` 脚本提供了一个数据集可视化功能,可以在数据经过数据转换后绘制图像和相应的注释,具体描述请参见 [`browse_dataset.py`](useful_tools.md#Visualization)。
+
+- MMEngine 实现了 `LoggerHook`,使用 `Visualizer` 将学习率、损失和评估结果写入由 `Visualizer` 设置的后端。因此,通过修改配置文件中的 `Visualizer` 后端,例如修改为 `TensorboardVisBackend` 或 `WandbVisBackend`,可以实现日志记录到常用的训练日志工具,如 `TensorBoard` 或 `WandB`,从而方便用户使用这些可视化工具来分析和监控训练过程。
+
+- 在 MMDet 中实现了 `DetVisualizationHook`,它使用 `Visualizer` 将验证或预测阶段的预测结果可视化或存储到由 `Visualizer` 设置的后端。因此,通过修改配置文件中的 `Visualizer` 后端,例如修改为 `TensorboardVisBackend` 或 `WandbVisBackend`,可以将预测图像存储到 `TensorBoard` 或 `Wandb` 中。
+
+## 配置
+
+由于使用了注册机制,在 MMDet 中我们可以通过修改配置文件来设置 `Visualizer` 的行为。通常,我们会在 `configs/_base_/default_runtime.py` 中为可视化器定义默认配置,详细信息请参见[配置教程](config.md)。
+
+```Python
+vis_backends = [dict(type='LocalVisBackend')]
+visualizer = dict(
+    type='DetLocalVisualizer',
+    vis_backends=vis_backends,
+    name='visualizer')
+```
+
+基于上面的例子,我们可以看到 `Visualizer` 的配置由两个主要部分组成,即 `Visualizer` 的类型和其使用的可视化后端 `vis_backends`。
+
+- 用户可直接使用 `DetLocalVisualizer` 来可视化支持任务的标签或预测结果。
+- MMDet 默认将可视化后端 `vis_backend` 设置为本地可视化后端 `LocalVisBackend`,将所有可视化结果和其他训练信息保存在本地文件夹中。
+
+## 存储
+
+MMDet 默认使用本地可视化后端 [`LocalVisBackend`](mmengine.visualization.LocalVisBackend),`DetVisualizationHook` 和 `LoggerHook` 中存储的模型损失、学习率、模型评估精度和可视化信息,包括损失、学习率、评估精度将默认保存到 `{work_dir}/{config_name}/{time}/{vis_data}` 文件夹中。此外,MMDet 还支持其他常见的可视化后端,例如 `TensorboardVisBackend` 和 `WandbVisBackend`,您只需要在配置文件中更改 `vis_backends` 类型为相应的可视化后端即可。例如,只需在配置文件中插入以下代码块即可将数据存储到 `TensorBoard` 和 `Wandb` 中。
+
+```Python
+# https://mmengine.readthedocs.io/en/latest/api/visualization.html
+_base_.visualizer.vis_backends = [
+    dict(type='LocalVisBackend'),
+    dict(type='TensorboardVisBackend'),
+    dict(type='WandbVisBackend'),]
+```
+
+## 绘图
+
+### 绘制预测结果
+
+MMDet 主要使用 [`DetVisualizationHook`](mmdet.engine.hooks.DetVisualizationHook) 来绘制验证和测试的预测结果,默认情况下 `DetVisualizationHook` 是关闭的,其默认配置如下。
+
+```Python
+visualization=dict(  # 用户可视化验证和测试结果
+    type='DetVisualizationHook',
+    draw=False,
+    interval=1,
+    show=False)
+```
+
+以下表格展示了 `DetVisualizationHook` 支持的参数。
+
+| 参数 | 描述 |
+| :------: | :------------------------------------------------------------------------------: |
+| draw | DetVisualizationHook 通过 draw 参数打开和关闭,默认状态为关闭。 |
+| interval | 控制在 DetVisualizationHook 启用时存储或显示验证或测试结果的间隔,单位为迭代次数。 |
+| show | 控制是否可视化验证或测试的结果。 |
+
+如果您想在训练或测试期间启用 `DetVisualizationHook` 相关功能和配置,您只需要修改配置文件,以 `configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py` 为例,同时绘制注释和预测,并显示图像,配置文件可以修改如下:
+
+```Python
+visualization = _base_.default_hooks.visualization
+visualization.update(dict(draw=True, show=True))
+```
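+
+如果沿用默认的 `default_hooks` 键名,下面这种覆盖写法(仅作示意,效果等价)也可以达到同样目的:
+
+```Python
+default_hooks = dict(
+    visualization=dict(type='DetVisualizationHook', draw=True, show=True))
+```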
+ +
+ +`test.py`程序提供了`--show`和`--show-dir`参数,可以在测试过程中可视化注释和预测结果,而不需要修改配置文件,从而进一步简化了测试过程。 + +```Shell +# 展示测试结果 +python tools/test.py configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_tiny_8xb32-300e_coco/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth --show + +# 指定存储预测结果的位置 +python tools/test.py configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_tiny_8xb32-300e_coco/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth --show-dir imgs/ +``` + +
+ +
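+
+下面给出一个调用上文提到的通用绘图 API 的最小示意(假设在 MMDetection 仓库根目录下运行,使用仓库自带的 demo/demo.jpg,框坐标仅为示例):
+
+```Python
+import mmcv
+import numpy as np
+from mmengine.visualization import Visualizer
+
+# 以 RGB 顺序读入图像,并交给 Visualizer 管理
+image = mmcv.imread('demo/demo.jpg', channel_order='rgb')
+visualizer = Visualizer(image=image)
+# draw_bboxes 接受 (N, 4) 的 (x1, y1, x2, y2) 坐标
+visualizer.draw_bboxes(np.array([[220, 60, 320, 160]]), edge_colors='g')
+visualizer.show()  # 也可用 visualizer.get_image() 取回绘制后的数组
+```
+
+训练、测试中的可视化则无需手写上述代码,按前文方式配置 `vis_backends` 与 `DetVisualizationHook` 即可。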
diff --git a/mmdet/__init__.py b/mmdet/__init__.py index d48c523bc79..e9c1489c7e9 100644 --- a/mmdet/__init__.py +++ b/mmdet/__init__.py @@ -9,7 +9,7 @@ mmcv_maximum_version = '2.1.0' mmcv_version = digit_version(mmcv.__version__) -mmengine_minimum_version = '0.6.0' +mmengine_minimum_version = '0.7.1' mmengine_maximum_version = '1.0.0' mmengine_version = digit_version(mmengine.__version__) diff --git a/mmdet/datasets/base_det_dataset.py b/mmdet/datasets/base_det_dataset.py index 55598ef267a..cbc6bad46f9 100644 --- a/mmdet/datasets/base_det_dataset.py +++ b/mmdet/datasets/base_det_dataset.py @@ -3,7 +3,7 @@ from typing import List, Optional from mmengine.dataset import BaseDataset -from mmengine.fileio import FileClient, load +from mmengine.fileio import load from mmengine.utils import is_abs from ..registry import DATASETS @@ -15,21 +15,28 @@ class BaseDetDataset(BaseDataset): Args: proposal_file (str, optional): Proposals file path. Defaults to None. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:`mmengine.fileio.FileClient` for details. - Defaults to ``dict(backend='disk')``. + file_client_args (dict): Arguments to instantiate the + corresponding backend in mmdet <= 3.0.0rc6. Defaults to None. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. """ def __init__(self, *args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, - file_client_args: dict = dict(backend='disk'), + file_client_args: dict = None, + backend_args: dict = None, **kwargs) -> None: self.seg_map_suffix = seg_map_suffix self.proposal_file = proposal_file - self.file_client_args = file_client_args - self.file_client = FileClient(**file_client_args) + self.backend_args = backend_args + if file_client_args is not None: + raise RuntimeError( + 'The `file_client_args` is deprecated, ' + 'please use `backend_args` instead, please refer to' + 'https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/datasets/coco_detection.py' # noqa: E501 + ) super().__init__(*args, **kwargs) def full_init(self) -> None: @@ -88,7 +95,7 @@ def load_proposals(self) -> None: if not is_abs(self.proposal_file): self.proposal_file = osp.join(self.data_root, self.proposal_file) proposals_list = load( - self.proposal_file, file_client_args=self.file_client_args) + self.proposal_file, backend_args=self.backend_args) assert len(self.data_list) == len(proposals_list) for data_info in self.data_list: img_path = data_info['img_path'] diff --git a/mmdet/datasets/coco.py b/mmdet/datasets/coco.py index 873f635d0b0..f95dd8cb414 100644 --- a/mmdet/datasets/coco.py +++ b/mmdet/datasets/coco.py @@ -3,6 +3,8 @@ import os.path as osp from typing import List, Union +from mmengine.fileio import get_local_path + from mmdet.registry import DATASETS from .api_wrappers import COCO from .base_det_dataset import BaseDetDataset @@ -60,7 +62,8 @@ def load_data_list(self) -> List[dict]: Returns: List[dict]: A list of annotation. 
""" # noqa: E501 - with self.file_client.get_local_path(self.ann_file) as local_path: + with get_local_path( + self.ann_file, backend_args=self.backend_args) as local_path: self.coco = self.COCOAPI(local_path) # The order of returned `cat_ids` will not # change with the order of the `classes` diff --git a/mmdet/datasets/coco_panoptic.py b/mmdet/datasets/coco_panoptic.py index 917456ac137..33d4189e6c4 100644 --- a/mmdet/datasets/coco_panoptic.py +++ b/mmdet/datasets/coco_panoptic.py @@ -168,7 +168,9 @@ def __init__(self, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, - max_refetch: int = 1000) -> None: + max_refetch: int = 1000, + backend_args: dict = None, + **kwargs) -> None: super().__init__( ann_file=ann_file, metainfo=metainfo, @@ -180,7 +182,9 @@ def __init__(self, pipeline=pipeline, test_mode=test_mode, lazy_init=lazy_init, - max_refetch=max_refetch) + max_refetch=max_refetch, + backend_args=backend_args, + **kwargs) def parse_data_info(self, raw_data_info: dict) -> dict: """Parse raw annotation to target format. diff --git a/mmdet/datasets/crowdhuman.py b/mmdet/datasets/crowdhuman.py index fd67d2a5cc2..650176ee545 100644 --- a/mmdet/datasets/crowdhuman.py +++ b/mmdet/datasets/crowdhuman.py @@ -7,7 +7,7 @@ import mmcv from mmengine.dist import get_rank -from mmengine.fileio import dump, load +from mmengine.fileio import dump, get, get_text, load from mmengine.logging import print_log from mmengine.utils import ProgressBar @@ -66,8 +66,8 @@ def load_data_list(self) -> List[dict]: Returns: List[dict]: A list of annotation. """ # noqa: E501 - anno_strs = self.file_client.get_text( - self.ann_file).strip().split('\n') + anno_strs = get_text( + self.ann_file, backend_args=self.backend_args).strip().split('\n') print_log('loading CrowdHuman annotation...', level=logging.INFO) data_list = [] prog_bar = ProgressBar(len(anno_strs)) @@ -110,7 +110,7 @@ def parse_data_info(self, raw_data_info: dict) -> Union[dict, List[dict]]: data_info['img_id'] = raw_data_info['ID'] if not self.extra_ann_exist: - img_bytes = self.file_client.get(img_path) + img_bytes = get(img_path, backend_args=self.backend_args) img = mmcv.imfrombytes(img_bytes, backend='cv2') data_info['height'], data_info['width'] = img.shape[:2] self.extra_anns[raw_data_info['ID']] = img.shape[:2] diff --git a/mmdet/datasets/lvis.py b/mmdet/datasets/lvis.py index f24fec4971b..b9629f5d463 100644 --- a/mmdet/datasets/lvis.py +++ b/mmdet/datasets/lvis.py @@ -3,6 +3,8 @@ import warnings from typing import List +from mmengine.fileio import get_local_path + from mmdet.registry import DATASETS from .coco import CocoDataset @@ -285,7 +287,8 @@ def load_data_list(self) -> List[dict]: raise ImportError( 'Package lvis is not installed. Please run "pip install git+https://github.com/lvis-dataset/lvis-api.git".' # noqa: E501 ) - with self.file_client.get_local_path(self.ann_file) as local_path: + with get_local_path( + self.ann_file, backend_args=self.backend_args) as local_path: self.lvis = LVIS(local_path) self.cat_ids = self.lvis.get_cat_ids() self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)} @@ -597,7 +600,9 @@ def load_data_list(self) -> List[dict]: raise ImportError( 'Package lvis is not installed. Please run "pip install git+https://github.com/lvis-dataset/lvis-api.git".' 
# noqa: E501 ) - self.lvis = LVIS(self.ann_file) + with get_local_path( + self.ann_file, backend_args=self.backend_args) as local_path: + self.lvis = LVIS(local_path) self.cat_ids = self.lvis.get_cat_ids() self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)} self.cat_img_map = copy.deepcopy(self.lvis.cat_img_map) diff --git a/mmdet/datasets/objects365.py b/mmdet/datasets/objects365.py index 92e3fe14325..e99869bfa30 100644 --- a/mmdet/datasets/objects365.py +++ b/mmdet/datasets/objects365.py @@ -3,6 +3,8 @@ import os.path as osp from typing import List +from mmengine.fileio import get_local_path + from mmdet.registry import DATASETS from .api_wrappers import COCO from .coco import CocoDataset @@ -102,7 +104,8 @@ def load_data_list(self) -> List[dict]: Returns: List[dict]: A list of annotation. """ # noqa: E501 - with self.file_client.get_local_path(self.ann_file) as local_path: + with get_local_path( + self.ann_file, backend_args=self.backend_args) as local_path: self.coco = self.COCOAPI(local_path) # 'categories' list in objects365_train.json and objects365_val.json @@ -234,7 +237,8 @@ def load_data_list(self) -> List[dict]: Returns: List[dict]: A list of annotation. """ # noqa: E501 - with self.file_client.get_local_path(self.ann_file) as local_path: + with get_local_path( + self.ann_file, backend_args=self.backend_args) as local_path: self.coco = self.COCOAPI(local_path) # The order of returned `cat_ids` will not # change with the order of the `classes` diff --git a/mmdet/datasets/openimages.py b/mmdet/datasets/openimages.py index a6994071de1..a3c6c8ec44f 100644 --- a/mmdet/datasets/openimages.py +++ b/mmdet/datasets/openimages.py @@ -5,7 +5,7 @@ from typing import Dict, List, Optional import numpy as np -from mmengine.fileio import load +from mmengine.fileio import get_local_path, load from mmengine.utils import is_abs from mmdet.registry import DATASETS @@ -25,9 +25,8 @@ class OpenImagesDataset(BaseDetDataset): hierarchy_file (str): The file path of the class hierarchy. image_level_ann_file (str): Human-verified image level annotation, which is used in evaluation. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:`mmengine.fileio.FileClient` for details. - Defaults to ``dict(backend='disk')``. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. 
""" METAINFO: dict = dict(dataset_type='oid_v6') @@ -66,7 +65,8 @@ def load_data_list(self) -> List[dict]: self._metainfo['RELATION_MATRIX'] = relation_matrix data_list = [] - with self.file_client.get_local_path(self.ann_file) as local_path: + with get_local_path( + self.ann_file, backend_args=self.backend_args) as local_path: with open(local_path, 'r') as f: reader = csv.reader(f) last_img_id = None @@ -123,9 +123,7 @@ def load_data_list(self) -> List[dict]: # add image metas to data list img_metas = load( - self.meta_file, - file_format='pkl', - file_client_args=self.file_client_args) + self.meta_file, file_format='pkl', backend_args=self.backend_args) assert len(img_metas) == len(data_list) for i, meta in enumerate(img_metas): img_id = data_list[i]['img_id'] @@ -167,7 +165,8 @@ def _parse_label_file(self, label_file: str) -> tuple: index_list = [] classes_names = [] - with self.file_client.get_local_path(label_file) as local_path: + with get_local_path( + label_file, backend_args=self.backend_args) as local_path: with open(local_path, 'r') as f: reader = csv.reader(f) for line in reader: @@ -201,7 +200,9 @@ def _parse_img_level_ann(self, """ item_lists = defaultdict(list) - with self.file_client.get_local_path(img_level_ann_file) as local_path: + with get_local_path( + img_level_ann_file, + backend_args=self.backend_args) as local_path: with open(local_path, 'r') as f: reader = csv.reader(f) for i, line in enumerate(reader): @@ -230,9 +231,7 @@ def _get_relation_matrix(self, hierarchy_file: str) -> np.ndarray: """ # noqa hierarchy = load( - hierarchy_file, - file_format='json', - file_client_args=self.file_client_args) + hierarchy_file, file_format='json', backend_args=self.backend_args) class_num = len(self._metainfo['classes']) relation_matrix = np.eye(class_num, class_num) relation_matrix = self._convert_hierarchy_tree(hierarchy, @@ -336,7 +335,8 @@ def load_data_list(self) -> List[dict]: self._metainfo['RELATION_MATRIX'] = relation_matrix data_list = [] - with self.file_client.get_local_path(self.ann_file) as local_path: + with get_local_path( + self.ann_file, backend_args=self.backend_args) as local_path: with open(local_path, 'r') as f: lines = f.readlines() i = 0 @@ -368,9 +368,7 @@ def load_data_list(self) -> List[dict]: # add image metas to data list img_metas = load( - self.meta_file, - file_format='pkl', - file_client_args=self.file_client_args) + self.meta_file, file_format='pkl', backend_args=self.backend_args) assert len(img_metas) == len(data_list) for i, meta in enumerate(img_metas): img_id = osp.split(data_list[i]['img_path'])[-1][:-4] @@ -413,7 +411,8 @@ def _parse_label_file(self, label_file: str) -> tuple: label_list = [] id_list = [] index_mapping = {} - with self.file_client.get_local_path(label_file) as local_path: + with get_local_path( + label_file, backend_args=self.backend_args) as local_path: with open(local_path, 'r') as f: reader = csv.reader(f) for line in reader: @@ -445,8 +444,9 @@ def _parse_img_level_ann(self, image_level_ann_file): """ item_lists = defaultdict(list) - with self.file_client.get_local_path( - image_level_ann_file) as local_path: + with get_local_path( + image_level_ann_file, + backend_args=self.backend_args) as local_path: with open(local_path, 'r') as f: reader = csv.reader(f) i = -1 @@ -478,6 +478,7 @@ def _get_relation_matrix(self, hierarchy_file: str) -> np.ndarray: relationship between the parent class and the child class, of shape (class_num, class_num). 
""" - with self.file_client.get_local_path(hierarchy_file) as local_path: + with get_local_path( + hierarchy_file, backend_args=self.backend_args) as local_path: class_label_tree = np.load(local_path, allow_pickle=True) return class_label_tree[1:, 1:] diff --git a/mmdet/datasets/transforms/loading.py b/mmdet/datasets/transforms/loading.py index f3092d40354..1a408e4d4ec 100644 --- a/mmdet/datasets/transforms/loading.py +++ b/mmdet/datasets/transforms/loading.py @@ -8,7 +8,7 @@ from mmcv.transforms import BaseTransform from mmcv.transforms import LoadAnnotations as MMCV_LoadAnnotations from mmcv.transforms import LoadImageFromFile -from mmengine.fileio import FileClient +from mmengine.fileio import get from mmengine.structures import BaseDataElement from mmdet.registry import TRANSFORMS @@ -88,9 +88,10 @@ class LoadMultiChannelImageFromFiles(BaseTransform): argument for :func:``mmcv.imfrombytes``. See :func:``mmcv.imfrombytes`` for details. Defaults to 'cv2'. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:`mmengine.fileio.FileClient` for details. - Defaults to ``dict(backend='disk')``. + file_client_args (dict): Arguments to instantiate the + corresponding backend in mmdet <= 3.0.0rc6. Defaults to None. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend in mmdet >= 3.0.0rc7. Defaults to None. """ def __init__( @@ -98,13 +99,19 @@ def __init__( to_float32: bool = False, color_type: str = 'unchanged', imdecode_backend: str = 'cv2', - file_client_args: dict = dict(backend='disk') + file_client_args: dict = None, + backend_args: dict = None, ) -> None: self.to_float32 = to_float32 self.color_type = color_type self.imdecode_backend = imdecode_backend - self.file_client_args = file_client_args.copy() - self.file_client = FileClient(**self.file_client_args) + self.backend_args = backend_args + if file_client_args is not None: + raise RuntimeError( + 'The `file_client_args` is deprecated, ' + 'please use `backend_args` instead, please refer to' + 'https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/datasets/coco_detection.py' # noqa: E501 + ) def transform(self, results: dict) -> dict: """Transform functions to load multiple images and get images meta @@ -120,7 +127,7 @@ def transform(self, results: dict) -> dict: assert isinstance(results['img_path'], list) img = [] for name in results['img_path']: - img_bytes = self.file_client.get(name) + img_bytes = get(name, backend_args=self.backend_args) img.append( mmcv.imfrombytes( img_bytes, @@ -140,7 +147,7 @@ def __repr__(self): f'to_float32={self.to_float32}, ' f"color_type='{self.color_type}', " f"imdecode_backend='{self.imdecode_backend}', " - f'file_client_args={self.file_client_args})') + f'backend_args={self.backend_args})') return repr_str @@ -236,9 +243,8 @@ class LoadAnnotations(MMCV_LoadAnnotations): argument for :func:``mmcv.imfrombytes``. See :fun:``mmcv.imfrombytes`` for details. Defaults to 'cv2'. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:``mmengine.fileio.FileClient`` for details. - Defaults to ``dict(backend='disk')``. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. 
""" def __init__(self, @@ -404,7 +410,7 @@ def __repr__(self) -> str: repr_str += f'with_seg={self.with_seg}, ' repr_str += f'poly2mask={self.poly2mask}, ' repr_str += f"imdecode_backend='{self.imdecode_backend}', " - repr_str += f'file_client_args={self.file_client_args})' + repr_str += f'backend_args={self.backend_args})' return repr_str @@ -501,21 +507,18 @@ class LoadPanopticAnnotations(LoadAnnotations): argument for :func:``mmcv.imfrombytes``. See :fun:``mmcv.imfrombytes`` for details. Defaults to 'cv2'. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:``mmengine.fileio.FileClient`` for details. - Defaults to ``dict(backend='disk')``. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend in mmdet >= 3.0.0rc7. Defaults to None. """ - def __init__( - self, - with_bbox: bool = True, - with_label: bool = True, - with_mask: bool = True, - with_seg: bool = True, - box_type: str = 'hbox', - imdecode_backend: str = 'cv2', - file_client_args: dict = dict(backend='disk') - ) -> None: + def __init__(self, + with_bbox: bool = True, + with_label: bool = True, + with_mask: bool = True, + with_seg: bool = True, + box_type: str = 'hbox', + imdecode_backend: str = 'cv2', + backend_args: dict = None) -> None: try: from panopticapi import utils except ImportError: @@ -525,7 +528,6 @@ def __init__( 'panopticapi.git.') self.rgb2id = utils.rgb2id - self.file_client = FileClient(**file_client_args) super(LoadPanopticAnnotations, self).__init__( with_bbox=with_bbox, with_label=with_label, @@ -534,7 +536,7 @@ def __init__( with_keypoints=False, box_type=box_type, imdecode_backend=imdecode_backend, - file_client_args=file_client_args) + backend_args=backend_args) def _load_masks_and_semantic_segs(self, results: dict) -> None: """Private function to load mask and semantic segmentation annotations. @@ -550,7 +552,8 @@ def _load_masks_and_semantic_segs(self, results: dict) -> None: if results.get('seg_map_path', None) is None: return - img_bytes = self.file_client.get(results['seg_map_path']) + img_bytes = get( + results['seg_map_path'], backend_args=self.backend_args) pan_png = mmcv.imfrombytes( img_bytes, flag='color', channel_order='rgb').squeeze() pan_png = self.rgb2id(pan_png) diff --git a/mmdet/datasets/transforms/transforms.py b/mmdet/datasets/transforms/transforms.py index 129fe9202db..b844d0a3fe7 100644 --- a/mmdet/datasets/transforms/transforms.py +++ b/mmdet/datasets/transforms/transforms.py @@ -722,7 +722,7 @@ def _crop_data(self, results: dict, crop_size: Tuple[int, int], img = img[crop_y1:crop_y2, crop_x1:crop_x2, ...] 
img_shape = img.shape results['img'] = img - results['img_shape'] = img_shape + results['img_shape'] = img_shape[:2] # crop bboxes accordingly and clip to the image boundary if results.get('gt_bboxes', None) is not None: @@ -1510,7 +1510,7 @@ def transform(self, results: dict) -> Union[dict, None]: return None # back to the original format results = self.mapper(results, self.keymap_back) - results['img_shape'] = results['img'].shape + results['img_shape'] = results['img'].shape[:2] return results def _preprocess_results(self, results: dict) -> tuple: @@ -1574,16 +1574,15 @@ def _postprocess_results( results['masks'] = np.array( [results['masks'][i] for i in results['idx_mapper']]) results['masks'] = ori_masks.__class__( - results['masks'], results['image'].shape[0], - results['image'].shape[1]) + results['masks'], ori_masks.height, ori_masks.width) if (not len(results['idx_mapper']) and self.skip_img_without_anno): return None elif 'masks' in results: - results['masks'] = ori_masks.__class__( - results['masks'], results['image'].shape[0], - results['image'].shape[1]) + results['masks'] = ori_masks.__class__(results['masks'], + ori_masks.height, + ori_masks.width) return results @@ -1862,7 +1861,7 @@ def _train_aug(self, results): if len(gt_bboxes) == 0: results['img'] = cropped_img - results['img_shape'] = cropped_img.shape + results['img_shape'] = cropped_img.shape[:2] return results # if image do not have valid bbox, any crop patch is valid. @@ -1871,7 +1870,7 @@ def _train_aug(self, results): continue results['img'] = cropped_img - results['img_shape'] = cropped_img.shape + results['img_shape'] = cropped_img.shape[:2] x0, y0, x1, y1 = patch @@ -1937,7 +1936,7 @@ def _test_aug(self, results): cropped_img, border, _ = self._crop_image_and_paste( img, [h // 2, w // 2], [target_h, target_w]) results['img'] = cropped_img - results['img_shape'] = cropped_img.shape + results['img_shape'] = cropped_img.shape[:2] results['border'] = border return results @@ -2241,7 +2240,7 @@ def transform(self, results: dict) -> dict: mosaic_ignore_flags = mosaic_ignore_flags[inside_inds] results['img'] = mosaic_img - results['img_shape'] = mosaic_img.shape + results['img_shape'] = mosaic_img.shape[:2] results['gt_bboxes'] = mosaic_bboxes results['gt_bboxes_labels'] = mosaic_bboxes_labels results['gt_ignore_flags'] = mosaic_ignore_flags @@ -2523,7 +2522,7 @@ def transform(self, results: dict) -> dict: mixup_gt_ignore_flags = mixup_gt_ignore_flags[inside_inds] results['img'] = mixup_img.astype(np.uint8) - results['img_shape'] = mixup_img.shape + results['img_shape'] = mixup_img.shape[:2] results['gt_bboxes'] = mixup_gt_bboxes results['gt_bboxes_labels'] = mixup_gt_bboxes_labels results['gt_ignore_flags'] = mixup_gt_ignore_flags @@ -2646,7 +2645,7 @@ def transform(self, results: dict) -> dict: dsize=(width, height), borderValue=self.border_val) results['img'] = img - results['img_shape'] = img.shape + results['img_shape'] = img.shape[:2] bboxes = results['gt_bboxes'] num_bboxes = len(bboxes) @@ -3335,7 +3334,7 @@ def transform(self, results: dict) -> dict: mosaic_ignore_flags = mosaic_ignore_flags[inside_inds] results['img'] = mosaic_img - results['img_shape'] = mosaic_img.shape + results['img_shape'] = mosaic_img.shape[:2] results['gt_bboxes'] = mosaic_bboxes results['gt_bboxes_labels'] = mosaic_bboxes_labels results['gt_ignore_flags'] = mosaic_ignore_flags @@ -3615,7 +3614,7 @@ def transform(self, results: dict) -> dict: mixup_gt_masks = mixup_gt_masks[inside_inds] results['img'] = mixup_img.astype(np.uint8) - 
results['img_shape'] = mixup_img.shape + results['img_shape'] = mixup_img.shape[:2] results['gt_bboxes'] = mixup_gt_bboxes results['gt_bboxes_labels'] = mixup_gt_bboxes_labels results['gt_ignore_flags'] = mixup_gt_ignore_flags diff --git a/mmdet/datasets/transforms/wrappers.py b/mmdet/datasets/transforms/wrappers.py index e5daf64fa22..3a17711c06b 100644 --- a/mmdet/datasets/transforms/wrappers.py +++ b/mmdet/datasets/transforms/wrappers.py @@ -28,8 +28,7 @@ class MultiBranch(BaseTransform): Examples: >>> branch_field = ['sup', 'unsup_teacher', 'unsup_student'] >>> sup_pipeline = [ - >>> dict(type='LoadImageFromFile', - >>> file_client_args=dict(backend='disk')), + >>> dict(type='LoadImageFromFile'), >>> dict(type='LoadAnnotations', with_bbox=True), >>> dict(type='Resize', scale=(1333, 800), keep_ratio=True), >>> dict(type='RandomFlip', prob=0.5), @@ -39,8 +38,7 @@ class MultiBranch(BaseTransform): >>> sup=dict(type='PackDetInputs')) >>> ] >>> weak_pipeline = [ - >>> dict(type='LoadImageFromFile', - >>> file_client_args=dict(backend='disk')), + >>> dict(type='LoadImageFromFile'), >>> dict(type='LoadAnnotations', with_bbox=True), >>> dict(type='Resize', scale=(1333, 800), keep_ratio=True), >>> dict(type='RandomFlip', prob=0.0), @@ -50,8 +48,7 @@ class MultiBranch(BaseTransform): >>> sup=dict(type='PackDetInputs')) >>> ] >>> strong_pipeline = [ - >>> dict(type='LoadImageFromFile', - >>> file_client_args=dict(backend='disk')), + >>> dict(type='LoadImageFromFile'), >>> dict(type='LoadAnnotations', with_bbox=True), >>> dict(type='Resize', scale=(1333, 800), keep_ratio=True), >>> dict(type='RandomFlip', prob=1.0), @@ -61,8 +58,7 @@ class MultiBranch(BaseTransform): >>> sup=dict(type='PackDetInputs')) >>> ] >>> unsup_pipeline = [ - >>> dict(type='LoadImageFromFile', - >>> file_client_args=file_client_args), + >>> dict(type='LoadImageFromFile'), >>> dict(type='LoadEmptyAnnotations'), >>> dict( >>> type='MultiBranch', @@ -75,15 +71,15 @@ class MultiBranch(BaseTransform): >>> unsup_branch = Compose(unsup_pipeline) >>> print(sup_branch) >>> Compose( - >>> LoadImageFromFile(ignore_empty=False, to_float32=False, color_type='color', imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa - >>> LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa + >>> LoadImageFromFile(ignore_empty=False, to_float32=False, color_type='color', imdecode_backend='cv2') # noqa + >>> LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, imdecode_backend='cv2') # noqa >>> Resize(scale=(1333, 800), scale_factor=None, keep_ratio=True, clip_object_border=True), backend=cv2), interpolation=bilinear) # noqa >>> RandomFlip(prob=0.5, direction=horizontal) >>> MultiBranch(branch_pipelines=['sup']) >>> ) >>> print(unsup_branch) >>> Compose( - >>> LoadImageFromFile(ignore_empty=False, to_float32=False, color_type='color', imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa + >>> LoadImageFromFile(ignore_empty=False, to_float32=False, color_type='color', imdecode_backend='cv2') # noqa >>> LoadEmptyAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, seg_ignore_label=255) # noqa >>> MultiBranch(branch_pipelines=['unsup_teacher', 'unsup_student']) >>> ) diff --git a/mmdet/datasets/wider_face.py b/mmdet/datasets/wider_face.py index 9edeb80eb55..62c7fff869a 100644 --- a/mmdet/datasets/wider_face.py +++ b/mmdet/datasets/wider_face.py 
@@ -2,9 +2,12 @@ import os.path as osp import xml.etree.ElementTree as ET -from mmengine.fileio import list_from_file +from mmengine.dist import is_main_process +from mmengine.fileio import get_local_path, list_from_file +from mmengine.utils import ProgressBar from mmdet.registry import DATASETS +from mmdet.utils.typing_utils import List, Union from .xml_style import XMLDataset @@ -17,36 +20,71 @@ class WIDERFaceDataset(XMLDataset): """ METAINFO = {'classes': ('face', ), 'palette': [(0, 255, 0)]} - def __init__(self, **kwargs): - super(WIDERFaceDataset, self).__init__(**kwargs) + def load_data_list(self) -> List[dict]: + """Load annotation from XML style ann_file. - def load_annotations(self, ann_file): - """Load annotation from WIDERFace XML style annotation file. + Returns: + list[dict]: Annotation info from XML file. + """ + assert self._metainfo.get('classes', None) is not None, \ + 'classes in `XMLDataset` can not be None.' + self.cat2label = { + cat: i + for i, cat in enumerate(self._metainfo['classes']) + } + + data_list = [] + img_ids = list_from_file(self.ann_file, backend_args=self.backend_args) + + # loading process takes around 10 mins + if is_main_process(): + prog_bar = ProgressBar(len(img_ids)) + + for img_id in img_ids: + raw_img_info = {} + raw_img_info['img_id'] = img_id + raw_img_info['file_name'] = f'{img_id}.jpg' + parsed_data_info = self.parse_data_info(raw_img_info) + data_list.append(parsed_data_info) + + if is_main_process(): + prog_bar.update() + return data_list + + def parse_data_info(self, img_info: dict) -> Union[dict, List[dict]]: + """Parse raw annotation to target format. Args: - ann_file (str): Path of XML file. + img_info (dict): Raw image information, usually it includes + `img_id`, `file_name`, and `xml_path`. Returns: - list[dict]: Annotation info from XML file. + Union[dict, List[dict]]: Parsed annotation. 
""" + data_info = {} + img_id = img_info['img_id'] + xml_path = osp.join(self.data_prefix['img'], 'Annotations', + f'{img_id}.xml') + data_info['img_id'] = img_id + data_info['xml_path'] = xml_path - data_infos = [] - img_ids = list_from_file(ann_file) - for img_id in img_ids: - filename = f'{img_id}.jpg' - xml_path = osp.join(self.img_prefix, 'Annotations', - f'{img_id}.xml') - tree = ET.parse(xml_path) - root = tree.getroot() - size = root.find('size') - width = int(size.find('width').text) - height = int(size.find('height').text) - folder = root.find('folder').text - data_infos.append( - dict( - id=img_id, - filename=osp.join(folder, filename), - width=width, - height=height)) - - return data_infos + # deal with xml file + with get_local_path( + xml_path, backend_args=self.backend_args) as local_path: + raw_ann_info = ET.parse(local_path) + root = raw_ann_info.getroot() + size = root.find('size') + width = int(size.find('width').text) + height = int(size.find('height').text) + folder = root.find('folder').text + img_path = osp.join(self.data_prefix['img'], folder, + img_info['file_name']) + data_info['img_path'] = img_path + + data_info['height'] = height + data_info['width'] = width + + # Coordinates are in range [0, width - 1 or height - 1] + data_info['instances'] = self._parse_instance_info( + raw_ann_info, minus_one=False) + return data_info diff --git a/mmdet/datasets/xml_style.py b/mmdet/datasets/xml_style.py index 4f1ba5965d5..f5a6d8ca9b9 100644 --- a/mmdet/datasets/xml_style.py +++ b/mmdet/datasets/xml_style.py @@ -4,7 +4,7 @@ from typing import List, Optional, Union import mmcv -from mmengine.fileio import list_from_file +from mmengine.fileio import get, get_local_path, list_from_file from mmdet.registry import DATASETS from .base_det_dataset import BaseDetDataset @@ -17,9 +17,8 @@ class XMLDataset(BaseDetDataset): Args: img_subdir (str): Subdir where images are stored. Default: JPEGImages. ann_subdir (str): Subdir where annotations are. Default: Annotations. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:`mmengine.fileio.FileClient` for details. - Defaults to ``dict(backend='disk')``. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. 
""" def __init__(self, @@ -49,8 +48,7 @@ def load_data_list(self) -> List[dict]: } data_list = [] - img_ids = list_from_file( - self.ann_file, file_client_args=self.file_client_args) + img_ids = list_from_file(self.ann_file, backend_args=self.backend_args) for img_id in img_ids: file_name = osp.join(self.img_subdir, f'{img_id}.jpg') xml_path = osp.join(self.sub_data_root, self.ann_subdir, @@ -90,8 +88,9 @@ def parse_data_info(self, img_info: dict) -> Union[dict, List[dict]]: data_info['xml_path'] = img_info['xml_path'] # deal with xml file - with self.file_client.get_local_path( - img_info['xml_path']) as local_path: + with get_local_path( + img_info['xml_path'], + backend_args=self.backend_args) as local_path: raw_ann_info = ET.parse(local_path) root = raw_ann_info.getroot() size = root.find('size') @@ -99,7 +98,7 @@ def parse_data_info(self, img_info: dict) -> Union[dict, List[dict]]: width = int(size.find('width').text) height = int(size.find('height').text) else: - img_bytes = self.file_client.get(img_path) + img_bytes = get(img_path, backend_args=self.backend_args) img = mmcv.imfrombytes(img_bytes, backend='cv2') height, width = img.shape[:2] del img, img_bytes @@ -107,6 +106,24 @@ def parse_data_info(self, img_info: dict) -> Union[dict, List[dict]]: data_info['height'] = height data_info['width'] = width + data_info['instances'] = self._parse_instance_info( + raw_ann_info, minus_one=True) + + return data_info + + def _parse_instance_info(self, + raw_ann_info: ET, + minus_one: bool = True) -> List[dict]: + """parse instance information. + + Args: + raw_ann_info (ElementTree): ElementTree object. + minus_one (bool): Whether to subtract 1 from the coordinates. + Defaults to True. + + Returns: + List[dict]: List of instances. + """ instances = [] for obj in raw_ann_info.findall('object'): instance = {} @@ -117,11 +134,16 @@ def parse_data_info(self, img_info: dict) -> Union[dict, List[dict]]: difficult = 0 if difficult is None else int(difficult.text) bnd_box = obj.find('bndbox') bbox = [ - int(float(bnd_box.find('xmin').text)) - 1, - int(float(bnd_box.find('ymin').text)) - 1, - int(float(bnd_box.find('xmax').text)) - 1, - int(float(bnd_box.find('ymax').text)) - 1 + int(float(bnd_box.find('xmin').text)), + int(float(bnd_box.find('ymin').text)), + int(float(bnd_box.find('xmax').text)), + int(float(bnd_box.find('ymax').text)) ] + + # VOC needs to subtract 1 from the coordinates + if minus_one: + bbox = [x - 1 for x in bbox] + ignore = False if self.bbox_min_size is not None: assert not self.test_mode @@ -136,8 +158,7 @@ def parse_data_info(self, img_info: dict) -> Union[dict, List[dict]]: instance['bbox'] = bbox instance['bbox_label'] = self.cat2label[name] instances.append(instance) - data_info['instances'] = instances - return data_info + return instances def filter_data(self) -> List[dict]: """Filter annotations according to filter_cfg. diff --git a/mmdet/engine/hooks/visualization_hook.py b/mmdet/engine/hooks/visualization_hook.py index 1319ee55ac0..a8372433bd3 100644 --- a/mmdet/engine/hooks/visualization_hook.py +++ b/mmdet/engine/hooks/visualization_hook.py @@ -4,7 +4,7 @@ from typing import Optional, Sequence import mmcv -from mmengine.fileio import FileClient +from mmengine.fileio import get from mmengine.hooks import Hook from mmengine.runner import Runner from mmengine.utils import mkdir_or_exist @@ -42,9 +42,8 @@ class DetVisualizationHook(Hook): wait_time (float): The interval of show (s). Defaults to 0. 
test_out_dir (str, optional): directory where painted images will be saved in testing process. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:`mmengine.fileio.FileClient` for details. - Defaults to ``dict(backend='disk')``. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. """ def __init__(self, @@ -54,7 +53,7 @@ def __init__(self, show: bool = False, wait_time: float = 0., test_out_dir: Optional[str] = None, - file_client_args: dict = dict(backend='disk')): + backend_args: dict = None): self._visualizer: Visualizer = Visualizer.get_current_instance() self.interval = interval self.score_thr = score_thr @@ -68,8 +67,7 @@ def __init__(self, 'needs to be excluded.') self.wait_time = wait_time - self.file_client_args = file_client_args.copy() - self.file_client = None + self.backend_args = backend_args self.draw = draw self.test_out_dir = test_out_dir self._test_index = 0 @@ -88,16 +86,13 @@ def after_val_iter(self, runner: Runner, batch_idx: int, data_batch: dict, if self.draw is False: return - if self.file_client is None: - self.file_client = FileClient(**self.file_client_args) - # There is no guarantee that the same batch of images # is visualized for each evaluation. total_curr_iter = runner.iter + batch_idx # Visualize only the first data img_path = outputs[0].img_path - img_bytes = self.file_client.get(img_path) + img_bytes = get(img_path, backend_args=self.backend_args) img = mmcv.imfrombytes(img_bytes, channel_order='rgb') if total_curr_iter % self.interval == 0: @@ -129,14 +124,11 @@ def after_test_iter(self, runner: Runner, batch_idx: int, data_batch: dict, self.test_out_dir) mkdir_or_exist(self.test_out_dir) - if self.file_client is None: - self.file_client = FileClient(**self.file_client_args) - for data_sample in outputs: self._test_index += 1 img_path = data_sample.img_path - img_bytes = self.file_client.get(img_path) + img_bytes = get(img_path, backend_args=self.backend_args) img = mmcv.imfrombytes(img_bytes, channel_order='rgb') out_file = None diff --git a/mmdet/evaluation/functional/__init__.py b/mmdet/evaluation/functional/__init__.py index 6125ba74cd5..6f139f7bc4f 100644 --- a/mmdet/evaluation/functional/__init__.py +++ b/mmdet/evaluation/functional/__init__.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from .bbox_overlaps import bbox_overlaps +from .cityscapes_utils import evaluateImgLists from .class_names import (cityscapes_classes, coco_classes, coco_panoptic_classes, dataset_aliases, get_classes, imagenet_det_classes, imagenet_vid_classes, @@ -18,5 +19,6 @@ 'print_recall_summary', 'plot_num_recall', 'plot_iou_recall', 'oid_v6_classes', 'oid_challenge_classes', 'INSTANCE_OFFSET', 'pq_compute_single_core', 'pq_compute_multi_core', 'bbox_overlaps', - 'objects365v1_classes', 'objects365v2_classes', 'coco_panoptic_classes' + 'objects365v1_classes', 'objects365v2_classes', 'coco_panoptic_classes', + 'evaluateImgLists' ] diff --git a/mmdet/evaluation/functional/cityscapes_utils.py b/mmdet/evaluation/functional/cityscapes_utils.py new file mode 100644 index 00000000000..5ced3680dee --- /dev/null +++ b/mmdet/evaluation/functional/cityscapes_utils.py @@ -0,0 +1,302 @@ +# Copyright (c) OpenMMLab. All rights reserved. +# Copyright (c) https://github.com/mcordts/cityscapesScripts +# A wrapper of `cityscapesscripts` which supports loading groundtruth +# image from `backend_args`. 
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Optional, Union
+
+import mmcv
+import numpy as np
+from mmengine.fileio import get
+
+try:
+    import cityscapesscripts.evaluation.evalInstanceLevelSemanticLabeling as CSEval  # noqa: E501
+    from cityscapesscripts.evaluation.evalInstanceLevelSemanticLabeling import \
+        CArgs  # noqa: E501
+    from cityscapesscripts.evaluation.instance import Instance
+    from cityscapesscripts.helpers.csHelpers import (id2label, labels,
+                                                     writeDict2JSON)
+    HAS_CITYSCAPESAPI = True
+except ImportError:
+    CArgs = object
+    HAS_CITYSCAPESAPI = False
+
+
+def evaluateImgLists(prediction_list: list,
+                     groundtruth_list: list,
+                     args: CArgs,
+                     backend_args: Optional[dict] = None,
+                     dump_matches: bool = False) -> dict:
+    """A wrapper of obj:``cityscapesscripts.evaluation.
+
+    evalInstanceLevelSemanticLabeling.evaluateImgLists``. Supports loading
+    groundtruth images from a file backend.
+    Args:
+        prediction_list (list): A list of prediction txt files.
+        groundtruth_list (list): A list of groundtruth image files.
+        args (CArgs): A global object setting in
+            obj:``cityscapesscripts.evaluation.
+            evalInstanceLevelSemanticLabeling``
+        backend_args (dict, optional): Arguments to instantiate the
+            corresponding backend. Defaults to None.
+        dump_matches (bool): Whether to dump matches.json. Defaults to False.
+    Returns:
+        dict: The computed metric.
+    """
+    if not HAS_CITYSCAPESAPI:
+        raise RuntimeError('Failed to import `cityscapesscripts`.'
+                           'Please try to install official '
+                           'cityscapesscripts by '
+                           '"pip install cityscapesscripts"')
+    # determine labels of interest
+    CSEval.setInstanceLabels(args)
+    # get dictionary of all ground truth instances
+    gt_instances = getGtInstances(
+        groundtruth_list, args, backend_args=backend_args)
+    # match predictions and ground truth
+    matches = matchGtWithPreds(prediction_list, groundtruth_list, gt_instances,
+                               args, backend_args)
+    if dump_matches:
+        CSEval.writeDict2JSON(matches, 'matches.json')
+    # evaluate matches
+    apScores = CSEval.evaluateMatches(matches, args)
+    # averages
+    avgDict = CSEval.computeAverages(apScores, args)
+    # result dict
+    resDict = CSEval.prepareJSONDataForResults(avgDict, apScores, args)
+    if args.JSONOutput:
+        # create output folder if necessary
+        path = os.path.dirname(args.exportFile)
+        CSEval.ensurePath(path)
+        # Write APs to JSON
+        CSEval.writeDict2JSON(resDict, args.exportFile)
+
+    CSEval.printResults(avgDict, args)
+
+    return resDict
+
+
+def matchGtWithPreds(prediction_list: list,
+                     groundtruth_list: list,
+                     gt_instances: dict,
+                     args: CArgs,
+                     backend_args=None):
+    """A wrapper of obj:``cityscapesscripts.evaluation.
+
+    evalInstanceLevelSemanticLabeling.matchGtWithPreds``. Supports loading
+    groundtruth images from a file backend.
+    Args:
+        prediction_list (list): A list of prediction txt files.
+        groundtruth_list (list): A list of groundtruth image files.
+        gt_instances (dict): Groundtruth dict.
+        args (CArgs): A global object setting in
+            obj:``cityscapesscripts.evaluation.
+            evalInstanceLevelSemanticLabeling``
+        backend_args (dict, optional): Arguments to instantiate the
+            corresponding backend. Defaults to None.
+    Returns:
+        dict: The processed prediction and groundtruth result.
+    """
+    if not HAS_CITYSCAPESAPI:
+        raise RuntimeError('Failed to import `cityscapesscripts`.'
+ 'Please try to install official ' + 'cityscapesscripts by ' + '"pip install cityscapesscripts"') + matches: dict = dict() + if not args.quiet: + print(f'Matching {len(prediction_list)} pairs of images...') + + count = 0 + for (pred, gt) in zip(prediction_list, groundtruth_list): + # Read input files + gt_image = readGTImage(gt, backend_args) + pred_info = readPredInfo(pred) + # Get and filter ground truth instances + unfiltered_instances = gt_instances[gt] + cur_gt_instances_orig = CSEval.filterGtInstances( + unfiltered_instances, args) + + # Try to assign all predictions + (cur_gt_instances, + cur_pred_instances) = CSEval.assignGt2Preds(cur_gt_instances_orig, + gt_image, pred_info, args) + + # append to global dict + matches[gt] = {} + matches[gt]['groundTruth'] = cur_gt_instances + matches[gt]['prediction'] = cur_pred_instances + + count += 1 + if not args.quiet: + print(f'\rImages Processed: {count}', end=' ') + sys.stdout.flush() + + if not args.quiet: + print('') + + return matches + + +def readGTImage(image_file: Union[str, Path], + backend_args: Optional[dict] = None) -> np.ndarray: + """Read an image from path. + + Same as obj:``cityscapesscripts.evaluation. + evalInstanceLevelSemanticLabeling.readGTImage``, but support loading + groundtruth image from file backend. + Args: + image_file (str or Path): Either a str or pathlib.Path. + backend_args (dict, optional): Instantiates the corresponding file + backend. It may contain `backend` key to specify the file + backend. If it contains, the file backend corresponding to this + value will be used and initialized with the remaining values, + otherwise the corresponding file backend will be selected + based on the prefix of the file path. Defaults to None. + Returns: + np.ndarray: The groundtruth image. + """ + img_bytes = get(image_file, backend_args=backend_args) + img = mmcv.imfrombytes(img_bytes, flag='unchanged', backend='pillow') + return img + + +def readPredInfo(prediction_file: str) -> dict: + """A wrapper of obj:``cityscapesscripts.evaluation. + + evalInstanceLevelSemanticLabeling.readPredInfo``. + Args: + prediction_file (str): The prediction txt file. + Returns: + dict: The processed prediction results. + """ + if not HAS_CITYSCAPESAPI: + raise RuntimeError('Failed to import `cityscapesscripts`.' + 'Please try to install official ' + 'cityscapesscripts by ' + '"pip install cityscapesscripts"') + printError = CSEval.printError + + predInfo = {} + if (not os.path.isfile(prediction_file)): + printError(f"Infofile '{prediction_file}' " + 'for the predictions not found.') + with open(prediction_file) as f: + for line in f: + splittedLine = line.split(' ') + if len(splittedLine) != 3: + printError('Invalid prediction file. Expected content: ' + 'relPathPrediction1 labelIDPrediction1 ' + 'confidencePrediction1') + if os.path.isabs(splittedLine[0]): + printError('Invalid prediction file. First entry in each ' + 'line must be a relative path.') + + filename = os.path.join( + os.path.dirname(prediction_file), splittedLine[0]) + + imageInfo = {} + imageInfo['labelID'] = int(float(splittedLine[1])) + imageInfo['conf'] = float(splittedLine[2]) # type: ignore + predInfo[filename] = imageInfo + + return predInfo + + +def getGtInstances(groundtruth_list: list, + args: CArgs, + backend_args: Optional[dict] = None) -> dict: + """A wrapper of obj:``cityscapesscripts.evaluation. + + evalInstanceLevelSemanticLabeling.getGtInstances``. Support loading + groundtruth image from file backend. 
+    Args:
+        groundtruth_list (list): A list of groundtruth image files.
+        args (CArgs): A global object setting in
+            obj:``cityscapesscripts.evaluation.
+            evalInstanceLevelSemanticLabeling``
+        backend_args (dict, optional): Arguments to instantiate the
+            corresponding backend. Defaults to None.
+    Returns:
+        dict: The computed metric.
+    """
+    if not HAS_CITYSCAPESAPI:
+        raise RuntimeError('Failed to import `cityscapesscripts`.'
+                           'Please try to install official '
+                           'cityscapesscripts by '
+                           '"pip install cityscapesscripts"')
+    # if there is a global statistics json, then load it
+    if (os.path.isfile(args.gtInstancesFile)):
+        if not args.quiet:
+            print('Loading ground truth instances from JSON.')
+        with open(args.gtInstancesFile) as json_file:
+            gt_instances = json.load(json_file)
+    # otherwise create it
+    else:
+        if (not args.quiet):
+            print('Creating ground truth instances from png files.')
+        gt_instances = instances2dict(
+            groundtruth_list, args, backend_args=backend_args)
+        writeDict2JSON(gt_instances, args.gtInstancesFile)
+
+    return gt_instances
+
+
+def instances2dict(image_list: list,
+                   args: CArgs,
+                   backend_args: Optional[dict] = None) -> dict:
+    """A wrapper of obj:``cityscapesscripts.evaluation.
+
+    evalInstanceLevelSemanticLabeling.instances2dict``. Supports loading
+    groundtruth images from a file backend.
+    Args:
+        image_list (list): A list of image files.
+        args (CArgs): A global object setting in
+            obj:``cityscapesscripts.evaluation.
+            evalInstanceLevelSemanticLabeling``
+        backend_args (dict, optional): Arguments to instantiate the
+            corresponding backend. Defaults to None.
+    Returns:
+        dict: The processed groundtruth results.
+    """
+    if not HAS_CITYSCAPESAPI:
+        raise RuntimeError('Failed to import `cityscapesscripts`.'
+                           'Please try to install official '
+                           'cityscapesscripts by '
+                           '"pip install cityscapesscripts"')
+    imgCount = 0
+    instanceDict = {}
+
+    if not isinstance(image_list, list):
+        image_list = [image_list]
+
+    if not args.quiet:
+        print(f'Processing {len(image_list)} images...')
+
+    for image_name in image_list:
+        # Load image
+        img_bytes = get(image_name, backend_args=backend_args)
+        imgNp = mmcv.imfrombytes(img_bytes, flag='unchanged', backend='pillow')
+
+        # Initialize label categories
+        instances: dict = {}
+        for label in labels:
+            instances[label.name] = []
+
+        # Loop through all instance ids in instance image
+        for instanceId in np.unique(imgNp):
+            instanceObj = Instance(imgNp, instanceId)
+
+            instances[id2label[instanceObj.labelID].name].append(
+                instanceObj.toDict())
+
+        instanceDict[image_name] = instances
+        imgCount += 1
+
+        if not args.quiet:
+            print(f'\rImages Processed: {imgCount}', end=' ')
+            sys.stdout.flush()
+
+    return instanceDict
diff --git a/mmdet/evaluation/functional/panoptic_utils.py b/mmdet/evaluation/functional/panoptic_utils.py
index 77c6cd22ec1..6faa8ed52bc 100644
--- a/mmdet/evaluation/functional/panoptic_utils.py
+++ b/mmdet/evaluation/functional/panoptic_utils.py
@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 # Copyright (c) 2018, Alexander Kirillov
-# This file supports `file_client` for `panopticapi`,
+# This file supports `backend_args` for `panopticapi`,
 # the source code is copied from `panopticapi`,
 # only the way to load the gt images is modified.
 import multiprocessing
@@ -9,7 +9,7 @@
 import mmcv
 import numpy as np
-from mmengine.fileio import FileClient
+from mmengine.fileio import get

 # A custom value to distinguish instance ID and category ID; need to
 # be greater than the number of categories.
@@ -32,7 +32,7 @@ def pq_compute_single_core(proc_id,
                            gt_folder,
                            pred_folder,
                            categories,
-                           file_client=None,
+                           backend_args=None,
                            print_log=False):
     """The single core function to evaluate the metric of Panoptic
     Segmentation.
@@ -45,8 +45,8 @@
         gt_folder (str): The path of the ground truth images.
         pred_folder (str): The path of the prediction images.
         categories (str): The categories of the dataset.
-        file_client (object): The file client of the dataset. If None,
-            the backend will be set to `disk`.
+        backend_args (dict, optional): Arguments to instantiate the
+            corresponding backend. If None, the local disk backend is used.
         print_log (bool): Whether to print the log. Defaults to False.
     """
     if PQStat is None:
@@ -55,10 +55,6 @@
             'pip install git+https://github.com/cocodataset/'
             'panopticapi.git.')

-    if file_client is None:
-        file_client_args = dict(backend='disk')
-        file_client = FileClient(**file_client_args)
-
     pq_stat = PQStat()

     idx = 0
@@ -68,9 +64,10 @@
                     proc_id, idx, len(annotation_set)))
         idx += 1
         # The gt images can be on the local disk or `ceph`, so we use
-        # file_client here.
-        img_bytes = file_client.get(
-            os.path.join(gt_folder, gt_ann['file_name']))
+        # backend here.
+        img_bytes = get(
+            os.path.join(gt_folder, gt_ann['file_name']),
+            backend_args=backend_args)
         pan_gt = mmcv.imfrombytes(img_bytes, flag='color', channel_order='rgb')
         pan_gt = rgb2id(pan_gt)

@@ -181,7 +178,7 @@ def pq_compute_multi_core(matched_annotations_list,
                           gt_folder,
                           pred_folder,
                           categories,
-                          file_client=None,
+                          backend_args=None,
                           nproc=32):
     """Evaluate the metrics of Panoptic Segmentation with multithreading.

@@ -194,8 +191,8 @@
         gt_folder (str): The path of the ground truth images.
         pred_folder (str): The path of the prediction images.
         categories (str): The categories of the dataset.
-        file_client (object): The file client of the dataset. If None,
-            the backend will be set to `disk`.
+        backend_args (dict, optional): Arguments to instantiate the
+            corresponding backend. If None, the local disk backend is used.
         nproc (int): Number of processes for panoptic quality computing.
             Defaults to 32. When `nproc` exceeds the number of cpu cores,
             the number of cpu cores is used.
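Note on the fileio migration above: the cached `FileClient` object is replaced by the stateless `mmengine.fileio.get`, which resolves a backend per call from `backend_args`. A minimal sketch of the new read path; the paths below are illustrative and not taken from this diff:

```python
import os

import mmcv
from mmengine.fileio import get

# Hypothetical ground-truth location, for illustration only.
gt_folder = 'data/coco/annotations/panoptic_val2017'
file_name = '000000000139.png'

# backend_args=None selects the local-disk backend; a dict such as
# dict(backend='petrel') would read the same path from object storage.
img_bytes = get(os.path.join(gt_folder, file_name), backend_args=None)
pan_gt = mmcv.imfrombytes(img_bytes, flag='color', channel_order='rgb')
```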
@@ -206,10 +203,6 @@ def pq_compute_multi_core(matched_annotations_list, 'pip install git+https://github.com/cocodataset/' 'panopticapi.git.') - if file_client is None: - file_client_args = dict(backend='disk') - file_client = FileClient(**file_client_args) - cpu_num = min(nproc, multiprocessing.cpu_count()) annotations_split = np.array_split(matched_annotations_list, cpu_num) @@ -220,7 +213,7 @@ def pq_compute_multi_core(matched_annotations_list, for proc_id, annotation_set in enumerate(annotations_split): p = workers.apply_async(pq_compute_single_core, (proc_id, annotation_set, gt_folder, - pred_folder, categories, file_client)) + pred_folder, categories, backend_args)) processes.append(p) # Close the process pool, otherwise it will lead to memory diff --git a/mmdet/evaluation/metrics/cityscapes_metric.py b/mmdet/evaluation/metrics/cityscapes_metric.py index 2b28100aff4..e5cdc179a3c 100644 --- a/mmdet/evaluation/metrics/cityscapes_metric.py +++ b/mmdet/evaluation/metrics/cityscapes_metric.py @@ -2,26 +2,26 @@ import os import os.path as osp import shutil +import tempfile from collections import OrderedDict from typing import Dict, Optional, Sequence import mmcv import numpy as np -from mmengine.dist import is_main_process, master_only +from mmengine.dist import is_main_process from mmengine.evaluator import BaseMetric from mmengine.logging import MMLogger from mmdet.registry import METRICS try: - import cityscapesscripts - from cityscapesscripts.evaluation import \ - evalInstanceLevelSemanticLabeling as CSEval - from cityscapesscripts.helpers import labels as CSLabels + import cityscapesscripts.evaluation.evalInstanceLevelSemanticLabeling as CSEval # noqa: E501 + import cityscapesscripts.helpers.labels as CSLabels + + from mmdet.evaluation.functional import evaluateImgLists + HAS_CITYSCAPESAPI = True except ImportError: - cityscapesscripts = None - CSLabels = None - CSEval = None + HAS_CITYSCAPESAPI = False @METRICS.register_module() @@ -40,8 +40,6 @@ class CityScapesMetric(BaseMetric): evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False. - keep_results (bool): Whether to keep the results. When ``format_only`` - is True, ``keep_results`` must be True. Defaults to False. collect_device (str): Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'. @@ -49,6 +47,12 @@ class CityScapesMetric(BaseMetric): names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None. + dump_matches (bool): Whether dump matches.json file during evaluating. + Defaults to False. + file_client_args (dict, optional): Arguments to instantiate the + corresponding backend in mmdet <= 3.0.0rc6. Defaults to None. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. """ default_prefix: Optional[str] = 'cityscapes' @@ -56,33 +60,59 @@ def __init__(self, outfile_prefix: str, seg_prefix: Optional[str] = None, format_only: bool = False, - keep_results: bool = False, collect_device: str = 'cpu', - prefix: Optional[str] = None) -> None: - if cityscapesscripts is None: - raise RuntimeError('Please run "pip install cityscapesscripts" to ' - 'install cityscapesscripts first.') - - assert outfile_prefix, 'outfile_prefix must be not None.' 
- - if format_only: - assert keep_results, 'keep_results must be True when ' - 'format_only is True' - + prefix: Optional[str] = None, + dump_matches: bool = False, + file_client_args: dict = None, + backend_args: dict = None) -> None: + + if not HAS_CITYSCAPESAPI: + raise RuntimeError('Failed to import `cityscapesscripts`.' + 'Please try to install official ' + 'cityscapesscripts by ' + '"pip install cityscapesscripts"') super().__init__(collect_device=collect_device, prefix=prefix) + + self.tmp_dir = None self.format_only = format_only - self.keep_results = keep_results - self.seg_out_dir = osp.abspath(f'{outfile_prefix}.results') - self.seg_prefix = seg_prefix + if self.format_only: + assert outfile_prefix is not None, 'outfile_prefix must be not' + 'None when format_only is True, otherwise the result files will' + 'be saved to a temp directory which will be cleaned up at the end.' + else: + assert seg_prefix is not None, '`seg_prefix` is necessary when ' + 'computing the CityScapes metrics' + + if outfile_prefix is None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.outfile_prefix = osp.join(self.tmp_dir.name, 'results') + else: + # the directory to save predicted panoptic segmentation mask + self.outfile_prefix = osp.join(outfile_prefix, 'results') # type: ignore # yapf: disable # noqa: E501 + + dir_name = osp.expanduser(self.outfile_prefix) + + if osp.exists(dir_name) and is_main_process(): + logger: MMLogger = MMLogger.get_current_instance() + logger.info('remove previous results.') + shutil.rmtree(dir_name) + os.makedirs(dir_name, exist_ok=True) + + self.backend_args = backend_args + if file_client_args is not None: + raise RuntimeError( + 'The `file_client_args` is deprecated, ' + 'please use `backend_args` instead, please refer to' + 'https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/datasets/coco_detection.py' # noqa: E501 + ) - if is_main_process(): - os.makedirs(self.seg_out_dir, exist_ok=True) + self.seg_prefix = seg_prefix + self.dump_matches = dump_matches - @master_only def __del__(self) -> None: - """Clean up.""" - if not self.keep_results: - shutil.rmtree(self.seg_out_dir) + """Clean up the results if necessary.""" + if self.tmp_dir is not None: + self.tmp_dir.cleanup() # TODO: data_batch is no longer needed, consider adjusting the # parameter position @@ -102,7 +132,7 @@ def process(self, data_batch: dict, data_samples: Sequence[dict]) -> None: pred = data_sample['pred_instances'] filename = data_sample['img_path'] basename = osp.splitext(osp.basename(filename))[0] - pred_txt = osp.join(self.seg_out_dir, basename + '_pred.txt') + pred_txt = osp.join(self.outfile_prefix, basename + '_pred.txt') result['pred_txt'] = pred_txt labels = pred['labels'].cpu().numpy() masks = pred['masks'].cpu().numpy().astype(np.uint8) @@ -118,7 +148,8 @@ def process(self, data_batch: dict, data_samples: Sequence[dict]) -> None: class_name = self.dataset_meta['classes'][label] class_id = CSLabels.name2label[class_name].id png_filename = osp.join( - self.seg_out_dir, basename + f'_{i}_{class_name}.png') + self.outfile_prefix, + basename + f'_{i}_{class_name}.png') mmcv.imwrite(mask, png_filename) f.write(f'{osp.basename(png_filename)} ' f'{class_id} {mask_score}\n') @@ -127,8 +158,7 @@ def process(self, data_batch: dict, data_samples: Sequence[dict]) -> None: gt = dict() img_path = filename.replace('leftImg8bit.png', 'gtFine_instanceIds.png') - img_path = img_path.replace('leftImg8bit', 'gtFine') - gt['file_name'] = osp.join(self.seg_prefix, img_path) + gt['file_name'] 
= img_path.replace('leftImg8bit', 'gtFine') self.results.append((gt, result)) @@ -146,25 +176,28 @@ def compute_metrics(self, results: list) -> Dict[str, float]: if self.format_only: logger.info( - f'results are saved to {osp.dirname(self.seg_out_dir)}') + f'results are saved to {osp.dirname(self.outfile_prefix)}') return OrderedDict() logger.info('starts to compute metric') gts, preds = zip(*results) # set global states in cityscapes evaluation API - CSEval.args.cityscapesPath = osp.join(self.seg_prefix, '../..') - CSEval.args.predictionPath = self.seg_out_dir - CSEval.args.predictionWalk = None + gt_instances_file = osp.join(self.outfile_prefix, 'gtInstances.json') # type: ignore # yapf: disable # noqa: E501 + # split gt and prediction list + gts, preds = zip(*results) CSEval.args.JSONOutput = False CSEval.args.colorized = False - CSEval.args.gtInstancesFile = osp.join(self.seg_out_dir, - 'gtInstances.json') + CSEval.args.gtInstancesFile = gt_instances_file groundTruthImgList = [gt['file_name'] for gt in gts] predictionImgList = [pred['pred_txt'] for pred in preds] - CSEval_results = CSEval.evaluateImgLists(predictionImgList, - groundTruthImgList, - CSEval.args)['averages'] + CSEval_results = evaluateImgLists( + predictionImgList, + groundTruthImgList, + CSEval.args, + self.backend_args, + dump_matches=self.dump_matches)['averages'] + eval_results = OrderedDict() eval_results['mAP'] = CSEval_results['allAp'] eval_results['AP@50'] = CSEval_results['allAp50%'] diff --git a/mmdet/evaluation/metrics/coco_metric.py b/mmdet/evaluation/metrics/coco_metric.py index bd56803da3d..f77d6516bfa 100644 --- a/mmdet/evaluation/metrics/coco_metric.py +++ b/mmdet/evaluation/metrics/coco_metric.py @@ -9,7 +9,7 @@ import numpy as np import torch from mmengine.evaluator import BaseMetric -from mmengine.fileio import FileClient, dump, load +from mmengine.fileio import dump, get_local_path, load from mmengine.logging import MMLogger from terminaltables import AsciiTable @@ -50,9 +50,10 @@ class CocoMetric(BaseMetric): outfile_prefix (str, optional): The prefix of json files. It includes the file path and the prefix of filename, e.g., "a/b/prefix". If not specified, a temp file will be created. Defaults to None. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:`mmengine.fileio.FileClient` for details. - Defaults to ``dict(backend='disk')``. + file_client_args (dict, optional): Arguments to instantiate the + corresponding backend in mmdet <= 3.0.0rc6. Defaults to None. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. collect_device (str): Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'. 
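For context on the `CocoMetric` constructor hunk that follows: `backend_args` is forwarded to `get_local_path` when loading `ann_file`, and passing the removed `file_client_args` now raises a `RuntimeError`. A hedged usage sketch; the annotation path is a placeholder:

```python
from mmdet.evaluation.metrics import CocoMetric

val_evaluator = CocoMetric(
    ann_file='data/coco/annotations/instances_val2017.json',  # placeholder
    metric='bbox',
    backend_args=None)  # e.g. dict(backend='petrel') for remote storage

# The deprecated argument fails fast:
# CocoMetric(ann_file='...', file_client_args=dict(backend='disk'))
# -> RuntimeError: The `file_client_args` is deprecated, ...
```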
@@ -74,7 +75,8 @@ def __init__(self, metric_items: Optional[Sequence[str]] = None, format_only: bool = False, outfile_prefix: Optional[str] = None, - file_client_args: dict = dict(backend='disk'), + file_client_args: dict = None, + backend_args: dict = None, collect_device: str = 'cpu', prefix: Optional[str] = None, sort_categories: bool = False) -> None: @@ -108,13 +110,19 @@ def __init__(self, self.outfile_prefix = outfile_prefix - self.file_client_args = file_client_args - self.file_client = FileClient(**file_client_args) + self.backend_args = backend_args + if file_client_args is not None: + raise RuntimeError( + 'The `file_client_args` is deprecated, ' + 'please use `backend_args` instead, please refer to' + 'https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/datasets/coco_detection.py' # noqa: E501 + ) # if ann_file is not specified, # initialize coco api with the converted dataset if ann_file is not None: - with self.file_client.get_local_path(ann_file) as local_path: + with get_local_path( + ann_file, backend_args=self.backend_args) as local_path: self._coco_api = COCO(local_path) if sort_categories: # 'categories' list in objects365_train.json and @@ -511,6 +519,7 @@ def compute_metrics(self, results: list) -> Dict[str, float]: results_per_category = [] for idx, cat_id in enumerate(self.cat_ids): + t = [] # area range index 0: all area ranges # max dets index -1: typically 100 per image nm = self._coco_api.loadCats(cat_id)[0] @@ -520,14 +529,38 @@ def compute_metrics(self, results: list) -> Dict[str, float]: ap = np.mean(precision) else: ap = float('nan') - results_per_category.append( - (f'{nm["name"]}', f'{round(ap, 3)}')) + t.append(f'{nm["name"]}') + t.append(f'{round(ap, 3)}') eval_results[f'{nm["name"]}_precision'] = round(ap, 3) - num_columns = min(6, len(results_per_category) * 2) + # indexes of IoU @50 and @75 + for iou in [0, 5]: + precision = precisions[iou, :, idx, 0, -1] + precision = precision[precision > -1] + if precision.size: + ap = np.mean(precision) + else: + ap = float('nan') + t.append(f'{round(ap, 3)}') + + # indexes of area of small, median and large + for area in [1, 2, 3]: + precision = precisions[:, :, idx, area, -1] + precision = precision[precision > -1] + if precision.size: + ap = np.mean(precision) + else: + ap = float('nan') + t.append(f'{round(ap, 3)}') + results_per_category.append(tuple(t)) + + num_columns = len(results_per_category[0]) results_flatten = list( itertools.chain(*results_per_category)) - headers = ['category', 'AP'] * (num_columns // 2) + headers = [ + 'category', 'mAP', 'mAP_50', 'mAP_75', 'mAP_s', + 'mAP_m', 'mAP_l' + ] results_2d = itertools.zip_longest(*[ results_flatten[i::num_columns] for i in range(num_columns) diff --git a/mmdet/evaluation/metrics/coco_occluded_metric.py b/mmdet/evaluation/metrics/coco_occluded_metric.py index 544ff4426ba..81235a04e6e 100644 --- a/mmdet/evaluation/metrics/coco_occluded_metric.py +++ b/mmdet/evaluation/metrics/coco_occluded_metric.py @@ -1,6 +1,4 @@ # Copyright (c) OpenMMLab. All rights reserved. 
- -import os.path as osp from typing import Dict, List, Optional, Union import mmengine @@ -68,11 +66,6 @@ def __init__( metric: Union[str, List[str]] = ['bbox', 'segm'], **kwargs) -> None: super().__init__(*args, metric=metric, **kwargs) - # load from local file - if osp.isfile(occluded_ann) and not osp.isabs(occluded_ann): - occluded_ann = osp.join(self.data_root, occluded_ann) - if osp.isfile(separated_ann) and not osp.isabs(separated_ann): - separated_ann = osp.join(self.data_root, separated_ann) self.occluded_ann = load(occluded_ann) self.separated_ann = load(separated_ann) self.score_thr = score_thr diff --git a/mmdet/evaluation/metrics/coco_panoptic_metric.py b/mmdet/evaluation/metrics/coco_panoptic_metric.py index bafe275925a..475e51dbc19 100644 --- a/mmdet/evaluation/metrics/coco_panoptic_metric.py +++ b/mmdet/evaluation/metrics/coco_panoptic_metric.py @@ -8,7 +8,7 @@ import mmcv import numpy as np from mmengine.evaluator import BaseMetric -from mmengine.fileio import FileClient, dump, load +from mmengine.fileio import dump, get_local_path, load from mmengine.logging import MMLogger, print_log from terminaltables import AsciiTable @@ -56,9 +56,10 @@ class CocoPanopticMetric(BaseMetric): nproc (int): Number of processes for panoptic quality computing. Defaults to 32. When ``nproc`` exceeds the number of cpu cores, the number of cpu cores is used. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:`mmengine.fileio.FileClient` for details. - Defaults to ``dict(backend='disk')``. + file_client_args (dict, optional): Arguments to instantiate the + corresponding backend in mmdet <= 3.0.0rc6. Defaults to None. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. collect_device (str): Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'. 
@@ -76,7 +77,8 @@ def __init__(self, format_only: bool = False, outfile_prefix: Optional[str] = None, nproc: int = 32, - file_client_args: dict = dict(backend='disk'), + file_client_args: dict = None, + backend_args: dict = None, collect_device: str = 'cpu', prefix: Optional[str] = None) -> None: if panopticapi is None: @@ -108,19 +110,23 @@ def __init__(self, self.cat_ids = None self.cat2label = None - self.file_client_args = file_client_args - self.file_client = FileClient(**file_client_args) + self.backend_args = backend_args + if file_client_args is not None: + raise RuntimeError( + 'The `file_client_args` is deprecated, ' + 'please use `backend_args` instead, please refer to' + 'https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/datasets/coco_detection.py' # noqa: E501 + ) if ann_file: - with self.file_client.get_local_path(ann_file) as local_path: + with get_local_path( + ann_file, backend_args=self.backend_args) as local_path: self._coco_api = COCOPanoptic(local_path) self.categories = self._coco_api.cats else: self._coco_api = None self.categories = None - self.file_client = FileClient(**file_client_args) - def __del__(self) -> None: """Clean up.""" if self.tmp_dir is not None: @@ -370,7 +376,7 @@ def _compute_batch_pq_stats(self, data_samples: Sequence[dict]): gt_folder=self.seg_prefix, pred_folder=self.seg_out_dir, categories=categories, - file_client=self.file_client) + backend_args=self.backend_args) self.results.append(pq_stats) @@ -497,7 +503,7 @@ def compute_metrics(self, results: list) -> Dict[str, float]: gt_folder, pred_folder, self.categories, - file_client=self.file_client, + backend_args=self.backend_args, nproc=self.nproc) else: diff --git a/mmdet/evaluation/metrics/crowdhuman_metric.py b/mmdet/evaluation/metrics/crowdhuman_metric.py index a16f4351cde..de2a54edc2b 100644 --- a/mmdet/evaluation/metrics/crowdhuman_metric.py +++ b/mmdet/evaluation/metrics/crowdhuman_metric.py @@ -9,7 +9,7 @@ import numpy as np from mmengine.evaluator import BaseMetric -from mmengine.fileio import FileClient, dump, load +from mmengine.fileio import dump, get_text, load from mmengine.logging import MMLogger from scipy.sparse import csr_matrix from scipy.sparse.csgraph import maximum_bipartite_matching @@ -38,9 +38,10 @@ class CrowdHumanMetric(BaseMetric): outfile_prefix (str, optional): The prefix of json files. It includes the file path and the prefix of filename, e.g., "a/b/prefix". If not specified, a temp file will be created. Defaults to None. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:`mmengine.fileio.FileClient` for details. - Defaults to ``dict(backend='disk')``. + file_client_args (dict, optional): Arguments to instantiate the + corresponding backend in mmdet <= 3.0.0rc6. Defaults to None. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. collect_device (str): Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'. 
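The `CrowdHumanMetric` hunks below swap `FileClient.get_text` for the functional `mmengine.fileio.get_text` when reading the `.odgt` annotations; a sketch of that read pattern, with a placeholder path:

```python
import json

from mmengine.fileio import get_text

ann_file = 'data/CrowdHuman/annotation_val.odgt'  # placeholder
gt_str = get_text(ann_file, backend_args=None).strip().split('\n')
gt_records = [json.loads(line) for line in gt_str]
```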
@@ -68,7 +69,8 @@ def __init__(self, metric: Union[str, List[str]] = ['AP', 'MR', 'JI'], format_only: bool = False, outfile_prefix: Optional[str] = None, - file_client_args: dict = dict(backend='disk'), + file_client_args: dict = None, + backend_args: dict = None, collect_device: str = 'cpu', prefix: Optional[str] = None, eval_mode: int = 0, @@ -93,8 +95,13 @@ def __init__(self, 'None when format_only is True, otherwise the result files will' 'be saved to a temp directory which will be cleaned up at the end.' self.outfile_prefix = outfile_prefix - self.file_client_args = file_client_args - self.file_client = FileClient(**file_client_args) + self.backend_args = backend_args + if file_client_args is not None: + raise RuntimeError( + 'The `file_client_args` is deprecated, ' + 'please use `backend_args` instead, please refer to' + 'https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/datasets/coco_detection.py' # noqa: E501 + ) assert eval_mode in [0, 1, 2], \ "Unknown eval mode. mr_ref should be one of '0', '1', '2'." @@ -221,10 +228,11 @@ def load_eval_samples(self, result_file): Returns: Dict[Image]: The detection result packaged by Image """ - gt_str = self.file_client.get_text(self.ann_file).strip().split('\n') + gt_str = get_text( + self.ann_file, backend_args=self.backend_args).strip().split('\n') gt_records = [json.loads(line) for line in gt_str] - pred_records = load(result_file) + pred_records = load(result_file, backend_args=self.backend_args) eval_samples = dict() for gt_record, pred_record in zip(gt_records, pred_records): assert gt_record['ID'] == pred_record['ID'], \ diff --git a/mmdet/evaluation/metrics/dump_proposals_metric.py b/mmdet/evaluation/metrics/dump_proposals_metric.py index 06ecc78d69b..9e9c53654c1 100644 --- a/mmdet/evaluation/metrics/dump_proposals_metric.py +++ b/mmdet/evaluation/metrics/dump_proposals_metric.py @@ -22,9 +22,10 @@ class DumpProposals(BaseMetric): proposals_file (str): Proposals file path. Defaults to 'proposals.pkl'. num_max_proposals (int, optional): Maximum number of proposals to dump. If not specified, all proposals will be dumped. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:`mmengine.fileio.FileClient` for details. - Defaults to ``dict(backend='disk')``. + file_client_args (dict, optional): Arguments to instantiate the + corresponding backend in mmdet <= 3.0.0rc6. Defaults to None. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. collect_device (str): Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'. @@ -40,13 +41,20 @@ def __init__(self, output_dir: str = '', proposals_file: str = 'proposals.pkl', num_max_proposals: Optional[int] = None, - file_client_args: dict = dict(backend='disk'), + file_client_args: dict = None, + backend_args: dict = None, collect_device: str = 'cpu', prefix: Optional[str] = None) -> None: super().__init__(collect_device=collect_device, prefix=prefix) self.num_max_proposals = num_max_proposals # TODO: update after mmengine finish refactor fileio. 
- self.file_client_args = file_client_args + self.backend_args = backend_args + if file_client_args is not None: + raise RuntimeError( + 'The `file_client_args` is deprecated, ' + 'please use `backend_args` instead, please refer to' + 'https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/datasets/coco_detection.py' # noqa: E501 + ) self.output_dir = output_dir assert proposals_file.endswith(('.pkl', '.pickle')), \ 'The output file must be a pkl file.' @@ -106,6 +114,6 @@ def compute_metrics(self, results: list) -> dict: dump( dump_results, file=self.proposals_file, - file_client_args=self.file_client_args) + backend_args=self.backend_args) logger.info(f'Results are saved at {self.proposals_file}') return {} diff --git a/mmdet/evaluation/metrics/lvis_metric.py b/mmdet/evaluation/metrics/lvis_metric.py index 388c097d5ff..e4dd6141c0e 100644 --- a/mmdet/evaluation/metrics/lvis_metric.py +++ b/mmdet/evaluation/metrics/lvis_metric.py @@ -7,6 +7,7 @@ from typing import Dict, List, Optional, Sequence, Union import numpy as np +from mmengine.fileio import get_local_path from mmengine.logging import MMLogger from terminaltables import AsciiTable @@ -62,6 +63,10 @@ class LVISMetric(CocoMetric): names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None. + file_client_args (dict, optional): Arguments to instantiate the + corresponding backend in mmdet <= 3.0.0rc6. Defaults to None. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. """ default_prefix: Optional[str] = 'lvis' @@ -76,7 +81,9 @@ def __init__(self, format_only: bool = False, outfile_prefix: Optional[str] = None, collect_device: str = 'cpu', - prefix: Optional[str] = None) -> None: + prefix: Optional[str] = None, + file_client_args: dict = None, + backend_args: dict = None) -> None: if lvis is None: raise RuntimeError( 'Package lvis is not installed. Please run "pip install ' @@ -110,10 +117,22 @@ def __init__(self, 'be saved to a temp directory which will be cleaned up at the end.' 
self.outfile_prefix = outfile_prefix + self.backend_args = backend_args + if file_client_args is not None: + raise RuntimeError( + 'The `file_client_args` is deprecated, ' + 'please use `backend_args` instead, please refer to' + 'https://github.com/open-mmlab/mmdetection/blob/main/configs/_base_/datasets/coco_detection.py' # noqa: E501 + ) # if ann_file is not specified, # initialize lvis api with the converted dataset - self._lvis_api = LVIS(ann_file) if ann_file else None + if ann_file is not None: + with get_local_path( + ann_file, backend_args=self.backend_args) as local_path: + self._lvis_api = LVIS(local_path) + else: + self._lvis_api = None # handle dataset lazy init self.cat_ids = None diff --git a/mmdet/models/dense_heads/rtmdet_ins_head.py b/mmdet/models/dense_heads/rtmdet_ins_head.py index e355bdb79f8..729a4492f0b 100644 --- a/mmdet/models/dense_heads/rtmdet_ins_head.py +++ b/mmdet/models/dense_heads/rtmdet_ins_head.py @@ -565,7 +565,7 @@ def _mask_predict_by_feat_single(self, mask_feat: Tensor, kernels: Tensor, mask_feat.unsqueeze(0) coord = self.prior_generator.single_level_grid_priors( - (h, w), level_idx=0).reshape(1, -1, 2) + (h, w), level_idx=0, device=mask_feat.device).reshape(1, -1, 2) num_inst = priors.shape[0] points = priors[:, :2].reshape(-1, 1, 2) strides = priors[:, 2:].reshape(-1, 1, 2) diff --git a/mmdet/models/task_modules/coders/bucketing_bbox_coder.py b/mmdet/models/task_modules/coders/bucketing_bbox_coder.py index 56abc372bdb..4044e1cd91d 100644 --- a/mmdet/models/task_modules/coders/bucketing_bbox_coder.py +++ b/mmdet/models/task_modules/coders/bucketing_bbox_coder.py @@ -1,10 +1,14 @@ # Copyright (c) OpenMMLab. All rights reserved. +from typing import Optional, Sequence, Tuple, Union + import numpy as np import torch import torch.nn.functional as F +from torch import Tensor from mmdet.registry import TASK_UTILS -from mmdet.structures.bbox import HorizontalBoxes, bbox_rescale, get_box_tensor +from mmdet.structures.bbox import (BaseBoxes, HorizontalBoxes, bbox_rescale, + get_box_tensor) from .base_bbox_coder import BaseBBoxCoder @@ -32,13 +36,13 @@ class BucketingBBoxCoder(BaseBBoxCoder): """ def __init__(self, - num_buckets, - scale_factor, - offset_topk=2, - offset_upperbound=1.0, - cls_ignore_neighbor=True, - clip_border=True, - **kwargs): + num_buckets: int, + scale_factor: int, + offset_topk: int = 2, + offset_upperbound: float = 1.0, + cls_ignore_neighbor: bool = True, + clip_border: bool = True, + **kwargs) -> None: super().__init__(**kwargs) self.num_buckets = num_buckets self.scale_factor = scale_factor @@ -47,7 +51,8 @@ def __init__(self, self.cls_ignore_neighbor = cls_ignore_neighbor self.clip_border = clip_border - def encode(self, bboxes, gt_bboxes): + def encode(self, bboxes: Union[Tensor, BaseBoxes], + gt_bboxes: Union[Tensor, BaseBoxes]) -> Tuple[Tensor]: """Get bucketing estimation and fine regression targets during training. @@ -71,7 +76,12 @@ def encode(self, bboxes, gt_bboxes): self.cls_ignore_neighbor) return encoded_bboxes - def decode(self, bboxes, pred_bboxes, max_shape=None): + def decode( + self, + bboxes: Union[Tensor, BaseBoxes], + pred_bboxes: Tensor, + max_shape: Optional[Tuple[int]] = None + ) -> Tuple[Union[Tensor, BaseBoxes], Tensor]: """Apply transformation `pred_bboxes` to `boxes`. Args: boxes (torch.Tensor or :obj:`BaseBoxes`): Basic boxes. 
@@ -97,7 +107,9 @@ def decode(self, bboxes, pred_bboxes, max_shape=None): return bboxes, loc_confidence -def generat_buckets(proposals, num_buckets, scale_factor=1.0): +def generat_buckets(proposals: Tensor, + num_buckets: int, + scale_factor: float = 1.0) -> Tuple[Tensor]: """Generate buckets w.r.t bucket number and scale factor of proposals. Args: @@ -145,13 +157,13 @@ def generat_buckets(proposals, num_buckets, scale_factor=1.0): return bucket_w, bucket_h, l_buckets, r_buckets, t_buckets, d_buckets -def bbox2bucket(proposals, - gt, - num_buckets, - scale_factor, - offset_topk=2, - offset_upperbound=1.0, - cls_ignore_neighbor=True): +def bbox2bucket(proposals: Tensor, + gt: Tensor, + num_buckets: int, + scale_factor: float, + offset_topk: int = 2, + offset_upperbound: float = 1.0, + cls_ignore_neighbor: bool = True) -> Tuple[Tensor]: """Generate buckets estimation and fine regression targets. Args: @@ -268,13 +280,14 @@ def bbox2bucket(proposals, return offsets, offsets_weights, bucket_labels, bucket_cls_weights -def bucket2bbox(proposals, - cls_preds, - offset_preds, - num_buckets, - scale_factor=1.0, - max_shape=None, - clip_border=True): +def bucket2bbox(proposals: Tensor, + cls_preds: Tensor, + offset_preds: Tensor, + num_buckets: int, + scale_factor: float = 1.0, + max_shape: Optional[Union[Sequence[int], Tensor, + Sequence[Sequence[int]]]] = None, + clip_border: bool = True) -> Tuple[Tensor]: """Apply bucketing estimation (cls preds) and fine regression (offset preds) to generate det bboxes. diff --git a/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py b/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py index 6bc9a9bdfb8..f65748ac347 100644 --- a/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py +++ b/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py @@ -1,11 +1,13 @@ # Copyright (c) OpenMMLab. All rights reserved. import warnings +from typing import Optional, Sequence, Union import numpy as np import torch +from torch import Tensor from mmdet.registry import TASK_UTILS -from mmdet.structures.bbox import HorizontalBoxes, get_box_tensor +from mmdet.structures.bbox import BaseBoxes, HorizontalBoxes, get_box_tensor from .base_bbox_coder import BaseBBoxCoder @@ -32,12 +34,12 @@ class DeltaXYWHBBoxCoder(BaseBBoxCoder): """ def __init__(self, - target_means=(0., 0., 0., 0.), - target_stds=(1., 1., 1., 1.), - clip_border=True, - add_ctr_clamp=False, - ctr_clamp=32, - **kwargs): + target_means: Sequence[float] = (0., 0., 0., 0.), + target_stds: Sequence[float] = (1., 1., 1., 1.), + clip_border: bool = True, + add_ctr_clamp: bool = False, + ctr_clamp: int = 32, + **kwargs) -> None: super().__init__(**kwargs) self.means = target_means self.stds = target_stds @@ -45,7 +47,8 @@ def __init__(self, self.add_ctr_clamp = add_ctr_clamp self.ctr_clamp = ctr_clamp - def encode(self, bboxes, gt_bboxes): + def encode(self, bboxes: Union[Tensor, BaseBoxes], + gt_bboxes: Union[Tensor, BaseBoxes]) -> Tensor: """Get box regression transformation deltas that can be used to transform the ``bboxes`` into the ``gt_bboxes``. 
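The hunks in this file only add type annotations, so the behavior can be sanity-checked with a round trip through the two helpers; a sketch with illustrative boxes:

```python
import torch

from mmdet.models.task_modules.coders.delta_xywh_bbox_coder import (
    bbox2delta, delta2bbox)

proposals = torch.tensor([[0., 0., 10., 10.]])  # x1, y1, x2, y2
gt = torch.tensor([[1., 2., 11., 14.]])

deltas = bbox2delta(proposals, gt)        # (dx, dy, dw, dh)
restored = delta2bbox(proposals, deltas)  # inverse transform
assert torch.allclose(restored, gt, atol=1e-4)
```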
@@ -65,11 +68,14 @@ def encode(self, bboxes, gt_bboxes): encoded_bboxes = bbox2delta(bboxes, gt_bboxes, self.means, self.stds) return encoded_bboxes - def decode(self, - bboxes, - pred_bboxes, - max_shape=None, - wh_ratio_clip=16 / 1000): + def decode( + self, + bboxes: Union[Tensor, BaseBoxes], + pred_bboxes: Tensor, + max_shape: Optional[Union[Sequence[int], Tensor, + Sequence[Sequence[int]]]] = None, + wh_ratio_clip: Optional[float] = 16 / 1000 + ) -> Union[Tensor, BaseBoxes]: """Apply transformation `pred_bboxes` to `boxes`. Args: @@ -123,7 +129,12 @@ def decode(self, return decoded_bboxes -def bbox2delta(proposals, gt, means=(0., 0., 0., 0.), stds=(1., 1., 1., 1.)): +def bbox2delta( + proposals: Tensor, + gt: Tensor, + means: Sequence[float] = (0., 0., 0., 0.), + stds: Sequence[float] = (1., 1., 1., 1.) +) -> Tensor: """Compute deltas of proposals w.r.t. gt. We usually compute the deltas of x, y, w, h of proposals w.r.t ground @@ -168,15 +179,16 @@ def bbox2delta(proposals, gt, means=(0., 0., 0., 0.), stds=(1., 1., 1., 1.)): return deltas -def delta2bbox(rois, - deltas, - means=(0., 0., 0., 0.), - stds=(1., 1., 1., 1.), - max_shape=None, - wh_ratio_clip=16 / 1000, - clip_border=True, - add_ctr_clamp=False, - ctr_clamp=32): +def delta2bbox(rois: Tensor, + deltas: Tensor, + means: Sequence[float] = (0., 0., 0., 0.), + stds: Sequence[float] = (1., 1., 1., 1.), + max_shape: Optional[Union[Sequence[int], Tensor, + Sequence[Sequence[int]]]] = None, + wh_ratio_clip: float = 16 / 1000, + clip_border: bool = True, + add_ctr_clamp: bool = False, + ctr_clamp: int = 32) -> Tensor: """Apply deltas to shift/scale base boxes. Typically the rois are anchor or proposed bounding boxes and the deltas are @@ -267,15 +279,16 @@ def delta2bbox(rois, return bboxes -def onnx_delta2bbox(rois, - deltas, - means=(0., 0., 0., 0.), - stds=(1., 1., 1., 1.), - max_shape=None, - wh_ratio_clip=16 / 1000, - clip_border=True, - add_ctr_clamp=False, - ctr_clamp=32): +def onnx_delta2bbox(rois: Tensor, + deltas: Tensor, + means: Sequence[float] = (0., 0., 0., 0.), + stds: Sequence[float] = (1., 1., 1., 1.), + max_shape: Optional[Union[Sequence[int], Tensor, + Sequence[Sequence[int]]]] = None, + wh_ratio_clip: float = 16 / 1000, + clip_border: Optional[bool] = True, + add_ctr_clamp: bool = False, + ctr_clamp: int = 32) -> Tensor: """Apply deltas to shift/scale base boxes. Typically the rois are anchor or proposed bounding boxes and the deltas are diff --git a/mmdet/models/task_modules/coders/distance_point_bbox_coder.py b/mmdet/models/task_modules/coders/distance_point_bbox_coder.py index ff2bb54660c..ab26bf4b96c 100644 --- a/mmdet/models/task_modules/coders/distance_point_bbox_coder.py +++ b/mmdet/models/task_modules/coders/distance_point_bbox_coder.py @@ -1,6 +1,10 @@ # Copyright (c) OpenMMLab. All rights reserved. +from typing import Optional, Sequence, Union + +from torch import Tensor + from mmdet.registry import TASK_UTILS -from mmdet.structures.bbox import (HorizontalBoxes, bbox2distance, +from mmdet.structures.bbox import (BaseBoxes, HorizontalBoxes, bbox2distance, distance2bbox, get_box_tensor) from .base_bbox_coder import BaseBBoxCoder @@ -17,11 +21,15 @@ class DistancePointBBoxCoder(BaseBBoxCoder): border of the image. Defaults to True. 
""" - def __init__(self, clip_border=True, **kwargs): + def __init__(self, clip_border: Optional[bool] = True, **kwargs) -> None: super().__init__(**kwargs) self.clip_border = clip_border - def encode(self, points, gt_bboxes, max_dis=None, eps=0.1): + def encode(self, + points: Tensor, + gt_bboxes: Union[Tensor, BaseBoxes], + max_dis: Optional[float] = None, + eps: float = 0.1) -> Tensor: """Encode bounding box to distances. Args: @@ -41,7 +49,13 @@ def encode(self, points, gt_bboxes, max_dis=None, eps=0.1): assert gt_bboxes.size(-1) == 4 return bbox2distance(points, gt_bboxes, max_dis, eps) - def decode(self, points, pred_bboxes, max_shape=None): + def decode( + self, + points: Tensor, + pred_bboxes: Tensor, + max_shape: Optional[Union[Sequence[int], Tensor, + Sequence[Sequence[int]]]] = None + ) -> Union[Tensor, BaseBoxes]: """Decode distance prediction to bounding box. Args: diff --git a/mmdet/models/task_modules/coders/legacy_delta_xywh_bbox_coder.py b/mmdet/models/task_modules/coders/legacy_delta_xywh_bbox_coder.py index 154016dd6fd..9eb1bedb3fb 100644 --- a/mmdet/models/task_modules/coders/legacy_delta_xywh_bbox_coder.py +++ b/mmdet/models/task_modules/coders/legacy_delta_xywh_bbox_coder.py @@ -1,9 +1,12 @@ # Copyright (c) OpenMMLab. All rights reserved. +from typing import Optional, Sequence, Union + import numpy as np import torch +from torch import Tensor from mmdet.registry import TASK_UTILS -from mmdet.structures.bbox import HorizontalBoxes, get_box_tensor +from mmdet.structures.bbox import BaseBoxes, HorizontalBoxes, get_box_tensor from .base_bbox_coder import BaseBBoxCoder @@ -32,14 +35,15 @@ class LegacyDeltaXYWHBBoxCoder(BaseBBoxCoder): """ def __init__(self, - target_means=(0., 0., 0., 0.), - target_stds=(1., 1., 1., 1.), - **kwargs): + target_means: Sequence[float] = (0., 0., 0., 0.), + target_stds: Sequence[float] = (1., 1., 1., 1.), + **kwargs) -> None: super().__init__(**kwargs) self.means = target_means self.stds = target_stds - def encode(self, bboxes, gt_bboxes): + def encode(self, bboxes: Union[Tensor, BaseBoxes], + gt_bboxes: Union[Tensor, BaseBoxes]) -> Tensor: """Get box regression transformation deltas that can be used to transform the ``bboxes`` into the ``gt_bboxes``. @@ -60,11 +64,14 @@ def encode(self, bboxes, gt_bboxes): self.stds) return encoded_bboxes - def decode(self, - bboxes, - pred_bboxes, - max_shape=None, - wh_ratio_clip=16 / 1000): + def decode( + self, + bboxes: Union[Tensor, BaseBoxes], + pred_bboxes: Tensor, + max_shape: Optional[Union[Sequence[int], Tensor, + Sequence[Sequence[int]]]] = None, + wh_ratio_clip: Optional[float] = 16 / 1000 + ) -> Union[Tensor, BaseBoxes]: """Apply transformation `pred_bboxes` to `boxes`. Args: @@ -91,10 +98,12 @@ def decode(self, return decoded_bboxes -def legacy_bbox2delta(proposals, - gt, - means=(0., 0., 0., 0.), - stds=(1., 1., 1., 1.)): +def legacy_bbox2delta( + proposals: Tensor, + gt: Tensor, + means: Sequence[float] = (0., 0., 0., 0.), + stds: Sequence[float] = (1., 1., 1., 1.) +) -> Tensor: """Compute deltas of proposals w.r.t. gt in the MMDet V1.x manner. 
We usually compute the deltas of x, y, w, h of proposals w.r.t ground @@ -139,12 +148,14 @@ def legacy_bbox2delta(proposals, return deltas -def legacy_delta2bbox(rois, - deltas, - means=(0., 0., 0., 0.), - stds=(1., 1., 1., 1.), - max_shape=None, - wh_ratio_clip=16 / 1000): +def legacy_delta2bbox(rois: Tensor, + deltas: Tensor, + means: Sequence[float] = (0., 0., 0., 0.), + stds: Sequence[float] = (1., 1., 1., 1.), + max_shape: Optional[ + Union[Sequence[int], Tensor, + Sequence[Sequence[int]]]] = None, + wh_ratio_clip: float = 16 / 1000) -> Tensor: """Apply deltas to shift/scale base boxes in the MMDet V1.x manner. Typically the rois are anchor or proposed bounding boxes and the deltas are diff --git a/mmdet/models/task_modules/coders/pseudo_bbox_coder.py b/mmdet/models/task_modules/coders/pseudo_bbox_coder.py index 0eeeee484dd..9ee74311f6d 100644 --- a/mmdet/models/task_modules/coders/pseudo_bbox_coder.py +++ b/mmdet/models/task_modules/coders/pseudo_bbox_coder.py @@ -1,6 +1,10 @@ # Copyright (c) OpenMMLab. All rights reserved. +from typing import Union + +from torch import Tensor + from mmdet.registry import TASK_UTILS -from mmdet.structures.bbox import HorizontalBoxes, get_box_tensor +from mmdet.structures.bbox import BaseBoxes, HorizontalBoxes, get_box_tensor from .base_bbox_coder import BaseBBoxCoder @@ -11,12 +15,14 @@ class PseudoBBoxCoder(BaseBBoxCoder): def __init__(self, **kwargs): super().__init__(**kwargs) - def encode(self, bboxes, gt_bboxes): + def encode(self, bboxes: Tensor, gt_bboxes: Union[Tensor, + BaseBoxes]) -> Tensor: """torch.Tensor: return the given ``bboxes``""" gt_bboxes = get_box_tensor(gt_bboxes) return gt_bboxes - def decode(self, bboxes, pred_bboxes): + def decode(self, bboxes: Tensor, pred_bboxes: Union[Tensor, + BaseBoxes]) -> Tensor: """torch.Tensor: return the given ``pred_bboxes``""" if self.use_box_type: pred_bboxes = HorizontalBoxes(pred_bboxes) diff --git a/mmdet/models/task_modules/coders/tblr_bbox_coder.py b/mmdet/models/task_modules/coders/tblr_bbox_coder.py index f4a92ff14e3..74b388f7bad 100644 --- a/mmdet/models/task_modules/coders/tblr_bbox_coder.py +++ b/mmdet/models/task_modules/coders/tblr_bbox_coder.py @@ -1,8 +1,11 @@ # Copyright (c) OpenMMLab. All rights reserved. +from typing import Optional, Sequence, Union + import torch +from torch import Tensor from mmdet.registry import TASK_UTILS -from mmdet.structures.bbox import HorizontalBoxes, get_box_tensor +from mmdet.structures.bbox import BaseBoxes, HorizontalBoxes, get_box_tensor from .base_bbox_coder import BaseBBoxCoder @@ -23,12 +26,16 @@ class TBLRBBoxCoder(BaseBBoxCoder): border of the image. Defaults to True. """ - def __init__(self, normalizer=4.0, clip_border=True, **kwargs): + def __init__(self, + normalizer: Union[Sequence[float], float] = 4.0, + clip_border: bool = True, + **kwargs) -> None: super().__init__(**kwargs) self.normalizer = normalizer self.clip_border = clip_border - def encode(self, bboxes, gt_bboxes): + def encode(self, bboxes: Union[Tensor, BaseBoxes], + gt_bboxes: Union[Tensor, BaseBoxes]) -> Tensor: """Get box regression transformation deltas that can be used to transform the ``bboxes`` into the ``gt_bboxes`` in the (top, left, bottom, right) order. 
@@ -50,7 +57,13 @@ def encode(self, bboxes, gt_bboxes): bboxes, gt_bboxes, normalizer=self.normalizer) return encoded_bboxes - def decode(self, bboxes, pred_bboxes, max_shape=None): + def decode( + self, + bboxes: Union[Tensor, BaseBoxes], + pred_bboxes: Tensor, + max_shape: Optional[Union[Sequence[int], Tensor, + Sequence[Sequence[int]]]] = None + ) -> Union[Tensor, BaseBoxes]: """Apply transformation `pred_bboxes` to `boxes`. Args: @@ -80,7 +93,10 @@ def decode(self, bboxes, pred_bboxes, max_shape=None): return decoded_bboxes -def bboxes2tblr(priors, gts, normalizer=4.0, normalize_by_wh=True): +def bboxes2tblr(priors: Tensor, + gts: Tensor, + normalizer: Union[Sequence[float], float] = 4.0, + normalize_by_wh: bool = True) -> Tensor: """Encode ground truth boxes to tblr coordinate. It first convert the gt coordinate to tblr format, @@ -126,12 +142,13 @@ def bboxes2tblr(priors, gts, normalizer=4.0, normalize_by_wh=True): return loc / normalizer -def tblr2bboxes(priors, - tblr, - normalizer=4.0, - normalize_by_wh=True, - max_shape=None, - clip_border=True): +def tblr2bboxes(priors: Tensor, + tblr: Tensor, + normalizer: Union[Sequence[float], float] = 4.0, + normalize_by_wh: bool = True, + max_shape: Optional[Union[Sequence[int], Tensor, + Sequence[Sequence[int]]]] = None, + clip_border: bool = True) -> Tensor: """Decode tblr outputs to prediction boxes. The process includes 3 steps: 1) De-normalize tblr coordinates by diff --git a/mmdet/models/task_modules/coders/yolo_bbox_coder.py b/mmdet/models/task_modules/coders/yolo_bbox_coder.py index b903c16dbf2..2e1c766789b 100644 --- a/mmdet/models/task_modules/coders/yolo_bbox_coder.py +++ b/mmdet/models/task_modules/coders/yolo_bbox_coder.py @@ -1,8 +1,11 @@ # Copyright (c) OpenMMLab. All rights reserved. +from typing import Union + import torch +from torch import Tensor from mmdet.registry import TASK_UTILS -from mmdet.structures.bbox import HorizontalBoxes, get_box_tensor +from mmdet.structures.bbox import BaseBoxes, HorizontalBoxes, get_box_tensor from .base_bbox_coder import BaseBBoxCoder @@ -19,11 +22,13 @@ class YOLOBBoxCoder(BaseBBoxCoder): eps (float): Min value of cx, cy when encoding. """ - def __init__(self, eps=1e-6, **kwargs): + def __init__(self, eps: float = 1e-6, **kwargs): super().__init__(**kwargs) self.eps = eps - def encode(self, bboxes, gt_bboxes, stride): + def encode(self, bboxes: Union[Tensor, BaseBoxes], + gt_bboxes: Union[Tensor, BaseBoxes], + stride: Union[Tensor, int]) -> Tensor: """Get box regression transformation deltas that can be used to transform the ``bboxes`` into the ``gt_bboxes``. @@ -59,7 +64,8 @@ def encode(self, bboxes, gt_bboxes, stride): [x_center_target, y_center_target, w_target, h_target], dim=-1) return encoded_bboxes - def decode(self, bboxes, pred_bboxes, stride): + def decode(self, bboxes: Union[Tensor, BaseBoxes], pred_bboxes: Tensor, + stride: Union[Tensor, int]) -> Union[Tensor, BaseBoxes]: """Apply transformation `pred_bboxes` to `boxes`. 
Args: diff --git a/mmdet/models/test_time_augs/det_tta.py b/mmdet/models/test_time_augs/det_tta.py index 66f0817a9f8..95f91db9e12 100644 --- a/mmdet/models/test_time_augs/det_tta.py +++ b/mmdet/models/test_time_augs/det_tta.py @@ -27,7 +27,7 @@ class DetTTAModel(BaseTTAModel): >>> >>> tta_pipeline = [ >>> dict(type='LoadImageFromFile', - >>> file_client_args=dict(backend='disk')), + >>> backend_args=None), >>> dict( >>> type='TestTimeAug', >>> transforms=[[ diff --git a/mmdet/structures/det_data_sample.py b/mmdet/structures/det_data_sample.py index 71bc404a269..d7b7f354a85 100644 --- a/mmdet/structures/det_data_sample.py +++ b/mmdet/structures/det_data_sample.py @@ -30,8 +30,8 @@ class DetDataSample(BaseDataElement): >>> from mmdet.structures import DetDataSample >>> data_sample = DetDataSample() - >>> img_meta = dict(img_shape=(800, 1196, 3), - ... pad_shape=(800, 1216, 3)) + >>> img_meta = dict(img_shape=(800, 1196), + ... pad_shape=(800, 1216)) >>> gt_instances = InstanceData(metainfo=img_meta) >>> gt_instances.bboxes = torch.rand((5, 4)) >>> gt_instances.labels = torch.rand((5,)) @@ -48,8 +48,8 @@ class DetDataSample(BaseDataElement): gt_instances: diff --git a/mmdet/structures/mask/structures.py b/mmdet/structures/mask/structures.py + # suppress shapely warnings until it incorporates GEOS>=3.11.2 + # reference: https://github.com/shapely/shapely/issues/1345 + initial_settings = np.seterr() + np.seterr(invalid='ignore') for poly_per_obj in self.masks: cropped_poly_per_obj = [] for p in poly_per_obj: - # pycocotools will clip the boundary p = p.copy() - p[0::2] = p[0::2] - bbox[0] - p[1::2] = p[1::2] - bbox[1] - cropped_poly_per_obj.append(p) + p = geometry.Polygon(p.reshape(-1, 2)).buffer(0.0) + # polygon must be valid to perform intersection. + if not p.is_valid: + continue + cropped = p.intersection(crop_box) + if cropped.is_empty: + continue + if isinstance(cropped, + geometry.collection.BaseMultipartGeometry): + cropped = cropped.geoms + else: + cropped = [cropped] + # one polygon may be cropped to multiple ones + for poly in cropped: + # ignore lines or points + if not isinstance( + poly, geometry.Polygon) or not poly.is_valid: + continue + coords = np.asarray(poly.exterior.coords) + # remove an extra identical vertex at the end + coords = coords[:-1] + coords[:, 0] -= x1 + coords[:, 1] -= y1 + cropped_poly_per_obj.append(coords.reshape(-1)) + # a dummy polygon to avoid misalignment between masks and boxes + if len(cropped_poly_per_obj) == 0: + cropped_poly_per_obj = [np.array([0, 0, 0, 0, 0, 0])] cropped_masks.append(cropped_poly_per_obj) + np.seterr(**initial_settings) cropped_masks = PolygonMasks(cropped_masks, h, w) return cropped_masks diff --git a/mmdet/testing/_utils.py b/mmdet/testing/_utils.py index 471a6bd3a7b..ce74376250e 100644 --- a/mmdet/testing/_utils.py +++ b/mmdet/testing/_utils.py @@ -274,7 +274,7 @@ def demo_mm_sampling_results(proposals_list, # TODO: Support full ceph def replace_to_ceph(cfg): - file_client_args = dict( + backend_args = dict( backend='petrel', path_mapping=dict({ './data/': 's3://openmmlab/datasets/detection/', @@ -286,12 +286,12 @@ def _process_pipeline(dataset, name): def replace_img(pipeline): if pipeline['type'] == 'LoadImageFromFile': - pipeline['file_client_args'] = file_client_args + pipeline['backend_args'] = backend_args def replace_ann(pipeline): if pipeline['type'] == 'LoadAnnotations' or pipeline[ 'type'] == 'LoadPanopticAnnotations': - pipeline['file_client_args'] = file_client_args + pipeline['backend_args'] = backend_args if 'pipeline' in dataset: replace_img(dataset.pipeline[0]) @@ -307,7 +307,7 @@ def replace_ann(pipeline): def _process_evaluator(evaluator, name): if
evaluator['type'] == 'CocoPanopticMetric': - evaluator['file_client_args'] = file_client_args + evaluator['backend_args'] = backend_args # half ceph _process_pipeline(cfg.train_dataloader.dataset, cfg.filename) diff --git a/mmdet/utils/__init__.py b/mmdet/utils/__init__.py index 12047895936..1a864342563 100644 --- a/mmdet/utils/__init__.py +++ b/mmdet/utils/__init__.py @@ -8,7 +8,8 @@ from .misc import (find_latest_checkpoint, get_test_pipeline_cfg, update_data_root) from .replace_cfg_vals import replace_cfg_vals -from .setup_env import register_all_modules, setup_multi_processes +from .setup_env import (register_all_modules, setup_cache_size_limit_of_dynamo, + setup_multi_processes) from .split_batch import split_batch from .typing_utils import (ConfigType, InstanceList, MultiConfig, OptConfigType, OptInstanceList, OptMultiConfig, @@ -21,5 +22,6 @@ 'AvoidCUDAOOM', 'all_reduce_dict', 'allreduce_grads', 'reduce_mean', 'sync_random_seed', 'ConfigType', 'InstanceList', 'MultiConfig', 'OptConfigType', 'OptInstanceList', 'OptMultiConfig', 'OptPixelList', - 'PixelList', 'RangeType', 'get_test_pipeline_cfg' + 'PixelList', 'RangeType', 'get_test_pipeline_cfg', + 'setup_cache_size_limit_of_dynamo' ] diff --git a/mmdet/utils/benchmark.py b/mmdet/utils/benchmark.py index 18070c05fd2..1714b464740 100644 --- a/mmdet/utils/benchmark.py +++ b/mmdet/utils/benchmark.py @@ -160,7 +160,6 @@ def __init__(self, print_log('before build: ', self.logger) print_process_memory(self._process, self.logger) - self.cfg.model.pretrained = None self.model = self._init_model(checkpoint, is_fuse_conv_bn) # Because multiple processes will occupy additional CPU resources, @@ -213,7 +212,7 @@ def run_once(self) -> dict: start_time = time.perf_counter() with torch.no_grad(): - self.model(data, return_loss=False) + self.model.test_step(data) torch.cuda.synchronize() elapsed = time.perf_counter() - start_time diff --git a/mmdet/utils/setup_env.py b/mmdet/utils/setup_env.py index 0e56218db96..a7b37845a88 100644 --- a/mmdet/utils/setup_env.py +++ b/mmdet/utils/setup_env.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. import datetime +import logging import os import platform import warnings @@ -7,6 +8,33 @@ import cv2 import torch.multiprocessing as mp from mmengine import DefaultScope +from mmengine.logging import print_log +from mmengine.utils import digit_version + + +def setup_cache_size_limit_of_dynamo(): + """Set up the cache size limit of dynamo. + + Note: Because the loss calculation and post-processing parts of + object detection algorithms have dynamic shapes, they are + recompiled almost every time they run. A large value of + torch._dynamo.config.cache_size_limit lets this repeated + compilation accumulate, which can slow down training and + testing speed. Therefore, we need to set the default value of + cache_size_limit smaller. An empirical value is 4. + """ + + import torch + if digit_version(torch.__version__) >= digit_version('2.0.0'): + if 'DYNAMO_CACHE_SIZE_LIMIT' in os.environ: + import torch._dynamo + cache_size_limit = int(os.environ['DYNAMO_CACHE_SIZE_LIMIT']) + torch._dynamo.config.cache_size_limit = cache_size_limit + print_log( + f'torch._dynamo.config.cache_size_limit is forcibly ' + f'set to {cache_size_limit}.', + logger='current', + level=logging.WARNING) def setup_multi_processes(cfg): diff --git a/mmdet/version.py b/mmdet/version.py index 56a7e9d62ce..24951882f40 100644 --- a/mmdet/version.py +++ b/mmdet/version.py @@ -1,6 +1,6 @@ # Copyright (c) OpenMMLab.
All rights reserved. -__version__ = '3.0.0rc6' +__version__ = '3.0.0' short_version = __version__ diff --git a/model-index.yml b/model-index.yml index 1e71f450d8d..d810c14e03d 100644 --- a/model-index.yml +++ b/model-index.yml @@ -1,20 +1,26 @@ Import: + - configs/albu_example/metafile.yml - configs/atss/metafile.yml - configs/autoassign/metafile.yml + - configs/boxinst/metafile.yml - configs/carafe/metafile.yml - configs/cascade_rcnn/metafile.yml - configs/cascade_rpn/metafile.yml - configs/centernet/metafile.yml - configs/centripetalnet/metafile.yml - - configs/cornernet/metafile.yml - configs/condinst/metafile.yml + - configs/conditional_detr/metafile.yml + - configs/cornernet/metafile.yml - configs/convnext/metafile.yml + - configs/crowddet/metafile.yml + - configs/dab_detr/metafile.yml - configs/dcn/metafile.yml - configs/dcnv2/metafile.yml - configs/ddod/metafile.yml - configs/deformable_detr/metafile.yml - configs/detectors/metafile.yml - configs/detr/metafile.yml + - configs/dino/metafile.yml - configs/double_heads/metafile.yml - configs/dyhead/metafile.yml - configs/dynamic_rcnn/metafile.yml @@ -40,6 +46,7 @@ Import: - configs/lad/metafile.yml - configs/ld/metafile.yml - configs/libra_rcnn/metafile.yml + - configs/lvis/metafile.yml - configs/mask2former/metafile.yml - configs/mask_rcnn/metafile.yml - configs/maskformer/metafile.yml @@ -54,13 +61,13 @@ Import: - configs/pisa/metafile.yml - configs/point_rend/metafile.yml - configs/queryinst/metafile.yml - - configs/rtmdet/metafile.yml - configs/regnet/metafile.yml - configs/reppoints/metafile.yml - configs/res2net/metafile.yml - configs/resnest/metafile.yml - configs/resnet_strikes_back/metafile.yml - configs/retinanet/metafile.yml + - configs/rpn/metafile.yml - configs/rtmdet/metafile.yml - configs/sabl/metafile.yml - configs/scnet/metafile.yml @@ -71,6 +78,7 @@ Import: - configs/solo/metafile.yml - configs/solov2/metafile.yml - configs/ssd/metafile.yml + - configs/strong_baselines/metafile.yml - configs/swin/metafile.yml - configs/tridentnet/metafile.yml - configs/tood/metafile.yml diff --git a/projects/ConvNeXt-V2/configs/mask-rcnn_convnext-v2-b_fpn_lsj-3x-fcmae_coco.py b/projects/ConvNeXt-V2/configs/mask-rcnn_convnext-v2-b_fpn_lsj-3x-fcmae_coco.py index f5815d8ecdf..95b960df92f 100644 --- a/projects/ConvNeXt-V2/configs/mask-rcnn_convnext-v2-b_fpn_lsj-3x-fcmae_coco.py +++ b/projects/ConvNeXt-V2/configs/mask-rcnn_convnext-v2-b_fpn_lsj-3x-fcmae_coco.py @@ -31,7 +31,7 @@ rcnn=dict(nms=dict(type='soft_nms')))) train_pipeline = [ - dict(type='LoadImageFromFile', file_client_args=_base_.file_client_args), + dict(type='LoadImageFromFile', backend_args=_base_.backend_args), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict( type='RandomResize', diff --git a/projects/Detic/README.md b/projects/Detic/README.md index 4e99779342d..871b426e895 100644 --- a/projects/Detic/README.md +++ b/projects/Detic/README.md @@ -145,10 +145,10 @@ A project does not necessarily have to be finished in a single PR, but it's esse - [ ] Metafile.yml - + - [ ] Move your modules into the core package following the codebase's file hierarchy structure. - + - [ ] Refactor your modules into the core package following the codebase's file hierarchy structure. 
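A usage note on the `setup_cache_size_limit_of_dynamo` helper added to `mmdet/utils/setup_env.py` above: it only takes effect on PyTorch >= 2.0.0 and only when the `DYNAMO_CACHE_SIZE_LIMIT` environment variable is set. A minimal sketch of how an entry script might opt in, assuming an environment with mmdet installed (4 is the empirical value the docstring recommends):

```python
import os

# Must be exported before the helper runs; 4 is the empirical value
# recommended in the docstring of setup_cache_size_limit_of_dynamo.
os.environ.setdefault('DYNAMO_CACHE_SIZE_LIMIT', '4')

from mmdet.utils import setup_cache_size_limit_of_dynamo

# No-op on torch < 2.0.0 or when the variable is unset; otherwise it
# overrides torch._dynamo.config.cache_size_limit and logs a warning.
setup_cache_size_limit_of_dynamo()
```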
diff --git a/projects/Detic/configs/detic_centernet2_swin-b_fpn_4x_lvis-coco-in21k.py b/projects/Detic/configs/detic_centernet2_swin-b_fpn_4x_lvis-coco-in21k.py index 19a17aea7bc..d554c40ec20 100644 --- a/projects/Detic/configs/detic_centernet2_swin-b_fpn_4x_lvis-coco-in21k.py +++ b/projects/Detic/configs/detic_centernet2_swin-b_fpn_4x_lvis-coco-in21k.py @@ -252,7 +252,7 @@ test_pipeline = [ dict( type='LoadImageFromFile', - file_client_args=_base_.file_client_args, + backend_args=_base_.backend_args, imdecode_backend=backend), dict(type='Resize', scale=(1333, 800), keep_ratio=True, backend=backend), dict( diff --git a/projects/DiffusionDet/README.md b/projects/DiffusionDet/README.md index c96f3c82943..5542d9a59a0 100644 --- a/projects/DiffusionDet/README.md +++ b/projects/DiffusionDet/README.md @@ -1,6 +1,6 @@ ## Description -This is an implementation of [DiffusionDet](https://github.com/ShoufaChen/DiffusionDet) based on [MMDetection](https://github.com/open-mmlab/mmdetection/tree/3.x), [MMCV](https://github.com/open-mmlab/mmcv), and [MMEngine](https://github.com/open-mmlab/mmengine). +This is an implementation of [DiffusionDet](https://github.com/ShoufaChen/DiffusionDet) based on [MMDetection](https://github.com/open-mmlab/mmdetection/tree/main), [MMCV](https://github.com/open-mmlab/mmcv), and [MMEngine](https://github.com/open-mmlab/mmengine).
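Circling back to the `DeltaXYWHBBoxCoder` hunks at the start of this section: they only add type annotations, so the transform itself is unchanged. For orientation, here is a minimal, self-contained sketch of the encode/decode round trip that `bbox2delta` and `delta2bbox` implement, with the means/stds normalization, `wh_ratio_clip`, and `max_shape` clipping from the annotated signatures omitted:

```python
import torch

def bbox2delta(proposals: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    # boxes are (x1, y1, x2, y2); work in center/size form
    px = (proposals[..., 0] + proposals[..., 2]) * 0.5
    py = (proposals[..., 1] + proposals[..., 3]) * 0.5
    pw = proposals[..., 2] - proposals[..., 0]
    ph = proposals[..., 3] - proposals[..., 1]
    gx = (gt[..., 0] + gt[..., 2]) * 0.5
    gy = (gt[..., 1] + gt[..., 3]) * 0.5
    gw = gt[..., 2] - gt[..., 0]
    gh = gt[..., 3] - gt[..., 1]
    # center offsets are normalized by proposal size; scales are log-ratios
    return torch.stack([(gx - px) / pw, (gy - py) / ph,
                        torch.log(gw / pw), torch.log(gh / ph)], dim=-1)

def delta2bbox(rois: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
    px = (rois[..., 0] + rois[..., 2]) * 0.5
    py = (rois[..., 1] + rois[..., 3]) * 0.5
    pw = rois[..., 2] - rois[..., 0]
    ph = rois[..., 3] - rois[..., 1]
    gx = px + deltas[..., 0] * pw
    gy = py + deltas[..., 1] * ph
    gw = pw * deltas[..., 2].exp()
    gh = ph * deltas[..., 3].exp()
    return torch.stack([gx - gw * 0.5, gy - gh * 0.5,
                        gx + gw * 0.5, gy + gh * 0.5], dim=-1)

rois = torch.tensor([[0., 0., 10., 10.]])
gts = torch.tensor([[2., 2., 12., 14.]])
# decode(encode(x)) recovers the ground-truth boxes exactly
assert torch.allclose(delta2bbox(rois, bbox2delta(rois, gts)), gts, atol=1e-5)
```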
@@ -163,10 +163,10 @@ A project does not necessarily have to be finished in a single PR, but it's esse - [ ] Metafile.yml - + - [ ] Move your modules into the core package following the codebase's file hierarchy structure. - + - [ ] Refactor your modules into the core package following the codebase's file hierarchy structure. diff --git a/projects/DiffusionDet/configs/diffusiondet_r50_fpn_500-proposals_1-step_crop-ms-480-800-450k_coco.py b/projects/DiffusionDet/configs/diffusiondet_r50_fpn_500-proposals_1-step_crop-ms-480-800-450k_coco.py index 310cdc4cf2b..187cdc39734 100644 --- a/projects/DiffusionDet/configs/diffusiondet_r50_fpn_500-proposals_1-step_crop-ms-480-800-450k_coco.py +++ b/projects/DiffusionDet/configs/diffusiondet_r50_fpn_500-proposals_1-step_crop-ms-480-800-450k_coco.py @@ -95,7 +95,7 @@ train_pipeline = [ dict( type='LoadImageFromFile', - file_client_args=_base_.file_client_args, + backend_args=_base_.backend_args, imdecode_backend=backend), dict(type='LoadAnnotations', with_bbox=True), dict(type='RandomFlip', prob=0.5), @@ -136,7 +136,7 @@ test_pipeline = [ dict( type='LoadImageFromFile', - file_client_args=_base_.file_client_args, + backend_args=_base_.backend_args, imdecode_backend=backend), dict(type='Resize', scale=(1333, 800), keep_ratio=True, backend=backend), # If you don't have a gt annotation, delete the pipeline diff --git a/projects/EfficientDet/README.md b/projects/EfficientDet/README.md index 7bc073f0df5..36f4ed403a3 100644 --- a/projects/EfficientDet/README.md +++ b/projects/EfficientDet/README.md @@ -6,7 +6,7 @@ ## Abstract -This is an implementation of [EfficientDet](https://github.com/google/automl) based on [MMDetection](https://github.com/open-mmlab/mmdetection/tree/3.x), [MMCV](https://github.com/open-mmlab/mmcv), and [MMEngine](https://github.com/open-mmlab/mmengine). +This is an implementation of [EfficientDet](https://github.com/google/automl) based on [MMDetection](https://github.com/open-mmlab/mmdetection/tree/main), [MMCV](https://github.com/open-mmlab/mmcv), and [MMEngine](https://github.com/open-mmlab/mmengine).
EfficientDet is a new family of object detectors that consistently achieves much better efficiency than prior art across a wide spectrum of resource constraints. @@ -22,6 +22,10 @@ In contrast to other feature pyramid network, such as FPN, FPN + PAN, NAS-FPN, B ## Usage +## Official TensorFlow Model + +This project also supports the [official TensorFlow model](https://github.com/google/automl), which uses 90 categories and yxyx box encoding in training. If you want to use the original model weights to reproduce the official results, please follow the steps below. + ### Model conversion First, download the EfficientDet [weights](https://github.com/google/automl/tree/master/efficientdet), unzip them, and then use the following command @@ -47,20 +51,40 @@ python projects/EfficientDet/convert_tf_to_pt.py --backbone {BACKBONE_NAME} --te In MMDetection's root directory, run the following command to test the model: ```bash -python tools/test.py projects/EfficientDet/configs/efficientdet_effb0_bifpn_8xb16-crop512-300e_coco.py ${CHECKPOINT_PATH} +python tools/test.py projects/EfficientDet/configs/tensorflow/efficientdet_effb0_bifpn_8xb16-crop512-300e_coco_tf.py ${CHECKPOINT_PATH} +``` + +## Reproduce Model + +For convenience, we recommend the current implementation, which uses 80 categories and xyxy box encoding in training and ultimately achieves a higher result. + +### Training commands + +In MMDetection's root directory, run the following command to train the model: + +```bash +python tools/train.py projects/EfficientDet/configs/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco.py +``` + +### Testing commands + +In MMDetection's root directory, run the following command to test the model: + +```bash +python tools/test.py projects/EfficientDet/configs/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco.py ${CHECKPOINT_PATH} +``` ## Results -Based on mmdetection, this project aligns the test accuracy of the [official model](https://github.com/google/automl). -
-If you want to reproduce the test results, you need to convert model weights first, then run the test command. -
-The training accuracy will also be aligned with the official in the future +Based on MMDetection, this project matches the accuracy of the [official model](https://github.com/google/automl). + +| Method | Backbone | Pretrained Model | Training set | Test set | Epoch | Val Box AP | Official AP | Download | +| :------------------------------------------------------------------------------------------------------------------: | :-------------: | :--------------: | :------------: | :----------: | :---: | :--------: | :---------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| [efficientdet-d0\*](projects/EfficientDet/configs/tensorflow/efficientdet_effb0_bifpn_8xb16-crop512-300e_coco_tf.py) | efficientnet-b0 | ImageNet | COCO2017 Train | COCO2017 Val | 300 | 34.4 | 34.3 | | +| [efficientdet-d3](projects/EfficientDet/configs/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco.py) | efficientnet-b3 | ImageNet | COCO2017 Train | COCO2017 Val | 300 | 47.2 | 46.8 | [model](https://download.openmmlab.com/mmdetection/v3.0/efficientdet/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco_20230223_122457-e6f7a833.pth) \| [log](https://download.openmmlab.com/mmdetection/v3.0/efficientdet/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco_20230223_122457.log.json) | -| Method | Backbone | Pretrained Model | Training set | Test set | Epoch | Val Box AP | Official AP | -| :------------------------------------------------------------------------------: | :-------------: | :--------------: | :------------: | :----------: | :---: | :--------: | :---------: | -| [efficientdet-d0](./configs/efficientdet_effb0_bifpn_8xb16-crop512-300e_coco.py) | efficientnet-b0 | ImageNet | COCO2017 Train | COCO2017 Val | 300 | 34.4 | 34.3 | **Note**: \* means tested with the [official TensorFlow model](https://github.com/google/automl) weights. ## Citation @@ -99,9 +123,9 @@ A project does not necessarily have to be finished in a single PR, but it's esse -- [ ] Milestone 2: Indicates a successful model implementation. +- [x] Milestone 2: Indicates a successful model implementation. - - [ ] Training-time correctness + - [x] Training-time correctness @@ -121,10 +145,10 @@ A project does not necessarily have to be finished in a single PR, but it's esse - [ ] Metafile.yml - + - [ ] Move your modules into the core package following the codebase's file hierarchy structure. - + - [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
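The configs that follow set `conv_cfg=dict(type='Conv2dSamePadding')` so that convolutions match the converted TensorFlow weights: TensorFlow's `padding='SAME'` pads asymmetrically (the extra pixel goes to the right/bottom) while `nn.Conv2d` pads symmetrically. Below is a simplified stand-in illustrating the idea; the project's registered module is imported from `projects/EfficientDet/efficientdet/utils.py` per the `__init__.py` hunk further down and may differ in detail:

```python
import math
import torch
import torch.nn.functional as F
from torch import nn

class Conv2dSamePadding(nn.Conv2d):
    """Conv2d whose output size is ceil(input / stride), like TF 'SAME'."""

    @staticmethod
    def _same_pad(size: int, k: int, s: int, d: int) -> int:
        # total padding needed so the output size equals ceil(size / s)
        return max((math.ceil(size / s) - 1) * s + (k - 1) * d + 1 - size, 0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ih, iw = x.shape[-2:]
        ph = self._same_pad(ih, self.kernel_size[0], self.stride[0], self.dilation[0])
        pw = self._same_pad(iw, self.kernel_size[1], self.stride[1], self.dilation[1])
        # asymmetric padding: the extra pixel goes to the right/bottom as in TF
        x = F.pad(x, [pw // 2, pw - pw // 2, ph // 2, ph - ph // 2])
        return F.conv2d(x, self.weight, self.bias, self.stride, 0,
                        self.dilation, self.groups)

conv = Conv2dSamePadding(3, 8, kernel_size=3, stride=2)
out = conv(torch.randn(1, 3, 512, 512))
assert out.shape[-2:] == (256, 256)  # ceil(512 / 2)
```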
diff --git a/projects/EfficientDet/configs/efficientdet_effb0_bifpn_8xb16-crop512-300e_coco.py b/projects/EfficientDet/configs/efficientdet_effb0_bifpn_8xb16-crop512-300e_coco.py new file mode 100644 index 00000000000..c7a3b309237 --- /dev/null +++ b/projects/EfficientDet/configs/efficientdet_effb0_bifpn_8xb16-crop512-300e_coco.py @@ -0,0 +1,171 @@ +_base_ = [ + 'mmdet::_base_/datasets/coco_detection.py', + 'mmdet::_base_/schedules/schedule_1x.py', + 'mmdet::_base_/default_runtime.py' +] +custom_imports = dict( + imports=['projects.EfficientDet.efficientdet'], allow_failed_imports=False) + +image_size = 512 +batch_augments = [ + dict(type='BatchFixedSizePad', size=(image_size, image_size)) +] +dataset_type = 'CocoDataset' +evalute_type = 'CocoMetric' +norm_cfg = dict(type='SyncBN', requires_grad=True, eps=1e-3, momentum=0.01) +checkpoint = 'https://download.openmmlab.com/mmclassification/v0/efficientnet/efficientnet-b0_3rdparty_8xb32-aa-advprop_in1k_20220119-26434485.pth' # noqa +model = dict( + type='EfficientDet', + data_preprocessor=dict( + type='DetDataPreprocessor', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + bgr_to_rgb=True, + pad_size_divisor=image_size, + batch_augments=batch_augments), + backbone=dict( + type='EfficientNet', + arch='b0', + drop_path_rate=0.2, + out_indices=(3, 4, 5), + frozen_stages=0, + conv_cfg=dict(type='Conv2dSamePadding'), + norm_cfg=norm_cfg, + norm_eval=False, + init_cfg=dict( + type='Pretrained', prefix='backbone', checkpoint=checkpoint)), + neck=dict( + type='BiFPN', + num_stages=3, + in_channels=[40, 112, 320], + out_channels=64, + start_level=0, + norm_cfg=norm_cfg), + bbox_head=dict( + type='EfficientDetSepBNHead', + num_classes=80, + num_ins=5, + in_channels=64, + feat_channels=64, + stacked_convs=3, + norm_cfg=norm_cfg, + anchor_generator=dict( + type='AnchorGenerator', + octave_base_scale=4, + scales_per_octave=3, + ratios=[1.0, 0.5, 2.0], + strides=[8, 16, 32, 64, 128], + center_offset=0.5), + bbox_coder=dict( + type='DeltaXYWHBBoxCoder', + target_means=[.0, .0, .0, .0], + target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict( + type='FocalLoss', + use_sigmoid=True, + gamma=1.5, + alpha=0.25, + loss_weight=1.0), + loss_bbox=dict(type='HuberLoss', beta=0.1, loss_weight=50)), + # training and testing settings + train_cfg=dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.5, + neg_iou_thr=0.5, + min_pos_iou=0, + ignore_iof_thr=-1), + sampler=dict( + type='PseudoSampler'), # Focal loss should use PseudoSampler + allowed_border=-1, + pos_weight=-1, + debug=False), + test_cfg=dict( + nms_pre=1000, + min_bbox_size=0, + score_thr=0.05, + nms=dict( + type='soft_nms', + iou_threshold=0.3, + sigma=0.5, + min_score=1e-3, + method='gaussian'), + max_per_img=100)) + +# dataset settings +train_pipeline = [ + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), + dict(type='LoadAnnotations', with_bbox=True), + dict( + type='RandomResize', + scale=(image_size, image_size), + ratio_range=(0.1, 2.0), + keep_ratio=True), + dict(type='RandomCrop', crop_size=(image_size, image_size)), + dict(type='RandomFlip', prob=0.5), + dict(type='PackDetInputs') +] +test_pipeline = [ + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), + dict(type='Resize', scale=(image_size, image_size), keep_ratio=True), + dict(type='LoadAnnotations', with_bbox=True), + dict( + type='PackDetInputs', + meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', + 'scale_factor')) +] + +train_dataloader = dict( + batch_size=16, + 
num_workers=8, + dataset=dict(type=dataset_type, pipeline=train_pipeline)) +val_dataloader = dict(dataset=dict(type=dataset_type, pipeline=test_pipeline)) +test_dataloader = val_dataloader + +val_evaluator = dict(type=evalute_type) +test_evaluator = val_evaluator + +optim_wrapper = dict( + optimizer=dict(lr=0.16, weight_decay=4e-5), + paramwise_cfg=dict( + norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), + clip_grad=dict(max_norm=10, norm_type=2)) + +# learning policy +max_epochs = 300 +param_scheduler = [ + dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=917), + dict( + type='CosineAnnealingLR', + eta_min=0.0, + begin=1, + T_max=299, + end=300, + by_epoch=True, + convert_to_iter_based=True) +] +train_cfg = dict(max_epochs=max_epochs, val_interval=1) + +vis_backends = [ + dict(type='LocalVisBackend'), + dict(type='TensorboardVisBackend') +] +visualizer = dict( + type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer') + +default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=15)) +custom_hooks = [ + dict( + type='EMAHook', + ema_type='ExpMomentumEMA', + momentum=0.0002, + update_buffers=True, + priority=49) +] +# cudnn_benchmark=True can accelerate fix-size training +env_cfg = dict(cudnn_benchmark=True) + +# NOTE: `auto_scale_lr` is for automatically scaling LR, +# USER SHOULD NOT CHANGE ITS VALUES. +# base_batch_size = (8 GPUs) x (16 samples per GPU) +auto_scale_lr = dict(base_batch_size=128) diff --git a/projects/EfficientDet/configs/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco-90cls.py b/projects/EfficientDet/configs/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco-90cls.py new file mode 100644 index 00000000000..fe82a5e1b94 --- /dev/null +++ b/projects/EfficientDet/configs/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco-90cls.py @@ -0,0 +1,171 @@ +_base_ = [ + 'mmdet::_base_/datasets/coco_detection.py', + 'mmdet::_base_/schedules/schedule_1x.py', + 'mmdet::_base_/default_runtime.py' +] +custom_imports = dict( + imports=['projects.EfficientDet.efficientdet'], allow_failed_imports=False) + +image_size = 896 +batch_augments = [ + dict(type='BatchFixedSizePad', size=(image_size, image_size)) +] +dataset_type = 'Coco90Dataset' +evalute_type = 'Coco90Metric' +norm_cfg = dict(type='SyncBN', requires_grad=True, eps=1e-3, momentum=0.01) +checkpoint = 'https://download.openmmlab.com/mmclassification/v0/efficientnet/efficientnet-b3_3rdparty_8xb32-aa-advprop_in1k_20220119-53b41118.pth' # noqa +model = dict( + type='EfficientDet', + data_preprocessor=dict( + type='DetDataPreprocessor', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + bgr_to_rgb=True, + pad_size_divisor=image_size, + batch_augments=batch_augments), + backbone=dict( + type='EfficientNet', + arch='b3', + drop_path_rate=0.3, + out_indices=(3, 4, 5), + frozen_stages=0, + conv_cfg=dict(type='Conv2dSamePadding'), + norm_cfg=norm_cfg, + norm_eval=False, + init_cfg=dict( + type='Pretrained', prefix='backbone', checkpoint=checkpoint)), + neck=dict( + type='BiFPN', + num_stages=6, + in_channels=[48, 136, 384], + out_channels=160, + start_level=0, + norm_cfg=norm_cfg), + bbox_head=dict( + type='EfficientDetSepBNHead', + num_classes=90, + num_ins=5, + in_channels=160, + feat_channels=160, + stacked_convs=4, + norm_cfg=norm_cfg, + anchor_generator=dict( + type='AnchorGenerator', + octave_base_scale=4, + scales_per_octave=3, + ratios=[1.0, 0.5, 2.0], + strides=[8, 16, 32, 64, 128], + center_offset=0.5), + bbox_coder=dict( + type='DeltaXYWHBBoxCoder', + 
target_means=[.0, .0, .0, .0], + target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict( + type='FocalLoss', + use_sigmoid=True, + gamma=1.5, + alpha=0.25, + loss_weight=1.0), + loss_bbox=dict(type='HuberLoss', beta=0.1, loss_weight=50)), + # training and testing settings + train_cfg=dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.5, + neg_iou_thr=0.5, + min_pos_iou=0, + ignore_iof_thr=-1), + sampler=dict( + type='PseudoSampler'), # Focal loss should use PseudoSampler + allowed_border=-1, + pos_weight=-1, + debug=False), + test_cfg=dict( + nms_pre=1000, + min_bbox_size=0, + score_thr=0.05, + nms=dict( + type='soft_nms', + iou_threshold=0.3, + sigma=0.5, + min_score=1e-3, + method='gaussian'), + max_per_img=100)) + +# dataset settings +train_pipeline = [ + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), + dict(type='LoadAnnotations', with_bbox=True), + dict( + type='RandomResize', + scale=(image_size, image_size), + ratio_range=(0.1, 2.0), + keep_ratio=True), + dict(type='RandomCrop', crop_size=(image_size, image_size)), + dict(type='RandomFlip', prob=0.5), + dict(type='PackDetInputs') +] +test_pipeline = [ + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), + dict(type='Resize', scale=(image_size, image_size), keep_ratio=True), + dict(type='LoadAnnotations', with_bbox=True), + dict( + type='PackDetInputs', + meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', + 'scale_factor')) +] + +train_dataloader = dict( + batch_size=16, + num_workers=8, + dataset=dict(type=dataset_type, pipeline=train_pipeline)) +val_dataloader = dict(dataset=dict(type=dataset_type, pipeline=test_pipeline)) +test_dataloader = val_dataloader + +val_evaluator = dict(type=evalute_type) +test_evaluator = val_evaluator + +optim_wrapper = dict( + optimizer=dict(lr=0.16, weight_decay=4e-5), + paramwise_cfg=dict( + norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), + clip_grad=dict(max_norm=10, norm_type=2)) + +# learning policy +max_epochs = 300 +param_scheduler = [ + dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=917), + dict( + type='CosineAnnealingLR', + eta_min=0.0, + begin=1, + T_max=299, + end=300, + by_epoch=True, + convert_to_iter_based=True) +] +train_cfg = dict(max_epochs=max_epochs, val_interval=1) + +vis_backends = [ + dict(type='LocalVisBackend'), + dict(type='TensorboardVisBackend') +] +visualizer = dict( + type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer') + +default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=15)) +custom_hooks = [ + dict( + type='EMAHook', + ema_type='ExpMomentumEMA', + momentum=0.0002, + update_buffers=True, + priority=49) +] +# cudnn_benchmark=True can accelerate fix-size training +env_cfg = dict(cudnn_benchmark=True) + +# NOTE: `auto_scale_lr` is for automatically scaling LR, +# USER SHOULD NOT CHANGE ITS VALUES. 
+# base_batch_size = (8 GPUs) x (16 samples per GPU) +auto_scale_lr = dict(base_batch_size=128) diff --git a/projects/EfficientDet/configs/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco.py b/projects/EfficientDet/configs/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco.py new file mode 100644 index 00000000000..2079e2ac65a --- /dev/null +++ b/projects/EfficientDet/configs/efficientdet_effb3_bifpn_8xb16-crop896-300e_coco.py @@ -0,0 +1,171 @@ +_base_ = [ + 'mmdet::_base_/datasets/coco_detection.py', + 'mmdet::_base_/schedules/schedule_1x.py', + 'mmdet::_base_/default_runtime.py' +] +custom_imports = dict( + imports=['projects.EfficientDet.efficientdet'], allow_failed_imports=False) + +image_size = 896 +batch_augments = [ + dict(type='BatchFixedSizePad', size=(image_size, image_size)) +] +dataset_type = 'CocoDataset' +evalute_type = 'CocoMetric' +norm_cfg = dict(type='SyncBN', requires_grad=True, eps=1e-3, momentum=0.01) +checkpoint = 'https://download.openmmlab.com/mmclassification/v0/efficientnet/efficientnet-b3_3rdparty_8xb32-aa-advprop_in1k_20220119-53b41118.pth' # noqa +model = dict( + type='EfficientDet', + data_preprocessor=dict( + type='DetDataPreprocessor', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + bgr_to_rgb=True, + pad_size_divisor=image_size, + batch_augments=batch_augments), + backbone=dict( + type='EfficientNet', + arch='b3', + drop_path_rate=0.3, + out_indices=(3, 4, 5), + frozen_stages=0, + conv_cfg=dict(type='Conv2dSamePadding'), + norm_cfg=norm_cfg, + norm_eval=False, + init_cfg=dict( + type='Pretrained', prefix='backbone', checkpoint=checkpoint)), + neck=dict( + type='BiFPN', + num_stages=6, + in_channels=[48, 136, 384], + out_channels=160, + start_level=0, + norm_cfg=norm_cfg), + bbox_head=dict( + type='EfficientDetSepBNHead', + num_classes=80, + num_ins=5, + in_channels=160, + feat_channels=160, + stacked_convs=4, + norm_cfg=norm_cfg, + anchor_generator=dict( + type='AnchorGenerator', + octave_base_scale=4, + scales_per_octave=3, + ratios=[1.0, 0.5, 2.0], + strides=[8, 16, 32, 64, 128], + center_offset=0.5), + bbox_coder=dict( + type='DeltaXYWHBBoxCoder', + target_means=[.0, .0, .0, .0], + target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict( + type='FocalLoss', + use_sigmoid=True, + gamma=1.5, + alpha=0.25, + loss_weight=1.0), + loss_bbox=dict(type='HuberLoss', beta=0.1, loss_weight=50)), + # training and testing settings + train_cfg=dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.5, + neg_iou_thr=0.5, + min_pos_iou=0, + ignore_iof_thr=-1), + sampler=dict( + type='PseudoSampler'), # Focal loss should use PseudoSampler + allowed_border=-1, + pos_weight=-1, + debug=False), + test_cfg=dict( + nms_pre=1000, + min_bbox_size=0, + score_thr=0.05, + nms=dict( + type='soft_nms', + iou_threshold=0.3, + sigma=0.5, + min_score=1e-3, + method='gaussian'), + max_per_img=100)) + +# dataset settings +train_pipeline = [ + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), + dict(type='LoadAnnotations', with_bbox=True), + dict( + type='RandomResize', + scale=(image_size, image_size), + ratio_range=(0.1, 2.0), + keep_ratio=True), + dict(type='RandomCrop', crop_size=(image_size, image_size)), + dict(type='RandomFlip', prob=0.5), + dict(type='PackDetInputs') +] +test_pipeline = [ + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), + dict(type='Resize', scale=(image_size, image_size), keep_ratio=True), + dict(type='LoadAnnotations', with_bbox=True), + dict( + type='PackDetInputs', + meta_keys=('img_id', 
'img_path', 'ori_shape', 'img_shape', + 'scale_factor')) +] + +train_dataloader = dict( + batch_size=16, + num_workers=8, + dataset=dict(type=dataset_type, pipeline=train_pipeline)) +val_dataloader = dict(dataset=dict(type=dataset_type, pipeline=test_pipeline)) +test_dataloader = val_dataloader + +val_evaluator = dict(type=evalute_type) +test_evaluator = val_evaluator + +optim_wrapper = dict( + optimizer=dict(lr=0.16, weight_decay=4e-5), + paramwise_cfg=dict( + norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), + clip_grad=dict(max_norm=10, norm_type=2)) + +# learning policy +max_epochs = 300 +param_scheduler = [ + dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=917), + dict( + type='CosineAnnealingLR', + eta_min=0.0, + begin=1, + T_max=299, + end=300, + by_epoch=True, + convert_to_iter_based=True) +] +train_cfg = dict(max_epochs=max_epochs, val_interval=1) + +vis_backends = [ + dict(type='LocalVisBackend'), + dict(type='TensorboardVisBackend') +] +visualizer = dict( + type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer') + +default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=15)) +custom_hooks = [ + dict( + type='EMAHook', + ema_type='ExpMomentumEMA', + momentum=0.0002, + update_buffers=True, + priority=49) +] +# cudnn_benchmark=True can accelerate fix-size training +env_cfg = dict(cudnn_benchmark=True) + +# NOTE: `auto_scale_lr` is for automatically scaling LR, +# USER SHOULD NOT CHANGE ITS VALUES. +# base_batch_size = (8 GPUs) x (16 samples per GPU) +auto_scale_lr = dict(base_batch_size=128) diff --git a/projects/EfficientDet/configs/efficientdet_effb0_bifpn_16xb8-crop512-300e_coco.py b/projects/EfficientDet/configs/tensorflow/efficientdet_effb0_bifpn_8xb16-crop512-300e_coco_tf.py similarity index 85% rename from projects/EfficientDet/configs/efficientdet_effb0_bifpn_16xb8-crop512-300e_coco.py rename to projects/EfficientDet/configs/tensorflow/efficientdet_effb0_bifpn_8xb16-crop512-300e_coco_tf.py index 080b7963b95..bf3d3fc1799 100644 --- a/projects/EfficientDet/configs/efficientdet_effb0_bifpn_16xb8-crop512-300e_coco.py +++ b/projects/EfficientDet/configs/tensorflow/efficientdet_effb0_bifpn_8xb16-crop512-300e_coco_tf.py @@ -7,11 +7,11 @@ imports=['projects.EfficientDet.efficientdet'], allow_failed_imports=False) image_size = 512 -dataset_type = 'Coco90Dataset' -evalute_type = 'Coco90Metric' batch_augments = [ dict(type='BatchFixedSizePad', size=(image_size, image_size)) ] +dataset_type = 'Coco90Dataset' +evalute_type = 'Coco90Metric' norm_cfg = dict(type='SyncBN', requires_grad=True, eps=1e-3, momentum=0.01) checkpoint = 'https://download.openmmlab.com/mmclassification/v0/efficientnet/efficientnet-b0_3rdparty_8xb32-aa-advprop_in1k_20220119-26434485.pth' # noqa model = dict( @@ -29,6 +29,7 @@ drop_path_rate=0.2, out_indices=(3, 4, 5), frozen_stages=0, + conv_cfg=dict(type='Conv2dSamePadding'), norm_cfg=norm_cfg, norm_eval=False, init_cfg=dict( @@ -62,10 +63,10 @@ loss_cls=dict( type='FocalLoss', use_sigmoid=True, - gamma=2.0, + gamma=1.5, alpha=0.25, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0)), + loss_bbox=dict(type='HuberLoss', beta=0.1, loss_weight=50)), # training and testing settings train_cfg=dict( assigner=dict( @@ -93,9 +94,7 @@ # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='LoadAnnotations', with_bbox=True), dict( 
type='RandomResize', @@ -107,9 +106,7 @@ dict(type='PackDetInputs') ] test_pipeline = [ - dict( - type='LoadImageFromFile', - file_client_args={{_base_.file_client_args}}), + dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), dict(type='Resize', scale=(image_size, image_size), keep_ratio=True), dict(type='LoadAnnotations', with_bbox=True), dict( @@ -120,7 +117,7 @@ train_dataloader = dict( batch_size=16, - num_workers=16, + num_workers=8, dataset=dict(type=dataset_type, pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(type=dataset_type, pipeline=test_pipeline)) test_dataloader = val_dataloader @@ -129,8 +126,10 @@ test_evaluator = val_evaluator optim_wrapper = dict( - optimizer=dict(lr=0.16), - paramwise_cfg=dict(norm_decay_mult=0, bypass_duplicate=True)) + optimizer=dict(lr=0.16, weight_decay=4e-5), + paramwise_cfg=dict( + norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), + clip_grad=dict(max_norm=10, norm_type=2)) # learning policy max_epochs = 300 @@ -138,10 +137,10 @@ dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=917), dict( type='CosineAnnealingLR', - eta_min=0.0016, + eta_min=0.0, begin=1, - T_max=284, - end=285, + T_max=299, + end=300, by_epoch=True, convert_to_iter_based=True) ] @@ -155,10 +154,18 @@ type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer') default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=15)) +custom_hooks = [ + dict( + type='EMAHook', + ema_type='ExpMomentumEMA', + momentum=0.0002, + update_buffers=True, + priority=49) +] # cudnn_benchmark=True can accelerate fix-size training env_cfg = dict(cudnn_benchmark=True) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. -# base_batch_size = (8 GPUs) x (32 samples per GPU) +# base_batch_size = (8 GPUs) x (16 samples per GPU) auto_scale_lr = dict(base_batch_size=128) diff --git a/projects/EfficientDet/convert_tf_to_pt.py b/projects/EfficientDet/convert_tf_to_pt.py index 6132a6ba241..f3b127f2aaf 100644 --- a/projects/EfficientDet/convert_tf_to_pt.py +++ b/projects/EfficientDet/convert_tf_to_pt.py @@ -164,8 +164,8 @@ def convert_key(model_name, bifpn_repeats, weights): elif seg[0] == 'resample_p6': prefix = 'neck.bifpn.0.p5_to_p6.0' mapping = { - 'conv2d/kernel': 'down_conv.conv.weight', - 'conv2d/bias': 'down_conv.conv.bias', + 'conv2d/kernel': 'down_conv.weight', + 'conv2d/bias': 'down_conv.bias', 'bn/beta': 'bn.bias', 'bn/gamma': 'bn.weight', 'bn/moving_mean': 'bn.running_mean', @@ -180,11 +180,11 @@ def convert_key(model_name, bifpn_repeats, weights): if fnode_id == 0: mapping = { 'op_after_combine5/conv/depthwise_kernel': - 'conv6_up.depthwise_conv.conv.weight', + 'conv6_up.depthwise_conv.weight', 'op_after_combine5/conv/pointwise_kernel': - 'conv6_up.pointwise_conv.conv.weight', + 'conv6_up.pointwise_conv.weight', 'op_after_combine5/conv/bias': - 'conv6_up.pointwise_conv.conv.bias', + 'conv6_up.pointwise_conv.bias', 'op_after_combine5/bn/beta': 'conv6_up.bn.bias', 'op_after_combine5/bn/gamma': @@ -208,11 +208,11 @@ def convert_key(model_name, bifpn_repeats, weights): elif fnode_id == 1: base_mapping = { 'op_after_combine6/conv/depthwise_kernel': - 'conv5_up.depthwise_conv.conv.weight', + 'conv5_up.depthwise_conv.weight', 'op_after_combine6/conv/pointwise_kernel': - 'conv5_up.pointwise_conv.conv.weight', + 'conv5_up.pointwise_conv.weight', 'op_after_combine6/conv/bias': - 'conv5_up.pointwise_conv.conv.bias', + 'conv5_up.pointwise_conv.bias', 'op_after_combine6/bn/beta': 
'conv5_up.bn.bias', 'op_after_combine6/bn/gamma': @@ -225,9 +225,9 @@ def convert_key(model_name, bifpn_repeats, weights): if fpn_idx == 0: mapping = { 'resample_0_2_6/conv2d/kernel': - 'p5_down_channel.down_conv.conv.weight', + 'p5_down_channel.down_conv.weight', 'resample_0_2_6/conv2d/bias': - 'p5_down_channel.down_conv.conv.bias', + 'p5_down_channel.down_conv.bias', 'resample_0_2_6/bn/beta': 'p5_down_channel.bn.bias', 'resample_0_2_6/bn/gamma': @@ -252,11 +252,11 @@ def convert_key(model_name, bifpn_repeats, weights): elif fnode_id == 2: base_mapping = { 'op_after_combine7/conv/depthwise_kernel': - 'conv4_up.depthwise_conv.conv.weight', + 'conv4_up.depthwise_conv.weight', 'op_after_combine7/conv/pointwise_kernel': - 'conv4_up.pointwise_conv.conv.weight', + 'conv4_up.pointwise_conv.weight', 'op_after_combine7/conv/bias': - 'conv4_up.pointwise_conv.conv.bias', + 'conv4_up.pointwise_conv.bias', 'op_after_combine7/bn/beta': 'conv4_up.bn.bias', 'op_after_combine7/bn/gamma': @@ -269,9 +269,9 @@ def convert_key(model_name, bifpn_repeats, weights): if fpn_idx == 0: mapping = { 'resample_0_1_7/conv2d/kernel': - 'p4_down_channel.down_conv.conv.weight', + 'p4_down_channel.down_conv.weight', 'resample_0_1_7/conv2d/bias': - 'p4_down_channel.down_conv.conv.bias', + 'p4_down_channel.down_conv.bias', 'resample_0_1_7/bn/beta': 'p4_down_channel.bn.bias', 'resample_0_1_7/bn/gamma': @@ -297,11 +297,11 @@ def convert_key(model_name, bifpn_repeats, weights): base_mapping = { 'op_after_combine8/conv/depthwise_kernel': - 'conv3_up.depthwise_conv.conv.weight', + 'conv3_up.depthwise_conv.weight', 'op_after_combine8/conv/pointwise_kernel': - 'conv3_up.pointwise_conv.conv.weight', + 'conv3_up.pointwise_conv.weight', 'op_after_combine8/conv/bias': - 'conv3_up.pointwise_conv.conv.bias', + 'conv3_up.pointwise_conv.bias', 'op_after_combine8/bn/beta': 'conv3_up.bn.bias', 'op_after_combine8/bn/gamma': @@ -314,9 +314,9 @@ def convert_key(model_name, bifpn_repeats, weights): if fpn_idx == 0: mapping = { 'resample_0_0_8/conv2d/kernel': - 'p3_down_channel.down_conv.conv.weight', + 'p3_down_channel.down_conv.weight', 'resample_0_0_8/conv2d/bias': - 'p3_down_channel.down_conv.conv.bias', + 'p3_down_channel.down_conv.bias', 'resample_0_0_8/bn/beta': 'p3_down_channel.bn.bias', 'resample_0_0_8/bn/gamma': @@ -341,11 +341,11 @@ def convert_key(model_name, bifpn_repeats, weights): elif fnode_id == 4: base_mapping = { 'op_after_combine9/conv/depthwise_kernel': - 'conv4_down.depthwise_conv.conv.weight', + 'conv4_down.depthwise_conv.weight', 'op_after_combine9/conv/pointwise_kernel': - 'conv4_down.pointwise_conv.conv.weight', + 'conv4_down.pointwise_conv.weight', 'op_after_combine9/conv/bias': - 'conv4_down.pointwise_conv.conv.bias', + 'conv4_down.pointwise_conv.bias', 'op_after_combine9/bn/beta': 'conv4_down.bn.bias', 'op_after_combine9/bn/gamma': @@ -358,9 +358,9 @@ def convert_key(model_name, bifpn_repeats, weights): if fpn_idx == 0: mapping = { 'resample_0_1_9/conv2d/kernel': - 'p4_level_connection.down_conv.conv.weight', + 'p4_level_connection.down_conv.weight', 'resample_0_1_9/conv2d/bias': - 'p4_level_connection.down_conv.conv.bias', + 'p4_level_connection.down_conv.bias', 'resample_0_1_9/bn/beta': 'p4_level_connection.bn.bias', 'resample_0_1_9/bn/gamma': @@ -387,11 +387,11 @@ def convert_key(model_name, bifpn_repeats, weights): elif fnode_id == 5: base_mapping = { 'op_after_combine10/conv/depthwise_kernel': - 'conv5_down.depthwise_conv.conv.weight', + 'conv5_down.depthwise_conv.weight', 
'op_after_combine10/conv/pointwise_kernel': - 'conv5_down.pointwise_conv.conv.weight', + 'conv5_down.pointwise_conv.weight', 'op_after_combine10/conv/bias': - 'conv5_down.pointwise_conv.conv.bias', + 'conv5_down.pointwise_conv.bias', 'op_after_combine10/bn/beta': 'conv5_down.bn.bias', 'op_after_combine10/bn/gamma': @@ -404,9 +404,9 @@ def convert_key(model_name, bifpn_repeats, weights): if fpn_idx == 0: mapping = { 'resample_0_2_10/conv2d/kernel': - 'p5_level_connection.down_conv.conv.weight', + 'p5_level_connection.down_conv.weight', 'resample_0_2_10/conv2d/bias': - 'p5_level_connection.down_conv.conv.bias', + 'p5_level_connection.down_conv.bias', 'resample_0_2_10/bn/beta': 'p5_level_connection.bn.bias', 'resample_0_2_10/bn/gamma': @@ -433,11 +433,11 @@ def convert_key(model_name, bifpn_repeats, weights): elif fnode_id == 6: base_mapping = { 'op_after_combine11/conv/depthwise_kernel': - 'conv6_down.depthwise_conv.conv.weight', + 'conv6_down.depthwise_conv.weight', 'op_after_combine11/conv/pointwise_kernel': - 'conv6_down.pointwise_conv.conv.weight', + 'conv6_down.pointwise_conv.weight', 'op_after_combine11/conv/bias': - 'conv6_down.pointwise_conv.conv.bias', + 'conv6_down.pointwise_conv.bias', 'op_after_combine11/bn/beta': 'conv6_down.bn.bias', 'op_after_combine11/bn/gamma': @@ -463,11 +463,11 @@ def convert_key(model_name, bifpn_repeats, weights): elif fnode_id == 7: base_mapping = { 'op_after_combine12/conv/depthwise_kernel': - 'conv7_down.depthwise_conv.conv.weight', + 'conv7_down.depthwise_conv.weight', 'op_after_combine12/conv/pointwise_kernel': - 'conv7_down.pointwise_conv.conv.weight', + 'conv7_down.pointwise_conv.weight', 'op_after_combine12/conv/bias': - 'conv7_down.pointwise_conv.conv.bias', + 'conv7_down.pointwise_conv.bias', 'op_after_combine12/bn/beta': 'conv7_down.bn.bias', 'op_after_combine12/bn/gamma': @@ -492,9 +492,9 @@ def convert_key(model_name, bifpn_repeats, weights): if 'box-predict' in seg[1]: prefix = '.'.join(['bbox_head', 'reg_header']) base_mapping = { - 'depthwise_kernel': 'depthwise_conv.conv.weight', - 'pointwise_kernel': 'pointwise_conv.conv.weight', - 'bias': 'pointwise_conv.conv.bias' + 'depthwise_kernel': 'depthwise_conv.weight', + 'pointwise_kernel': 'pointwise_conv.weight', + 'bias': 'pointwise_conv.bias' } suffix = base_mapping['/'.join(seg[2:])] if 'depthwise_conv' in suffix: @@ -522,9 +522,9 @@ def convert_key(model_name, bifpn_repeats, weights): ['bbox_head', 'reg_conv_list', str(bbox_conv_idx)]) base_mapping = { - 'depthwise_kernel': 'depthwise_conv.conv.weight', - 'pointwise_kernel': 'pointwise_conv.conv.weight', - 'bias': 'pointwise_conv.conv.bias' + 'depthwise_kernel': 'depthwise_conv.weight', + 'pointwise_kernel': 'pointwise_conv.weight', + 'bias': 'pointwise_conv.bias' } suffix = base_mapping['/'.join(seg[2:])] if 'depthwise_conv' in suffix: @@ -534,9 +534,9 @@ def convert_key(model_name, bifpn_repeats, weights): if 'class-predict' in seg[1]: prefix = '.'.join(['bbox_head', 'cls_header']) base_mapping = { - 'depthwise_kernel': 'depthwise_conv.conv.weight', - 'pointwise_kernel': 'pointwise_conv.conv.weight', - 'bias': 'pointwise_conv.conv.bias' + 'depthwise_kernel': 'depthwise_conv.weight', + 'pointwise_kernel': 'pointwise_conv.weight', + 'bias': 'pointwise_conv.bias' } suffix = base_mapping['/'.join(seg[2:])] if 'depthwise_conv' in suffix: @@ -564,9 +564,9 @@ def convert_key(model_name, bifpn_repeats, weights): ['bbox_head', 'cls_conv_list', str(cls_conv_idx)]) base_mapping = { - 'depthwise_kernel': 'depthwise_conv.conv.weight', - 
'pointwise_kernel': 'pointwise_conv.conv.weight', - 'bias': 'pointwise_conv.conv.bias' + 'depthwise_kernel': 'depthwise_conv.weight', + 'pointwise_kernel': 'pointwise_conv.weight', + 'bias': 'pointwise_conv.bias' } suffix = base_mapping['/'.join(seg[2:])] if 'depthwise_conv' in suffix: @@ -616,7 +616,6 @@ def main(): n: torch.as_tensor(tf2pth(reader.get_tensor(n))) for (n, _) in reader.get_variable_to_shape_map().items() } - print(weights.keys()) bifpn_repeats = repeat_map[int(model_name[14])] out = convert_key(model_name, bifpn_repeats, weights) result = {'state_dict': out} diff --git a/projects/EfficientDet/efficientdet/__init__.py b/projects/EfficientDet/efficientdet/__init__.py index dca95d53a35..b6c66bcc353 100644 --- a/projects/EfficientDet/efficientdet/__init__.py +++ b/projects/EfficientDet/efficientdet/__init__.py @@ -1,14 +1,16 @@ -from .anchor_generator import YXYXAnchorGenerator from .bifpn import BiFPN -from .coco_90class import Coco90Dataset -from .coco_90metric import Coco90Metric from .efficientdet import EfficientDet from .efficientdet_head import EfficientDetSepBNHead -from .trans_max_iou_assigner import TransMaxIoUAssigner -from .yxyx_bbox_coder import YXYXDeltaXYWHBBoxCoder +from .huber_loss import HuberLoss +from .tensorflow.anchor_generator import YXYXAnchorGenerator +from .tensorflow.coco_90class import Coco90Dataset +from .tensorflow.coco_90metric import Coco90Metric +from .tensorflow.trans_max_iou_assigner import TransMaxIoUAssigner +from .tensorflow.yxyx_bbox_coder import YXYXDeltaXYWHBBoxCoder +from .utils import Conv2dSamePadding __all__ = [ - 'EfficientDet', 'BiFPN', 'EfficientDetSepBNHead', 'YXYXAnchorGenerator', - 'YXYXDeltaXYWHBBoxCoder', 'Coco90Dataset', 'Coco90Metric', - 'TransMaxIoUAssigner' + 'EfficientDet', 'BiFPN', 'HuberLoss', 'EfficientDetSepBNHead', + 'Conv2dSamePadding', 'Coco90Dataset', 'Coco90Metric', + 'YXYXAnchorGenerator', 'TransMaxIoUAssigner', 'YXYXDeltaXYWHBBoxCoder' ] diff --git a/projects/EfficientDet/efficientdet/bifpn.py b/projects/EfficientDet/efficientdet/bifpn.py index 114af7b16c7..56356c3c555 100644 --- a/projects/EfficientDet/efficientdet/bifpn.py +++ b/projects/EfficientDet/efficientdet/bifpn.py @@ -1,5 +1,3 @@ -# Copyright (c) OpenMMLab. All rights reserved. -# Modified from https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch from typing import List import torch @@ -9,21 +7,19 @@ from mmdet.registry import MODELS from mmdet.utils import MultiConfig, OptConfigType -from .utils import (DepthWiseConvBlock, DownChannelBlock, MaxPool2dSamePadding, - MemoryEfficientSwish) +from .utils import DepthWiseConvBlock, DownChannelBlock, MaxPool2dSamePadding class BiFPNStage(nn.Module): - ''' + """ in_channels: List[int], input dim for P3, P4, P5 out_channels: int, output dim for P2 - P7 first_time: int, whether is the first bifpnstage - num_outs: int, BiFPN need feature maps num - use_swish: whether use MemoryEfficientSwish + conv_bn_act_pattern: bool, whether use conv_bn_act_pattern norm_cfg: (:obj:`ConfigDict` or dict, optional): Config dict for normalization layer. 
epsilon: float, hyperparameter in fusion features - ''' + """ def __init__(self, in_channels: List[int], @@ -31,7 +27,6 @@ def __init__(self, first_time: bool = False, apply_bn_for_resampling: bool = True, conv_bn_act_pattern: bool = False, - use_meswish: bool = True, norm_cfg: OptConfigType = dict( type='BN', momentum=1e-2, eps=1e-3), epsilon: float = 1e-4) -> None: @@ -42,7 +37,6 @@ def __init__(self, self.first_time = first_time self.apply_bn_for_resampling = apply_bn_for_resampling self.conv_bn_act_pattern = conv_bn_act_pattern - self.use_meswish = use_meswish self.norm_cfg = norm_cfg self.epsilon = epsilon @@ -173,7 +167,7 @@ def __init__(self, torch.ones(2, dtype=torch.float32), requires_grad=True) self.p7_w2_relu = nn.ReLU() - self.swish = MemoryEfficientSwish() if use_meswish else Swish() + self.swish = Swish() def combine(self, x): if not self.conv_bn_act_pattern: @@ -268,7 +262,7 @@ def forward(self, x): @MODELS.register_module() class BiFPN(BaseModule): - ''' + """ num_stages: int, bifpn number of repeats in_channels: List[int], input dim for P3, P4, P5 out_channels: int, output dim for P2 - P7 @@ -276,11 +270,10 @@ class BiFPN(BaseModule): epsilon: float, hyperparameter in fusion features apply_bn_for_resampling: bool, whether use bn after resampling conv_bn_act_pattern: bool, whether use conv_bn_act_pattern - use_swish: whether use MemoryEfficientSwish norm_cfg: (:obj:`ConfigDict` or dict, optional): Config dict for normalization layer. init_cfg: MultiConfig: init method - ''' + """ def __init__(self, num_stages: int, @@ -290,11 +283,9 @@ def __init__(self, epsilon: float = 1e-4, apply_bn_for_resampling: bool = True, conv_bn_act_pattern: bool = False, - use_meswish: bool = True, norm_cfg: OptConfigType = dict( type='BN', momentum=1e-2, eps=1e-3), init_cfg: MultiConfig = None) -> None: - super().__init__(init_cfg=init_cfg) self.start_level = start_level self.bifpn = nn.Sequential(*[ @@ -304,7 +295,6 @@ def __init__(self, first_time=True if _ == 0 else False, apply_bn_for_resampling=apply_bn_for_resampling, conv_bn_act_pattern=conv_bn_act_pattern, - use_meswish=use_meswish, norm_cfg=norm_cfg, epsilon=epsilon) for _ in range(num_stages) ]) diff --git a/projects/EfficientDet/efficientdet/efficientdet_head.py b/projects/EfficientDet/efficientdet/efficientdet_head.py index 6ed6521d091..ae3efbe2c7d 100644 --- a/projects/EfficientDet/efficientdet/efficientdet_head.py +++ b/projects/EfficientDet/efficientdet/efficientdet_head.py @@ -1,30 +1,34 @@ # Copyright (c) OpenMMLab. All rights reserved. -from typing import Tuple +from typing import List, Tuple +import torch import torch.nn as nn -from mmcv.cnn.bricks import build_norm_layer +from mmcv.cnn.bricks import Swish, build_norm_layer from mmengine.model import bias_init_with_prob from torch import Tensor from mmdet.models.dense_heads.anchor_head import AnchorHead +from mmdet.models.utils import images_to_levels, multi_apply from mmdet.registry import MODELS -from mmdet.utils import OptConfigType, OptMultiConfig -from .utils import DepthWiseConvBlock, MemoryEfficientSwish +from mmdet.structures.bbox import cat_boxes, get_box_tensor +from mmdet.utils import (InstanceList, OptConfigType, OptInstanceList, + OptMultiConfig, reduce_mean) +from .utils import DepthWiseConvBlock @MODELS.register_module() class EfficientDetSepBNHead(AnchorHead): """EfficientDetHead with separate BN. - num_classes (int): Number of categories excluding the background - category. in_channels (int): Number of channels in the input feature map. 
diff --git a/projects/EfficientDet/efficientdet/efficientdet_head.py b/projects/EfficientDet/efficientdet/efficientdet_head.py
index 6ed6521d091..ae3efbe2c7d 100644
--- a/projects/EfficientDet/efficientdet/efficientdet_head.py
+++ b/projects/EfficientDet/efficientdet/efficientdet_head.py
@@ -1,30 +1,34 @@
 # Copyright (c) OpenMMLab. All rights reserved.
-from typing import Tuple
+from typing import List, Tuple

+import torch
 import torch.nn as nn
-from mmcv.cnn.bricks import build_norm_layer
+from mmcv.cnn.bricks import Swish, build_norm_layer
 from mmengine.model import bias_init_with_prob
 from torch import Tensor

 from mmdet.models.dense_heads.anchor_head import AnchorHead
+from mmdet.models.utils import images_to_levels, multi_apply
 from mmdet.registry import MODELS
-from mmdet.utils import OptConfigType, OptMultiConfig
-from .utils import DepthWiseConvBlock, MemoryEfficientSwish
+from mmdet.structures.bbox import cat_boxes, get_box_tensor
+from mmdet.utils import (InstanceList, OptConfigType, OptInstanceList,
+                         OptMultiConfig, reduce_mean)
+from .utils import DepthWiseConvBlock


 @MODELS.register_module()
 class EfficientDetSepBNHead(AnchorHead):
     """EfficientDetHead with separate BN.

-    num_classes (int): Number of categories excluding the background
-    category. in_channels (int): Number of channels in the input feature map.
-    feat_channels (int): Number of hidden channels. stacked_convs (int): Number
-    of repetitions of conv norm_cfg (dict): Config dict for normalization
-    layer. anchor_generator (dict): Config dict for anchor generator bbox_coder
-    (dict): Config of bounding box coder. loss_cls (dict): Config of
-    classification loss. loss_bbox (dict): Config of localization loss.
-    train_cfg (dict): Training config of anchor head. test_cfg (dict): Testing
-    config of anchor head. init_cfg (dict or list[dict], optional):
+    num_classes (int): Number of categories num_ins (int): Number of the input
+    feature map. in_channels (int): Number of channels in the input feature
+    map. feat_channels (int): Number of hidden channels. stacked_convs (int):
+    Number of repetitions of conv norm_cfg (dict): Config dict for
+    normalization layer. anchor_generator (dict): Config dict for anchor
+    generator bbox_coder (dict): Config of bounding box coder. loss_cls (dict):
+    Config of classification loss. loss_bbox (dict): Config of localization
+    loss. train_cfg (dict): Training config of anchor head. test_cfg (dict):
+    Testing config of anchor head. init_cfg (dict or list[dict], optional):
     Initialization config dict.
     """
@@ -83,17 +87,17 @@ def _init_layers(self) -> None:
             apply_norm=False)
         self.reg_header = DepthWiseConvBlock(
             self.in_channels, self.num_base_priors * 4, apply_norm=False)
-        self.swish = MemoryEfficientSwish()
+        self.swish = Swish()

     def init_weights(self) -> None:
         """Initialize weights of the head."""
         for m in self.reg_conv_list:
-            nn.init.constant_(m.pointwise_conv.conv.bias, 0.0)
+            nn.init.constant_(m.pointwise_conv.bias, 0.0)
         for m in self.cls_conv_list:
-            nn.init.constant_(m.pointwise_conv.conv.bias, 0.0)
+            nn.init.constant_(m.pointwise_conv.bias, 0.0)
         bias_cls = bias_init_with_prob(0.01)
-        nn.init.constant_(self.cls_header.pointwise_conv.conv.bias, bias_cls)
-        nn.init.constant_(self.reg_header.pointwise_conv.conv.bias, 0.0)
+        nn.init.constant_(self.cls_header.pointwise_conv.bias, bias_cls)
+        nn.init.constant_(self.reg_header.pointwise_conv.bias, 0.0)

     def forward_single_bbox(self, feat: Tensor, level_id: int,
                             i: int) -> Tensor:
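`bias_init_with_prob(0.01)` above is the standard focal-loss trick of biasing the classification logits so that the initial foreground probability is 0.01. Assuming mmengine's usual definition, it reduces to:

import math

def bias_init_with_prob_sketch(prior_prob: float) -> float:
    # choose a bias b so that sigmoid(b) == prior_prob at initialization
    return float(-math.log((1 - prior_prob) / prior_prob))

print(bias_init_with_prob_sketch(0.01))  # ~ -4.595

Starting every anchor near p = 0.01 keeps the enormous number of easy negatives from dominating the first gradient steps.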
+ """ + featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores] + assert len(featmap_sizes) == self.prior_generator.num_levels + + device = cls_scores[0].device + + anchor_list, valid_flag_list = self.get_anchors( + featmap_sizes, batch_img_metas, device=device) + cls_reg_targets = self.get_targets( + anchor_list, + valid_flag_list, + batch_gt_instances, + batch_img_metas, + batch_gt_instances_ignore=batch_gt_instances_ignore) + (labels_list, label_weights_list, bbox_targets_list, bbox_weights_list, + avg_factor) = cls_reg_targets + + # anchor number of multi levels + num_level_anchors = [anchors.size(0) for anchors in anchor_list[0]] + # concat all level anchors and flags to a single tensor + concat_anchor_list = [] + for i in range(len(anchor_list)): + concat_anchor_list.append(cat_boxes(anchor_list[i])) + all_anchor_list = images_to_levels(concat_anchor_list, + num_level_anchors) + + avg_factor = reduce_mean( + torch.tensor(avg_factor, dtype=torch.float, device=device)).item() + avg_factor = max(avg_factor, 1.0) + losses_cls, losses_bbox = multi_apply( + self.loss_by_feat_single, + cls_scores, + bbox_preds, + all_anchor_list, + labels_list, + label_weights_list, + bbox_targets_list, + bbox_weights_list, + avg_factor=avg_factor) + return dict(loss_cls=losses_cls, loss_bbox=losses_bbox) + + def loss_by_feat_single(self, cls_score: Tensor, bbox_pred: Tensor, + anchors: Tensor, labels: Tensor, + label_weights: Tensor, bbox_targets: Tensor, + bbox_weights: Tensor, avg_factor: int) -> tuple: + """Calculate the loss of a single scale level based on the features + extracted by the detection head. + + Args: + cls_score (Tensor): Box scores for each scale level + Has shape (N, num_anchors * num_classes, H, W). + bbox_pred (Tensor): Box energies / deltas for each scale + level with shape (N, num_anchors * 4, H, W). + anchors (Tensor): Box reference for each scale level with shape + (N, num_total_anchors, 4). + labels (Tensor): Labels of each anchors with shape + (N, num_total_anchors). + label_weights (Tensor): Label weights of each anchor with shape + (N, num_total_anchors) + bbox_targets (Tensor): BBox regression targets of each anchor + weight shape (N, num_total_anchors, 4). + bbox_weights (Tensor): BBox regression loss weights of each anchor + with shape (N, num_total_anchors, 4). + avg_factor (int): Average factor that is used to average the loss. + + Returns: + tuple: loss components. + """ + + # classification loss + labels = labels.reshape(-1) + label_weights = label_weights.reshape(-1) + cls_score = cls_score.permute(0, 2, 3, + 1).reshape(-1, self.cls_out_channels) + loss_cls = self.loss_cls( + cls_score, labels, label_weights, avg_factor=avg_factor) + # regression loss + target_dim = bbox_targets.size(-1) + bbox_targets = bbox_targets.reshape(-1, target_dim) + bbox_weights = bbox_weights.reshape(-1, target_dim) + bbox_pred = bbox_pred.permute(0, 2, 3, + 1).reshape(-1, + self.bbox_coder.encode_size) + if self.reg_decoded_bbox: + # When the regression loss (e.g. `IouLoss`, `GIouLoss`) + # is applied directly on the decoded bounding boxes, it + # decodes the already encoded coordinates to absolute format. 
diff --git a/projects/EfficientDet/efficientdet/huber_loss.py b/projects/EfficientDet/efficientdet/huber_loss.py
new file mode 100644
index 00000000000..091963fa9d6
--- /dev/null
+++ b/projects/EfficientDet/efficientdet/huber_loss.py
@@ -0,0 +1,91 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Optional
+
+import torch
+import torch.nn as nn
+from torch import Tensor
+
+from mmdet.models.losses.utils import weighted_loss
+from mmdet.registry import MODELS
+
+
+@weighted_loss
+def huber_loss(pred: Tensor, target: Tensor, beta: float = 1.0) -> Tensor:
+    """Huber loss.
+
+    Args:
+        pred (Tensor): The prediction.
+        target (Tensor): The learning target of the prediction.
+        beta (float, optional): The threshold in the piecewise function.
+            Defaults to 1.0.
+
+    Returns:
+        Tensor: Calculated loss
+    """
+    assert beta > 0
+    if target.numel() == 0:
+        return pred.sum() * 0
+
+    assert pred.size() == target.size()
+    diff = torch.abs(pred - target)
+    loss = torch.where(diff < beta, 0.5 * diff * diff,
+                       beta * diff - 0.5 * beta * beta)
+    return loss
+
+
+@MODELS.register_module()
+class HuberLoss(nn.Module):
+    """Huber loss.
+
+    Args:
+        beta (float, optional): The threshold in the piecewise function.
+            Defaults to 1.0.
+        reduction (str, optional): The method to reduce the loss.
+            Options are "none", "mean" and "sum". Defaults to "mean".
+        loss_weight (float, optional): The weight of loss.
+    """
+
+    def __init__(self,
+                 beta: float = 1.0,
+                 reduction: str = 'mean',
+                 loss_weight: float = 1.0) -> None:
+        super().__init__()
+        self.beta = beta
+        self.reduction = reduction
+        self.loss_weight = loss_weight
+
+    def forward(self,
+                pred: Tensor,
+                target: Tensor,
+                weight: Optional[Tensor] = None,
+                avg_factor: Optional[int] = None,
+                reduction_override: Optional[str] = None,
+                **kwargs) -> Tensor:
+        """Forward function.
+
+        Args:
+            pred (Tensor): The prediction.
+            target (Tensor): The learning target of the prediction.
+            weight (Tensor, optional): The weight of loss for each
+                prediction. Defaults to None.
+            avg_factor (int, optional): Average factor that is used to average
+                the loss. Defaults to None.
+            reduction_override (str, optional): The reduction method used to
+                override the original reduction method of the loss.
+                Defaults to None.
+
+        Returns:
+            Tensor: Calculated loss
+        """
+        assert reduction_override in (None, 'none', 'mean', 'sum')
+        reduction = (
+            reduction_override if reduction_override else self.reduction)
+        loss_bbox = self.loss_weight * huber_loss(
+            pred,
+            target,
+            weight,
+            beta=self.beta,
+            reduction=reduction,
+            avg_factor=avg_factor,
+            **kwargs)
+        return loss_bbox
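A quick sanity check of the piecewise definition above: the loss is quadratic for |diff| < beta and linear beyond, and the two branches agree at diff = beta, where both give 0.5 * beta**2:

import torch

beta = 1.0
diff = torch.tensor([0.5, 1.0, 2.0])
loss = torch.where(diff < beta, 0.5 * diff * diff,
                   beta * diff - 0.5 * beta * beta)
print(loss)  # tensor([0.1250, 0.5000, 1.5000])

The linear tail is what makes Huber robust to outlier boxes compared with a pure L2 regression loss.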
diff --git a/projects/EfficientDet/efficientdet/anchor_generator.py b/projects/EfficientDet/efficientdet/tensorflow/anchor_generator.py
similarity index 100%
rename from projects/EfficientDet/efficientdet/anchor_generator.py
rename to projects/EfficientDet/efficientdet/tensorflow/anchor_generator.py
diff --git a/projects/EfficientDet/efficientdet/api_wrappers/__init__.py b/projects/EfficientDet/efficientdet/tensorflow/api_wrappers/__init__.py
similarity index 100%
rename from projects/EfficientDet/efficientdet/api_wrappers/__init__.py
rename to projects/EfficientDet/efficientdet/tensorflow/api_wrappers/__init__.py
diff --git a/projects/EfficientDet/efficientdet/api_wrappers/coco_api.py b/projects/EfficientDet/efficientdet/tensorflow/api_wrappers/coco_api.py
similarity index 98%
rename from projects/EfficientDet/efficientdet/api_wrappers/coco_api.py
rename to projects/EfficientDet/efficientdet/tensorflow/api_wrappers/coco_api.py
index ffaf33e0185..142f27d7f94 100644
--- a/projects/EfficientDet/efficientdet/api_wrappers/coco_api.py
+++ b/projects/EfficientDet/efficientdet/tensorflow/api_wrappers/coco_api.py
@@ -30,7 +30,6 @@ def get_ann_ids(self, img_ids=[], cat_ids=[], area_rng=[], iscrowd=None):
         return self.getAnnIds(img_ids, cat_ids, area_rng, iscrowd)

     def get_cat_ids(self, cat_names=[], sup_names=[], cat_ids=[]):
-        # return self.getCatIds(cat_names, sup_names, cat_ids)
         cat_ids_coco = self.getCatIds(cat_names, sup_names, cat_ids)
         if None in cat_names:
             index = [i for i, v in enumerate(cat_names) if v is not None]
diff --git a/projects/EfficientDet/efficientdet/coco_90class.py b/projects/EfficientDet/efficientdet/tensorflow/coco_90class.py
similarity index 98%
rename from projects/EfficientDet/efficientdet/coco_90class.py
rename to projects/EfficientDet/efficientdet/tensorflow/coco_90class.py
index b0742af0be9..d2996ccb8fc 100644
--- a/projects/EfficientDet/efficientdet/coco_90class.py
+++ b/projects/EfficientDet/efficientdet/tensorflow/coco_90class.py
@@ -3,6 +3,8 @@
 import os.path as osp
 from typing import List, Union

+from mmengine.fileio import get_local_path
+
 from mmdet.datasets.base_det_dataset import BaseDetDataset
 from mmdet.registry import DATASETS
 from .api_wrappers import COCO
@@ -62,7 +64,8 @@ def load_data_list(self) -> List[dict]:

         Returns:
             List[dict]: A list of annotation.
         """  # noqa: E501
-        with self.file_client.get_local_path(self.ann_file) as local_path:
+        with get_local_path(
+                self.ann_file, backend_args=self.backend_args) as local_path:
             self.coco = self.COCOAPI(local_path)
         # The order of returned `cat_ids` will not
         # change with the order of the `classes`
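`get_local_path` is mmengine's replacement for the old `FileClient` API: it yields a local copy of a file for whatever storage backend `backend_args` selects, with `None` meaning the local disk. A minimal sketch (the annotation path is illustrative):

from mmengine.fileio import get_local_path

with get_local_path(
        'data/coco/annotations/instances_val2017.json',
        backend_args=None) as local_path:
    print(local_path)  # for local files this is just the original path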
""" # noqa: E501 - with self.file_client.get_local_path(self.ann_file) as local_path: + with get_local_path( + self.ann_file, backend_args=self.backend_args) as local_path: self.coco = self.COCOAPI(local_path) # The order of returned `cat_ids` will not # change with the order of the `classes` diff --git a/projects/EfficientDet/efficientdet/coco_90metric.py b/projects/EfficientDet/efficientdet/tensorflow/coco_90metric.py similarity index 97% rename from projects/EfficientDet/efficientdet/coco_90metric.py rename to projects/EfficientDet/efficientdet/tensorflow/coco_90metric.py index 7bc12d00956..eed65224018 100644 --- a/projects/EfficientDet/efficientdet/coco_90metric.py +++ b/projects/EfficientDet/efficientdet/tensorflow/coco_90metric.py @@ -8,7 +8,7 @@ import numpy as np from mmengine.evaluator import BaseMetric -from mmengine.fileio import FileClient, dump, load +from mmengine.fileio import dump, get_local_path, load from mmengine.logging import MMLogger from terminaltables import AsciiTable @@ -49,9 +49,8 @@ class Coco90Metric(BaseMetric): outfile_prefix (str, optional): The prefix of json files. It includes the file path and the prefix of filename, e.g., "a/b/prefix". If not specified, a temp file will be created. Defaults to None. - file_client_args (dict): Arguments to instantiate a FileClient. - See :class:`mmengine.fileio.FileClient` for details. - Defaults to ``dict(backend='disk')``. + backend_args (dict, optional): Arguments to instantiate the + corresponding backend. Defaults to None. collect_device (str): Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'. @@ -71,7 +70,7 @@ def __init__(self, metric_items: Optional[Sequence[str]] = None, format_only: bool = False, outfile_prefix: Optional[str] = None, - file_client_args: dict = dict(backend='disk'), + backend_args: dict = None, collect_device: str = 'cpu', prefix: Optional[str] = None) -> None: super().__init__(collect_device=collect_device, prefix=prefix) @@ -104,13 +103,13 @@ def __init__(self, self.outfile_prefix = outfile_prefix - self.file_client_args = file_client_args - self.file_client = FileClient(**file_client_args) + self.backend_args = backend_args # if ann_file is not specified, # initialize coco api with the converted dataset if ann_file is not None: - with self.file_client.get_local_path(ann_file) as local_path: + with get_local_path( + ann_file, backend_args=self.backend_args) as local_path: self._coco_api = COCO(local_path) else: self._coco_api = None diff --git a/projects/EfficientDet/efficientdet/trans_max_iou_assigner.py b/projects/EfficientDet/efficientdet/tensorflow/trans_max_iou_assigner.py similarity index 100% rename from projects/EfficientDet/efficientdet/trans_max_iou_assigner.py rename to projects/EfficientDet/efficientdet/tensorflow/trans_max_iou_assigner.py diff --git a/projects/EfficientDet/efficientdet/yxyx_bbox_coder.py b/projects/EfficientDet/efficientdet/tensorflow/yxyx_bbox_coder.py similarity index 100% rename from projects/EfficientDet/efficientdet/yxyx_bbox_coder.py rename to projects/EfficientDet/efficientdet/tensorflow/yxyx_bbox_coder.py diff --git a/projects/EfficientDet/efficientdet/utils.py b/projects/EfficientDet/efficientdet/utils.py index 5fc898a64a7..9c30a01fc8b 100644 --- a/projects/EfficientDet/efficientdet/utils.py +++ b/projects/EfficientDet/efficientdet/utils.py @@ -1,4 +1,3 @@ -# Copyright (c) OpenMMLab. All rights reserved. 
diff --git a/projects/EfficientDet/efficientdet/utils.py b/projects/EfficientDet/efficientdet/utils.py
index 5fc898a64a7..9c30a01fc8b 100644
--- a/projects/EfficientDet/efficientdet/utils.py
+++ b/projects/EfficientDet/efficientdet/utils.py
@@ -1,4 +1,3 @@
-# Copyright (c) OpenMMLab. All rights reserved.
 import math
 from typing import Tuple, Union

@@ -6,67 +5,49 @@
 import torch.nn as nn
 from mmcv.cnn.bricks import Swish, build_norm_layer
 from torch.nn import functional as F
+from torch.nn.init import _calculate_fan_in_and_fan_out, trunc_normal_
+
+from mmdet.registry import MODELS
 from mmdet.utils import OptConfigType


-class SwishImplementation(torch.autograd.Function):
+def variance_scaling_trunc(tensor, gain=1.):
+    fan_in, _ = _calculate_fan_in_and_fan_out(tensor)
+    gain /= max(1.0, fan_in)
+    std = math.sqrt(gain) / .87962566103423978
+    return trunc_normal_(tensor, 0., std)

-    @staticmethod
-    def forward(ctx, i):
-        result = i * torch.sigmoid(i)
-        ctx.save_for_backward(i)
-        return result

-    @staticmethod
-    def backward(ctx, grad_output):
-        i = ctx.saved_variables[0]
-        sigmoid_i = torch.sigmoid(i)
-        return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i)))
-
-
-class MemoryEfficientSwish(nn.Module):
-
-    def forward(self, x):
-        return SwishImplementation.apply(x)
-
-
-class Conv2dSamePadding(nn.Module):
+@MODELS.register_module()
+class Conv2dSamePadding(nn.Conv2d):

     def __init__(self,
                  in_channels: int,
                  out_channels: int,
                  kernel_size: Union[int, Tuple[int, int]],
                  stride: Union[int, Tuple[int, int]] = 1,
+                 padding: Union[int, Tuple[int, int]] = 0,
+                 dilation: Union[int, Tuple[int, int]] = 1,
                  groups: int = 1,
                  bias: bool = True):
-        super().__init__()
-        self.conv = nn.Conv2d(
-            in_channels,
-            out_channels,
-            kernel_size,
-            stride=stride,
-            bias=bias,
-            groups=groups)
-        self.stride = self.conv.stride
-        self.kernel_size = self.conv.kernel_size
-
-    def forward(self, x):
-        h, w = x.shape[-2:]
-        extra_h = (math.ceil(w / self.stride[1]) -
-                   1) * self.stride[1] - w + self.kernel_size[1]
-        extra_v = (math.ceil(h / self.stride[0]) -
-                   1) * self.stride[0] - h + self.kernel_size[0]
-
-        left = extra_h // 2
-        right = extra_h - left
-        top = extra_v // 2
-        bottom = extra_v - top
-
+        super().__init__(in_channels, out_channels, kernel_size, stride, 0,
+                         dilation, groups, bias)
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        img_h, img_w = x.size()[-2:]
+        kernel_h, kernel_w = self.weight.size()[-2:]
+        extra_w = (math.ceil(img_w / self.stride[1]) -
+                   1) * self.stride[1] - img_w + kernel_w
+        extra_h = (math.ceil(img_h / self.stride[0]) -
+                   1) * self.stride[0] - img_h + kernel_h
+
+        left = extra_w // 2
+        right = extra_w - left
+        top = extra_h // 2
+        bottom = extra_h - top
         x = F.pad(x, [left, right, top, bottom])
-        x = self.conv(x)
-
-        return x
+        return F.conv2d(x, self.weight, self.bias, self.stride, self.padding,
+                        self.dilation, self.groups)


 class MaxPool2dSamePadding(nn.Module):
@@ -112,7 +93,6 @@ def __init__(
         out_channels: int,
         apply_norm: bool = True,
        conv_bn_act_pattern: bool = False,
-        use_meswish: bool = True,
        norm_cfg: OptConfigType = dict(type='BN', momentum=1e-2, eps=1e-3)
     ) -> None:
         super(DepthWiseConvBlock, self).__init__()
@@ -132,7 +112,7 @@ def __init__(

         self.apply_activation = conv_bn_act_pattern
         if self.apply_activation:
-            self.swish = MemoryEfficientSwish() if use_meswish else Swish()
+            self.swish = Swish()

     def forward(self, x):
         x = self.depthwise_conv(x)
@@ -153,7 +133,6 @@ def __init__(
         out_channels: int,
         apply_norm: bool = True,
         conv_bn_act_pattern: bool = False,
-        use_meswish: bool = True,
         norm_cfg: OptConfigType = dict(type='BN', momentum=1e-2, eps=1e-3)
     ) -> None:
         super(DownChannelBlock, self).__init__()
@@ -163,7 +142,7 @@ def __init__(
         self.bn = build_norm_layer(norm_cfg, num_features=out_channels)[1]
         self.apply_activation = conv_bn_act_pattern
         if self.apply_activation:
-            self.swish = MemoryEfficientSwish() if use_meswish else Swish()
+            self.swish = Swish()

     def forward(self, x):
         x = self.down_conv(x)
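The padding arithmetic in `Conv2dSamePadding.forward` reproduces TensorFlow's SAME behaviour: the output spatial size is always ceil(input / stride). Worked through for a 224-pixel side, 3x3 kernel, stride 2:

import math

img_h, kernel_h, stride = 224, 3, 2
extra_h = (math.ceil(img_h / stride) - 1) * stride - img_h + kernel_h
print(extra_h)                                     # 1 pixel of total padding
print((img_h + extra_h - kernel_h) // stride + 1)  # 112 == ceil(224 / 2)

Because `extra_h` can be odd, it is split asymmetrically (here top=0, bottom=1), which matches TF exactly and cannot be expressed with the symmetric `padding` argument of a plain `nn.Conv2d`.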
diff --git a/projects/LabelStudio/backend_template/_wsgi.py b/projects/LabelStudio/backend_template/_wsgi.py
new file mode 100644
index 00000000000..1f8fb68cdf8
--- /dev/null
+++ b/projects/LabelStudio/backend_template/_wsgi.py
@@ -0,0 +1,145 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import argparse
+import json
+import logging
+import logging.config
+import os
+
+logging.config.dictConfig({
+    'version': 1,
+    'formatters': {
+        'standard': {
+            'format':
+            '[%(asctime)s] [%(levelname)s] [%(name)s::%(funcName)s::%(lineno)d] %(message)s'  # noqa E501
+        }
+    },
+    'handlers': {
+        'console': {
+            'class': 'logging.StreamHandler',
+            'level': 'DEBUG',
+            'stream': 'ext://sys.stdout',
+            'formatter': 'standard'
+        }
+    },
+    'root': {
+        'level': 'ERROR',
+        'handlers': ['console'],
+        'propagate': True
+    }
+})
+
+_DEFAULT_CONFIG_PATH = os.path.join(os.path.dirname(__file__), 'config.json')
+
+
+def get_kwargs_from_config(config_path=_DEFAULT_CONFIG_PATH):
+    if not os.path.exists(config_path):
+        return dict()
+    with open(config_path) as f:
+        config = json.load(f)
+    assert isinstance(config, dict)
+    return config
+
+
+if __name__ == '__main__':
+
+    from label_studio_ml.api import init_app
+
+    from projects.LabelStudio.backend_template.mmdetection import MMDetection
+
+    parser = argparse.ArgumentParser(description='Label studio')
+    parser.add_argument(
+        '-p',
+        '--port',
+        dest='port',
+        type=int,
+        default=9090,
+        help='Server port')
+    parser.add_argument(
+        '--host', dest='host', type=str, default='0.0.0.0', help='Server host')
+    parser.add_argument(
+        '--kwargs',
+        '--with',
+        dest='kwargs',
+        metavar='KEY=VAL',
+        nargs='+',
+        type=lambda kv: kv.split('='),
+        help='Additional LabelStudioMLBase model initialization kwargs')
+    parser.add_argument(
+        '-d',
+        '--debug',
+        dest='debug',
+        action='store_true',
+        help='Switch debug mode')
+    parser.add_argument(
+        '--log-level',
+        dest='log_level',
+        choices=['DEBUG', 'INFO', 'WARNING', 'ERROR'],
+        default=None,
+        help='Logging level')
+    parser.add_argument(
+        '--model-dir',
+        dest='model_dir',
+        default=os.path.dirname(__file__),
+        help='Directory where models are stored',
+    )
+    parser.add_argument(
+        '--check',
+        dest='check',
+        action='store_true',
+        help='Validate model instance before launching server')
+
+    args = parser.parse_args()
+
+    # setup logging level
+    if args.log_level:
+        logging.root.setLevel(args.log_level)
+
+    def isfloat(value):
+        try:
+            float(value)
+            return True
+        except ValueError:
+            return False
+
+    def parse_kwargs():
+        param = dict()
+        for k, v in args.kwargs:
+            if v.isdigit():
+                param[k] = int(v)
+            elif v == 'True' or v == 'true':
+                param[k] = True
+            elif v == 'False' or v == 'false':
+                param[k] = False
+            elif isfloat(v):
+                param[k] = float(v)
+            else:
+                param[k] = v
+        return param
+
+    kwargs = get_kwargs_from_config()
+
+    if args.kwargs:
+        kwargs.update(parse_kwargs())
+
+    if args.check:
+        print('Check "' + MMDetection.__name__ + '" instance creation..')
+        model = MMDetection(**kwargs)
+
+    app = init_app(
+        model_class=MMDetection,
+        model_dir=os.environ.get('MODEL_DIR', args.model_dir),
+        redis_queue=os.environ.get('RQ_QUEUE_NAME', 'default'),
+        redis_host=os.environ.get('REDIS_HOST', 'localhost'),
+        redis_port=os.environ.get('REDIS_PORT', 6379),
+        **kwargs)
+
+    app.run(host=args.host, port=args.port, debug=args.debug)
+
+else:
+    # for uWSGI use
+    app = init_app(
+        model_class=MMDetection,
+        model_dir=os.environ.get('MODEL_DIR', os.path.dirname(__file__)),
+        redis_queue=os.environ.get('RQ_QUEUE_NAME', 'default'),
+        redis_host=os.environ.get('REDIS_HOST', 'localhost'),
+        redis_port=os.environ.get('REDIS_PORT', 6379))
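The `--kwargs` flag accepts KEY=VAL pairs that `parse_kwargs()` coerces into ints, bools, floats, or plain strings before they reach `MMDetection(**kwargs)`. A standalone mirror of that coercion, with illustrative values:

def coerce_kwargs(pairs):
    # mirrors parse_kwargs() above: int, then bool, then float, then string
    out = {}
    for k, v in pairs:
        if v.isdigit():
            out[k] = int(v)
        elif v.lower() in ('true', 'false'):
            out[k] = v.lower() == 'true'
        else:
            try:
                out[k] = float(v)
            except ValueError:
                out[k] = v
    return out

print(coerce_kwargs([['score_threshold', '0.5'], ['device', 'cpu'],
                     ['debug', 'true']]))
# {'score_threshold': 0.5, 'device': 'cpu', 'debug': True}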
diff --git a/projects/LabelStudio/backend_template/mmdetection.py b/projects/LabelStudio/backend_template/mmdetection.py
new file mode 100644
index 00000000000..f25e80e8fc9
--- /dev/null
+++ b/projects/LabelStudio/backend_template/mmdetection.py
@@ -0,0 +1,148 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import io
+import json
+import logging
+import os
+from urllib.parse import urlparse
+
+import boto3
+from botocore.exceptions import ClientError
+from label_studio_ml.model import LabelStudioMLBase
+from label_studio_ml.utils import (DATA_UNDEFINED_NAME, get_image_size,
+                                   get_single_tag_keys)
+from label_studio_tools.core.utils.io import get_data_dir
+
+from mmdet.apis import inference_detector, init_detector
+
+logger = logging.getLogger(__name__)
+
+
+class MMDetection(LabelStudioMLBase):
+    """Object detector based on https://github.com/open-mmlab/mmdetection."""
+
+    def __init__(self,
+                 config_file=None,
+                 checkpoint_file=None,
+                 image_dir=None,
+                 labels_file=None,
+                 score_threshold=0.5,
+                 device='cpu',
+                 **kwargs):
+
+        super(MMDetection, self).__init__(**kwargs)
+        config_file = config_file or os.environ['config_file']
+        checkpoint_file = checkpoint_file or os.environ['checkpoint_file']
+        self.config_file = config_file
+        self.checkpoint_file = checkpoint_file
+        self.labels_file = labels_file
+        # default Label Studio image upload folder
+        upload_dir = os.path.join(get_data_dir(), 'media', 'upload')
+        self.image_dir = image_dir or upload_dir
+        logger.debug(
+            f'{self.__class__.__name__} reads images from {self.image_dir}')
+        if self.labels_file and os.path.exists(self.labels_file):
+            self.label_map = json_load(self.labels_file)
+        else:
+            self.label_map = {}
+
+        self.from_name, self.to_name, self.value, self.labels_in_config = get_single_tag_keys(  # noqa E501
+            self.parsed_label_config, 'RectangleLabels', 'Image')
+        schema = list(self.parsed_label_config.values())[0]
+        self.labels_in_config = set(self.labels_in_config)
+
+        # Collect label maps from `predicted_values="airplane,car"` attribute in
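For reference, the inference path this backend wraps boils down to two mmdet calls; the config, checkpoint, and image paths below are placeholders:

from mmdet.apis import inference_detector, init_detector

model = init_detector('configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py',
                      'rtmdet_tiny.pth', device='cpu')
result = inference_detector(model, 'demo/demo.jpg')
# MMDetection 3.x returns a DetDataSample; the backend filters
# result.pred_instances by score_threshold and converts the surviving
# bboxes and labels into Label Studio rectangle regions.
print(result.pred_instances.bboxes.shape)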