
[Feature] Implement of RAM with a gradio interface #1802

Merged 12 commits into open-mmlab:dev on Oct 25, 2023
Conversation

Coobiw
Contributor

@Coobiw Coobiw commented Sep 25, 2023

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and easier to review. If you do not understand some items, don't worry, just open the pull request and ask the maintainers for help.

Motivation

After implementing CLIPZeroShot in mmpretrain, zero-shot image classification is handled well. However, when an image contains multiple objects, CLIP does not perform well on such multi-label classification tasks. Recently, image tagging has become a hot topic. Tag2Text and RAM (Recognize Anything Model) can recognize multiple objects in one image well. Conveniently, the implementation of RAM depends on the implementation of CLIP, so adding this feature is a natural next step.

Modification

  • convert the RAM checkpoint (in particular, the SwinTransformer part) to mmpretrain style
  • implement RAM based on mmpretrain components and register it in the MODELS registry
  • implement RAM inference in two modes: normal and openset (users can define the categories themselves)
  • the openset inference relies on mmpretrain's CLIP
  • wrap everything in a Gradio interface for users to experience
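Registering a model in the MODELS interface follows mmengine's registry pattern: decorate the class, then build instances by name from a config dict. A minimal self-contained sketch of that pattern (the `Registry` class below is a simplified stand-in for mmengine's, and the `RAM` class here is a hypothetical placeholder, not the real model):

```python
# Simplified stand-in for mmengine's Registry, to illustrate how a model
# class is registered once and later built by name from a config dict.
class Registry:
    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls=None):
        # Usable as a bare decorator or as @MODELS.register_module()
        def _register(cls):
            self._modules[cls.__name__] = cls
            return cls
        return _register(cls) if cls is not None else _register

    def build(self, cfg):
        # cfg is a dict whose 'type' key names a registered class;
        # the remaining keys become constructor arguments.
        cfg = dict(cfg)
        cls = self._modules[cfg.pop('type')]
        return cls(**cfg)


MODELS = Registry('models')


@MODELS.register_module()
class RAM:  # hypothetical placeholder for the real RAM model class
    def __init__(self, threshold=0.68):
        self.threshold = threshold


# A config dict is enough to construct the registered model by name.
model = MODELS.build(dict(type='RAM', threshold=0.75))
```

The benefit of this indirection is that configs stay plain data: swapping the model only requires changing the `type` string, not any imports at the call site.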

BC-breaking (Optional)

Does the modification introduce changes that break the backward compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

  1. Convert RAM weights to mmpretrain style (optional)
python tools/model_converters/ram2mmpretrain.py /xxx/ram_swin_large_14m.pth /xxx/ram_swin_large_14m_mmpretrain.pth
  2. Weights preparation
    You need to prepare both RAM and CLIP weights. You can get them from ram_swin_large_14m_mmpretrain and clip-vit-b-p16_converted.
    The steps for converting the CLIP weights are in my previous PR (Implement of Zero-Shot CLIP Classifier #1737).
  3. Gradio installation
pip install gradio==3.44.0
  4. Launch the Gradio WebUI
cd mmpretrain
python -m mmpretrain.models.multimodal.ram.gradio_demo /xxx/ram_swin_large_14m_mmpretrain.pth /xxx/clip-vit-b-p16_converted.pth
  5. Demos
  • If you choose normal, you only need to upload the image; the threshold and tag_list are not used in this mode.
(screenshot)
  • If you choose openset, the threshold must be set. The tag_list is optional (the default category list includes some rare and unseen tags).
(screenshot)
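The threshold in the openset demo above is the usual multi-label decision rule: the model produces one logit per tag, and every tag whose sigmoid score exceeds the threshold is reported, so several tags can fire for one image. A minimal sketch of that final step (the tag names and logit values are made up for illustration, not real model outputs):

```python
import math


def tags_above_threshold(logits, tag_list, threshold):
    """Return the tags whose sigmoid(logit) exceeds the threshold.

    Unlike softmax classification, each tag is accepted or rejected
    independently, which is what lets a tagging model report multiple
    objects in a single image.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    return [tag for tag, logit in zip(tag_list, logits)
            if sigmoid(logit) > threshold]


# Hypothetical logits for a user-defined openset tag list.
tags = ['dog', 'frisbee', 'grass', 'unicorn']
logits = [2.3, 1.1, 0.9, -3.0]
print(tags_above_threshold(logits, tags, threshold=0.68))
# → ['dog', 'frisbee', 'grass']
```

Raising the threshold trades recall for precision, which is why the openset mode, with its user-supplied and possibly unusual categories, requires the user to choose it explicitly.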

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
  • CLA has been signed and all committers have signed the CLA in this PR.

LALBJ and others added 4 commits August 23, 2023 10:45
…pen-mmlab#1756)

* feat: implement DINO

* chore: delete debug code

* chore: implement pre-commit

* fix: fix imported package

* chore: pre-commit check
…open-mmlab#1774)

* add new config adapting MobileNetV2,V3

* add a base model config for MobileNetV3; modified all MobileNetV3 training configs to inherit from it

* removed directory _base_/models/mobilenet_v3
* zero-shot CLIP

* modify zero-shot clip config

* add in1k_sub_prompt(8 prompts) for improvement

* add some annotations doc

* clip base class & clip_zs sub-class

* some modifications of details after review

* convert into and use mmpretrain-vit

* modify names of some files and directories

codecov bot commented Sep 25, 2023

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Files Coverage Δ
configs/_base_/datasets/imagenet_bs128_mbv3.py 100.00% <ø> (ø)
configs/_base_/datasets/imagenet_bs32.py 100.00% <ø> (ø)
...onfigs/_base_/datasets/imagenet_bs32_pil_resize.py 100.00% <ø> (ø)
configs/_base_/datasets/imagenet_bs64_hivit_224.py 100.00% <100.00%> (ø)
configs/_base_/datasets/imagenet_bs64_swin_224.py 100.00% <ø> (ø)
configs/_base_/datasets/imagenet_bs64_swin_384.py 100.00% <ø> (ø)
configs/_base_/models/hivit/tiny_224.py 100.00% <100.00%> (ø)
...gs/_base_/schedules/imagenet_bs1024_adamw_hivit.py 100.00% <100.00%> (ø)
configs/dinov2/vit-base-p14_dinov2-pre_headless.py 100.00% <100.00%> (ø)
configs/sam/vit-base-p16_sam_headless.py 100.00% <100.00%> (ø)
... and 1 more

... and 189 files with indirect coverage changes


@mzr1996 mzr1996 changed the base branch from main to dev October 25, 2023 08:23
@mzr1996 mzr1996 merged commit ed5924b into open-mmlab:dev Oct 25, 2023
6 of 9 checks passed
5 participants