MODELHUB.md

File metadata and controls

159 lines (141 loc) · 39.5 KB

The access code for the Baidu links is `swin`.

ImageNet-1K and ImageNet-22K Pretrained Swin-V1 Models

| name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 22K model | 1K model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | ImageNet-1K | 224x224 | 81.2 | 95.5 | 28M | 4.5G | 755 | - | github/baidu/config/log |
| Swin-S | ImageNet-1K | 224x224 | 83.2 | 96.2 | 50M | 8.7G | 437 | - | github/baidu/config/log |
| Swin-B | ImageNet-1K | 224x224 | 83.5 | 96.5 | 88M | 15.4G | 278 | - | github/baidu/config/log |
| Swin-B | ImageNet-1K | 384x384 | 84.5 | 97.0 | 88M | 47.1G | 85 | - | github/baidu/config |
| Swin-T | ImageNet-22K | 224x224 | 80.9 | 96.0 | 28M | 4.5G | 755 | github/baidu/config | github/baidu/config |
| Swin-S | ImageNet-22K | 224x224 | 83.2 | 97.0 | 50M | 8.7G | 437 | github/baidu/config | github/baidu/config |
| Swin-B | ImageNet-22K | 224x224 | 85.2 | 97.5 | 88M | 15.4G | 278 | github/baidu/config | github/baidu/config |
| Swin-B | ImageNet-22K | 384x384 | 86.4 | 98.0 | 88M | 47.1G | 85 | github/baidu | github/baidu/config |
| Swin-L | ImageNet-22K | 224x224 | 86.3 | 97.9 | 197M | 34.5G | 141 | github/baidu/config | github/baidu/config |
| Swin-L | ImageNet-22K | 384x384 | 87.3 | 98.2 | 197M | 103.9G | 42 | github/baidu | github/baidu/config |

ImageNet-1K and ImageNet-22K Pretrained Swin-V2 Models

| name | pretrain | resolution | window | acc@1 | acc@5 | #params | FLOPs | FPS | 22K model | 1K model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SwinV2-T | ImageNet-1K | 256x256 | 8x8 | 81.8 | 95.9 | 28M | 5.9G | 572 | - | github/baidu/config |
| SwinV2-S | ImageNet-1K | 256x256 | 8x8 | 83.7 | 96.6 | 50M | 11.5G | 327 | - | github/baidu/config |
| SwinV2-B | ImageNet-1K | 256x256 | 8x8 | 84.2 | 96.9 | 88M | 20.3G | 217 | - | github/baidu/config |
| SwinV2-T | ImageNet-1K | 256x256 | 16x16 | 82.8 | 96.2 | 28M | 6.6G | 437 | - | github/baidu/config |
| SwinV2-S | ImageNet-1K | 256x256 | 16x16 | 84.1 | 96.8 | 50M | 12.6G | 257 | - | github/baidu/config |
| SwinV2-B | ImageNet-1K | 256x256 | 16x16 | 84.6 | 97.0 | 88M | 21.8G | 174 | - | github/baidu/config |
| SwinV2-B* | ImageNet-22K | 256x256 | 16x16 | 86.2 | 97.9 | 88M | 21.8G | 174 | github/baidu/config | github/baidu/config |
| SwinV2-B* | ImageNet-22K | 384x384 | 24x24 | 87.1 | 98.2 | 88M | 54.7G | 57 | github/baidu/config | github/baidu/config |
| SwinV2-L* | ImageNet-22K | 256x256 | 16x16 | 86.9 | 98.0 | 197M | 47.5G | 95 | github/baidu/config | github/baidu/config |
| SwinV2-L* | ImageNet-22K | 384x384 | 24x24 | 87.6 | 98.3 | 197M | 115.4G | 33 | github/baidu/config | github/baidu/config |

Note:

  • The SwinV2-B* (SwinV2-L*) models with input resolutions of 256x256 and 384x384 are both fine-tuned from the same pre-trained model, which was trained at a smaller input resolution of 192x192.
  • SwinV2-B* (384x384) achieves 78.08 acc@1 on ImageNet-1K-V2 while SwinV2-L* (384x384) achieves 78.31.
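The resolution and window columns of the Swin-V2 table can be related with a little arithmetic. The sketch below assumes a patch size of 4 (suggested by the `patch4` part of the Swin config names, but not stated in this file); `windows_per_side` is a hypothetical helper, not part of the repository.

```python
def windows_per_side(resolution: int, window: int, patch_size: int = 4) -> int:
    """Number of attention windows along one side of the first-stage feature map.

    patch_size=4 is an assumption; it is not given in this file.
    """
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    tokens = resolution // patch_size  # tokens along one side after patch embedding
    if tokens % window != 0:
        raise ValueError("token grid must be divisible by the window size")
    return tokens // window


# (resolution, window) pairs taken from the Swin-V2 table above.
for resolution, window in [(256, 8), (256, 16), (384, 24)]:
    n = windows_per_side(resolution, window)
    print(f"{resolution}x{resolution}, window {window}x{window}: {n}x{n} windows")
```

Under this assumption, the 256x256 / 16x16 and 384x384 / 24x24 settings both tile the first stage with the same number of windows, which is one way to read why the window grows with the input resolution.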

ImageNet-1K Pretrained Swin MLP Models

| name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 1K model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mixer-B/16 | ImageNet-1K | 224x224 | 76.4 | - | 59M | 12.7G | - | official repo |
| ResMLP-S24 | ImageNet-1K | 224x224 | 79.4 | - | 30M | 6.0G | 715 | timm |
| ResMLP-B24 | ImageNet-1K | 224x224 | 81.0 | - | 116M | 23.0G | 231 | timm |
| Swin-T/C24 | ImageNet-1K | 256x256 | 81.6 | 95.7 | 28M | 5.9G | 563 | github/baidu/config |
| SwinMLP-T/C24 | ImageNet-1K | 256x256 | 79.4 | 94.6 | 20M | 4.0G | 807 | github/baidu/config |
| SwinMLP-T/C12 | ImageNet-1K | 256x256 | 79.6 | 94.7 | 21M | 4.0G | 792 | github/baidu/config |
| SwinMLP-T/C6 | ImageNet-1K | 256x256 | 79.7 | 94.9 | 23M | 4.0G | 766 | github/baidu/config |
| SwinMLP-B | ImageNet-1K | 224x224 | 81.3 | 95.3 | 61M | 10.4G | 409 | github/baidu/config |

Note: C24 means each head has 24 channels.
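To make the C24 / C12 / C6 naming concrete, the following sketch derives the number of heads per stage from the channels-per-head value. It assumes a first-stage width of 96 channels that doubles at each of four stages (the usual tiny layout, which this file does not state); `heads_per_stage` is a hypothetical helper for illustration only.

```python
from typing import List


def heads_per_stage(embed_dim: int, channels_per_head: int, num_stages: int = 4) -> List[int]:
    """Heads in each stage when every head has `channels_per_head` channels.

    Assumes the channel width doubles from one stage to the next
    (an assumption about the architecture, not taken from this file).
    """
    heads = []
    for stage in range(num_stages):
        width = embed_dim * 2 ** stage
        if width % channels_per_head != 0:
            raise ValueError(f"stage width {width} is not divisible by {channels_per_head}")
        heads.append(width // channels_per_head)
    return heads


# embed_dim=96 is the assumed first-stage width of the tiny variants.
for channels in (24, 12, 6):
    print(f"C{channels}:", heads_per_stage(96, channels))
```

Fewer channels per head means more heads at the same width, which is the only thing that differs between the three SwinMLP-T rows under these assumptions.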

ImageNet-22K Pretrained Swin-MoE Models

| name | #experts | k | router | resolution | window | IN-22K acc@1 | IN-1K/ft acc@1 | IN-1K/5-shot acc@1 | 22K model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-MoE-S | 1 (dense) | - | - | 192x192 | 8x8 | 35.5 | 83.5 | 70.3 | github/baidu/config |
| Swin-MoE-S | 8 | 1 | Linear | 192x192 | 8x8 | 36.8 | 84.5 | 75.2 | github/baidu/config |
| Swin-MoE-S | 16 | 1 | Linear | 192x192 | 8x8 | 37.6 | 84.9 | 76.5 | github/baidu/config |
| Swin-MoE-S | 32 | 1 | Linear | 192x192 | 8x8 | 37.4 | 84.7 | 75.9 | github/baidu/config |
| Swin-MoE-S | 32 | 1 | Cosine | 192x192 | 8x8 | 37.2 | 84.3 | 75.2 | github/baidu/config |
| Swin-MoE-S | 64 | 1 | Linear | 192x192 | 8x8 | 37.8 | 84.7 | 75.7 | - |
| Swin-MoE-S | 128 | 1 | Linear | 192x192 | 8x8 | 37.4 | 84.5 | 75.4 | - |
| Swin-MoE-B | 1 (dense) | - | - | 192x192 | 8x8 | 37.3 | 85.1 | 75.9 | config |
| Swin-MoE-B | 8 | 1 | Linear | 192x192 | 8x8 | 38.1 | 85.3 | 77.2 | config |
| Swin-MoE-B | 16 | 1 | Linear | 192x192 | 8x8 | 38.7 | 85.5 | 78.2 | config |
| Swin-MoE-B | 32 | 1 | Linear | 192x192 | 8x8 | 38.6 | 85.5 | 77.9 | config |
| Swin-MoE-B | 32 | 1 | Cosine | 192x192 | 8x8 | 38.5 | 85.3 | 77.3 | config |
| Swin-MoE-B | 32 | 2 | Linear | 192x192 | 8x8 | 38.6 | 85.5 | 78.7 | - |

SimMIM Pretrained Swin-V2 Models

Please note that all SimMIM pretrained Swin-V2 models will be stored in the Hugging Face repository starting July 2024. For more details, refer to the Hugging Face repository.

  • Model size only includes the backbone weights and excludes weights in the decoders/classification heads.
  • Batch size for all models is set to 2048.
  • Validation loss is calculated on the ImageNet-1K validation set.
  • Fine-tuned acc@1 refers to the top-1 accuracy on the ImageNet-1K validation set after fine-tuning.

| name | model size | pre-train dataset | pre-train iterations | validation loss | fine-tuned acc@1 | pre-trained model | fine-tuned model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SwinV2-Small | 49M | ImageNet-1K 10% | 125k | 0.4820 | 82.69 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 10% | 250k | 0.4961 | 83.11 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 10% | 500k | 0.5115 | 83.17 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 20% | 125k | 0.4751 | 83.05 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 20% | 250k | 0.4722 | 83.56 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 20% | 500k | 0.4734 | 83.75 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 50% | 125k | 0.4732 | 83.04 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 50% | 250k | 0.4681 | 83.67 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 50% | 500k | 0.4646 | 83.96 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K | 125k | 0.4728 | 82.92 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K | 250k | 0.4674 | 83.66 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K | 500k | 0.4641 | 84.08 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 10% | 125k | 0.4822 | 83.33 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 10% | 250k | 0.4997 | 83.60 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 10% | 500k | 0.5112 | 83.41 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 20% | 125k | 0.4703 | 83.86 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 20% | 250k | 0.4679 | 84.37 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 20% | 500k | 0.4711 | 84.61 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 50% | 125k | 0.4683 | 84.04 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 50% | 250k | 0.4633 | 84.57 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 50% | 500k | 0.4598 | 84.95 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K | 125k | 0.4680 | 84.13 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K | 250k | 0.4626 | 84.65 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K | 500k | 0.4588 | 85.04 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-22K | 125k | 0.4695 | 84.11 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-22K | 250k | 0.4649 | 84.57 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-22K | 500k | 0.4614 | 85.11 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 10% | 125k | 0.4995 | 83.69 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 10% | 250k | 0.5140 | 83.66 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 10% | 500k | 0.5150 | 83.50 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 20% | 125k | 0.4675 | 84.38 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 20% | 250k | 0.4746 | 84.71 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 20% | 500k | 0.4960 | 84.59 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 50% | 125k | 0.4622 | 84.78 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 50% | 250k | 0.4566 | 85.38 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 50% | 500k | 0.4530 | 85.80 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K | 125k | 0.4611 | 84.98 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K | 250k | 0.4552 | 85.45 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K | 500k | 0.4507 | 85.91 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-22K | 125k | 0.4649 | 84.61 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-22K | 250k | 0.4586 | 85.39 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-22K | 500k | 0.4536 | 85.81 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 125k | 0.4789 | 84.35 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 250k | 0.5038 | 84.16 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 500k | 0.5071 | 83.44 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 125k | 0.4549 | 85.09 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 250k | 0.4511 | 85.64 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 500k | 0.4559 | 85.69 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K | 125k | 0.4531 | 85.23 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K | 250k | 0.4464 | 85.90 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K | 500k | 0.4416 | 86.34 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-22K | 125k | 0.4564 | 85.14 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-22K | 250k | 0.4499 | 85.86 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-22K | 500k | 0.4444 | 86.27 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K 50% | 125k | 0.4534 | 85.44 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K 50% | 250k | 0.4515 | 85.76 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K 50% | 500k | 0.4719 | 85.51 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K | 125k | 0.4513 | 85.57 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K | 250k | 0.4442 | 86.12 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K | 500k | 0.4395 | 86.46 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-22K | 125k | 0.4544 | 85.39 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-22K | 250k | 0.4475 | 85.96 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-22K | 500k | 0.4416 | 86.53 | huggingface | huggingface |
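
One way to read the SimMIM table is to check, for a fixed model, whether longer pre-training keeps helping on each dataset. The sketch below does this for the SwinV2-Huge rows only, with the numbers copied from the table above; the `ROWS` layout and the `trend` helper are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# (pre-train dataset, iterations in thousands, validation loss, fine-tuned acc@1)
# Values copied from the SwinV2-Huge rows of the table above.
ROWS = [
    ("ImageNet-1K 20%", 125, 0.4789, 84.35),
    ("ImageNet-1K 20%", 250, 0.5038, 84.16),
    ("ImageNet-1K 20%", 500, 0.5071, 83.44),
    ("ImageNet-1K 50%", 125, 0.4549, 85.09),
    ("ImageNet-1K 50%", 250, 0.4511, 85.64),
    ("ImageNet-1K 50%", 500, 0.4559, 85.69),
    ("ImageNet-1K", 125, 0.4531, 85.23),
    ("ImageNet-1K", 250, 0.4464, 85.90),
    ("ImageNet-1K", 500, 0.4416, 86.34),
    ("ImageNet-22K", 125, 0.4564, 85.14),
    ("ImageNet-22K", 250, 0.4499, 85.86),
    ("ImageNet-22K", 500, 0.4444, 86.27),
]


def trend(rows):
    """Summarize, per dataset, how loss and accuracy move as iterations grow."""
    by_dataset = defaultdict(list)
    for dataset, iters, loss, acc in rows:
        by_dataset[dataset].append((iters, loss, acc))

    summary = {}
    for dataset, runs in by_dataset.items():
        runs.sort()  # order by iteration count
        losses = [loss for _, loss, _ in runs]
        accs = [acc for _, _, acc in runs]
        summary[dataset] = {
            "loss_keeps_falling": all(b < a for a, b in zip(losses, losses[1:])),
            "acc_keeps_rising": all(b > a for a, b in zip(accs, accs[1:])),
            "best_iters_k": max(runs, key=lambda run: run[2])[0],
        }
    return summary


for dataset, stats in trend(ROWS).items():
    print(dataset, stats)
```

For these rows, the 20% subset peaks at the shortest schedule, while the full ImageNet-1K and ImageNet-22K runs keep improving up to 500k iterations. The same helper can be pointed at any other model's rows.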

SimMIM Pretrained Swin-V1 Models

ImageNet-1K Pre-trained and Fine-tuned Models

| name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 | pre-trained model | fine-tuned model |
| --- | --- | --- | --- | --- | --- | --- |
| Swin-Base | 100 | 192x192 | 192x192 | 82.8 | google/config | google/config |
| Swin-Base | 100 | 192x192 | 224x224 | 83.5 | google/config | google/config |
| Swin-Base | 800 | 192x192 | 224x224 | 84.0 | google/config | google/config |
| Swin-Large | 800 | 192x192 | 224x224 | 85.4 | google/config | google/config |
| SwinV2-Huge | 800 | 192x192 | 224x224 | 85.7 | / | / |
| SwinV2-Huge | 800 | 192x192 | 512x512 | 87.1 | / | / |