[ECCV 2022] unofficial pytorch implementation of the paper "MaxViT: Multi-Axis Vision Transformer"

hankyul2/maxvit-pytorch

MaxViT (PyTorch version)

This repo contains an unofficial PyTorch implementation of MaxViT, together with training and validation code. Its purpose is to share PyTorch training hyper-parameters for MaxViT. The hyper-parameters are copied from Table 12 of the original paper; the only change is the number of GPUs (we use 4). Since most of the code, including the model, training, and validation scripts, is copied from the timm GitHub repository, credit goes to @rwightman and the original authors. See also their repos:

Tutorial

Test environment: torch==1.11.0 and timm==0.9.2

  1. Clone this repo

    git clone https://github.com/hankyul2/maxvit-pytorch
    cd maxvit-pytorch
  2. Run the following command to train MaxViT-T on the ImageNet-1k dataset. For other model variants, change --drop-path to 0.3 (small) or 0.4 (base). Because we train with 4 GPUs, we use 16 gradient-accumulation steps: 4096 (paper total batch) / 256 (our total batch, 4 GPUs x 64 per GPU) = 16.

    Training time: about 5 days for the maxvit_tiny_tf_224 model with 4 GPUs (RTX 3090, 24GB).

    torchrun --nproc_per_node=4 --master_port=12345 train.py /path/to/imagenet \
      --model maxvit_tiny_tf_224 --aa rand-m15-mstd0.5-inc1 --mixup .8 --cutmix 1.0 \
      --remode pixel --reprob 0.25 --drop-path .2 --opt adamw --weight-decay .05 \
      --sched cosine --epochs 300 --lr 3e-3 --warmup-lr 1e-6 --warmup-epoch 30 --min-lr 1e-5 \
      -b 64 -tb 4096 --smoothing 0.1 --clip-grad 1.0 -j 8 --amp --pin-mem --channels-last
  3. Run the following command to reproduce the validation results of MaxViT-T on the ImageNet-1k dataset.

    Results: Acc@1 83.820 (16.180), Acc@5 96.528 (3.472); error rates in parentheses.

    python3 valid.py /path/to/imagenet --img-size 224 --crop-pct 0.95 --cuda 0 --model maxvit_tiny_tf_224 --pretrained
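
The accumulation figure in step 2 follows from simple arithmetic, sketched below. It assumes `-b 64` is the per-GPU batch size and `-tb 4096` is the target total batch; `accumulation_steps` is a hypothetical helper for illustration, not a function from this repo or timm.

```python
def accumulation_steps(target_batch: int, per_gpu_batch: int, num_gpus: int) -> int:
    """Return how many micro-batches to accumulate before one optimizer step.

    Hypothetical helper: assumes every GPU processes `per_gpu_batch` samples
    per forward/backward pass.
    """
    # Samples processed across all GPUs in a single forward/backward pass.
    per_pass_batch = per_gpu_batch * num_gpus
    if target_batch % per_pass_batch != 0:
        raise ValueError(
            f"target batch {target_batch} is not a multiple of {per_pass_batch}"
        )
    return target_batch // per_pass_batch


# Values taken from the training command: -b 64, -tb 4096, 4 GPUs.
steps = accumulation_steps(target_batch=4096, per_gpu_batch=64, num_gpus=4)
print(steps)  # 16
```

With a different number of GPUs or a different per-GPU batch size, the same calculation gives the accumulation count needed to keep the effective batch at 4096.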

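The schedule flags in the training command (`--sched cosine --epochs 300 --lr 3e-3 --warmup-lr 1e-6 --warmup-epoch 30 --min-lr 1e-5`) describe a linear warmup followed by cosine decay. The sketch below is a simplified, per-epoch textbook form of that shape; `lr_at_epoch` is a hypothetical helper, and timm's actual scheduler may differ in details such as per-step updates or how the warmup epochs are counted.

```python
import math


def lr_at_epoch(
    epoch: int,
    base_lr: float = 3e-3,      # --lr
    warmup_lr: float = 1e-6,    # --warmup-lr
    min_lr: float = 1e-5,       # --min-lr
    warmup_epochs: int = 30,    # --warmup-epoch
    total_epochs: int = 300,    # --epochs
) -> float:
    """Simplified warmup + cosine schedule (assumption: not timm's exact code)."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr up to base_lr.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at the start, end of warmup, midpoint, and final epoch.
for e in (0, 30, 165, 300):
    print(e, lr_at_epoch(e))
```
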
Experiment result

| Model | Image size | #Param | FLOPs | Top-1 (%) | Artifacts |
|---|---|---|---|---|---|
| MaxViT-T (paper) | 224 | 31M | 5.6G | 83.62 | |
| MaxViT-T (ours) | 224 | 31M | 5.6G | 83.82 | [yaml], [ckpt], [log], [csv] |

References

@inproceedings{tu2022maxvit,
  title={Maxvit: Multi-axis vision transformer},
  author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
  booktitle={European conference on computer vision},
  pages={459--479},
  year={2022},
  organization={Springer}
}
