This repo contains an unofficial PyTorch implementation of MaxViT, together with training and validation code. Its purpose is to share PyTorch training hyper-parameters for MaxViT: we copy the training hyper-parameters from Table 12 of the original paper and change only the number of GPUs (we use 4). Since most of the code, including the model, training, and validation scripts, is copied from the timm GitHub repository, credit belongs to @rwightman and the original authors. See also their repos:
Test environments: `torch==1.11.0` and `timm==0.9.2`
- Clone this repo.

      git clone https://github.com/hankyul2/maxvit-pytorch
      cd maxvit-pytorch
- Run the following command to train MaxViT-T on the ImageNet-1k dataset. For other model variants, change `--drop-path` to `0.3` (small) or `0.4` (base). To train with 4 GPUs, we use gradient accumulation over 16 steps: 16 = 4096 (paper total batch) / 256 (our total batch, 4 GPUs × 64 per GPU). Training time: about 5 days for the `maxvit_tiny_tf_224` model with 4 GPUs (RTX 3090, 24GB).

      torchrun --nproc_per_node=4 --master_port=12345 train.py /path/to/imagenet --model maxvit_tiny_tf_224 --aa rand-m15-mstd0.5-inc1 --mixup .8 --cutmix 1.0 --remode pixel --reprob 0.25 --drop-path .2 --opt adamw --weight-decay .05 --sched cosine --epochs 300 --lr 3e-3 --warmup-lr 1e-6 --warmup-epoch 30 --min-lr 1e-5 -b 64 -tb 4096 --smoothing 0.1 --clip-grad 1.0 -j 8 --amp --pin-mem --channels-last
- Run the following command to reproduce the validation results of MaxViT-T on the ImageNet-1k dataset. Results: Acc@1 83.820 (error 16.180), Acc@5 96.528 (error 3.472).

      python3 valid.py /path/to/imagenet --img-size 224 --crop-pct 0.95 --cuda 0 --model maxvit_tiny_tf_224 --pretrained
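
The gradient-accumulation arithmetic in the training step can be checked with a small, framework-free sketch. This is not the code in `train.py`; the helper names `accumulation_steps` and `mse_grad` are hypothetical and exist only for illustration. The first part reproduces 16 = 4096 / (4 × 64). The second part shows, on a toy scalar regression, why accumulation stands in for a large batch: averaging the gradients of 16 micro-batches of 256 samples gives the same gradient as one batch of 4096. That equivalence assumes a loss averaged over samples and ignores batch-dependent layers such as batch normalization.

```python
import random


def accumulation_steps(target_total_batch, per_gpu_batch, num_gpus):
    """Micro-steps needed to reach the target effective batch size (hypothetical helper)."""
    per_step_batch = per_gpu_batch * num_gpus
    if target_total_batch % per_step_batch != 0:
        raise ValueError("target batch must be a multiple of the per-step batch")
    return target_total_batch // per_step_batch


def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Values taken from the training command: -b 64, 4 GPUs, -tb 4096.
steps = accumulation_steps(4096, per_gpu_batch=64, num_gpus=4)  # 16

# Toy data: y = 3x plus noise, evaluated at an arbitrary weight.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(4096)]
ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]
w = 0.5

# Gradient computed on the full 4096-sample batch in one go.
full_grad = mse_grad(w, xs, ys)

# Same gradient, accumulated over `steps` micro-batches of 256 samples each.
micro = len(xs) // steps
accum_grad = 0.0
for i in range(steps):
    chunk = slice(i * micro, (i + 1) * micro)
    # Dividing by `steps` keeps the accumulated value a mean, not a sum.
    accum_grad += mse_grad(w, xs[chunk], ys[chunk]) / steps

print(steps, abs(full_grad - accum_grad) < 1e-9)
```

With a different GPU count or per-GPU batch size, the same function gives the number of steps needed to keep the effective batch at 4096, provided the division is exact.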
| Model | Image size | #Param | FLOPs | Top1 | Artifacts |
|---|---|---|---|---|---|
| MaxViT-T (paper) | 224 | 31M | 5.6G | 83.62 | |
| MaxViT-T (ours) | 224 | 31M | 5.6G | 83.82 | [yaml], [ckpt], [log], [csv] |
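
The Top1 column and the Acc@1 / Acc@5 values follow the usual top-k accuracy convention: a prediction counts as correct when the true class is among the k highest-scoring classes, and the error is 100 minus the accuracy. The following is a minimal sketch of that metric on made-up scores. The function `topk_accuracy` and the toy data are hypothetical, not taken from `valid.py`.

```python
def topk_accuracy(scores, targets, k):
    """Percentage of samples whose true class is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


# Four toy samples over three classes (illustrative values only).
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
targets = [1, 1, 2, 2]

acc1 = topk_accuracy(scores, targets, k=1)  # 50.0
acc2 = topk_accuracy(scores, targets, k=2)  # 75.0
err1 = 100.0 - acc1                         # 50.0

print(acc1, acc2, err1)
```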
@inproceedings{tu2022maxvit,
title={Maxvit: Multi-axis vision transformer},
author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
booktitle={European conference on computer vision},
pages={459--479},
year={2022},
organization={Springer}
}