RFAConv: Innovating Spatial Attention and Standard Convolutional Operation (preprint)
This repository is a PyTorch implementation of our paper: RFAConv: Innovating Spatial Attention and Standard Convolutional Operation.
We will release the full code once the paper is accepted.
In the classification experiments, the ResNet code comes from https://github.com/zgcr/pytorch-ImageNet-CIFAR-COCO-VOC-training.
In the detection experiments, the YOLOv5 and YOLOv8 code comes from https://github.com/ultralytics/yolov5, and the YOLOv7 code comes from https://github.com/WongKinYiu/yolov7.

| Models | FLOPs (G) | Params (M) | Top-1 (%) | Top-5 (%) |
| --- | --- | --- | --- | --- |
| ResNet18 | 1.82 | 11.69 | 69.59 | 89.05 |
| +CAMConv(r) | 1.83 | 11.75 | 70.76 | 89.74 |
| +CBAMConv(r) | 1.83 | 11.75 | 69.38 | 89.12 |
| +CAConv(r) | 1.83 | 11.74 | 70.58 | 89.59 |
| +RFAConv(r) | 1.91 | 11.85 | 71.23 | 90.29 |
| +RFCAConv(r) | 1.92 | 11.89 | 72.01 | 90.64 |
| +RFCBAMConv(r) | 1.90 | 11.88 | 72.15 | 90.71 |

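To read the ResNet18 table at a glance, the cost and accuracy of each variant can be expressed as a difference from the baseline row. The snippet below is only an illustrative sketch: the numbers are copied from the table above, while the names `RESNET18_RESULTS` and `deltas_vs_baseline` are hypothetical helpers introduced here and are not part of the repository code.

```python
# Rows copied from the ResNet18 table above:
# (FLOPs in G, params in M, top-1 accuracy in %, top-5 accuracy in %).
# RESNET18_RESULTS and deltas_vs_baseline are illustrative names, not repository code.
RESNET18_RESULTS = {
    "ResNet18": (1.82, 11.69, 69.59, 89.05),
    "+CAMConv(r)": (1.83, 11.75, 70.76, 89.74),
    "+CBAMConv(r)": (1.83, 11.75, 69.38, 89.12),
    "+CAConv(r)": (1.83, 11.74, 70.58, 89.59),
    "+RFAConv(r)": (1.91, 11.85, 71.23, 90.29),
    "+RFCAConv(r)": (1.92, 11.89, 72.01, 90.64),
    "+RFCBAMConv(r)": (1.90, 11.88, 72.15, 90.71),
}


def deltas_vs_baseline(results, baseline="ResNet18"):
    """Return each variant's difference from the baseline row, rounded to 2 decimals."""
    base = results[baseline]
    return {
        name: tuple(round(value - ref, 2) for value, ref in zip(row, base))
        for name, row in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, (d_flops, d_params, d_top1, d_top5) in deltas_vs_baseline(RESNET18_RESULTS).items():
        print(
            f"{name:16s} FLOPs {d_flops:+.2f} G  params {d_params:+.2f} M  "
            f"top-1 {d_top1:+.2f}  top-5 {d_top5:+.2f}"
        )
```

For example, the `+RFAConv(r)` row works out to +0.09 G FLOPs and +0.16 M parameters for a +1.64 point top-1 gain over the ResNet18 baseline.
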

| Models | FLOPs (G) | Params (M) | Top-1 (%) | Top-5 (%) |
| --- | --- | --- | --- | --- |
| ResNet34 | 3.68 | 21.80 | 73.33 | 91.37 |
| +CAMConv(r) | 3.68 | 21.93 | 74.03 | 91.69 |
| +CBAMConv(r) | 3.68 | 21.93 | 72.95 | 91.26 |
| +CAConv(r) | 3.68 | 21.91 | 73.76 | 91.68 |
| +RFAConv(r) | 3.84 | 22.16 | 74.25 | 92.03 |


| Models | FLOPs (G) | Params (M) | mAP50 | mAP | time |
| --- | --- | --- | --- | --- | --- |
| YOLOv5n | 4.2 | 1.8 | 67.8 | 41.5 | 2.7 |
| +CAMConv(r) | 4.2 | 1.8 | 67.8 | 41.4 | 2.9 |
| +CBAMConv(r) | 4.3 | 1.8 | 68.1 | 41.9 | 3.0 |
| +CAConv(r) | 4.3 | 1.8 | 68.4 | 42.4 | 3.0 |
| +RFAConv(r) | 4.5 | 1.8 | 69.5 | 43.3 | 3.0 |


| Models | FLOPs (G) | Params (M) | mAP50 | mAP | time |
| --- | --- | --- | --- | --- | --- |
| YOLOv5s | 15.9 | 7.1 | 74.4 | 48.9 | 3 |
| +CAMConv(r) | 16.0 | 7.1 | 73.9 | 48.5 | 3.5 |
| +CBAMConv(r) | 16.0 | 7.1 | 74.1 | 49.0 | 3.7 |
| +CAConv(r) | 16.1 | 7.1 | 75 | 49.6 | 3.1 |
| +RFAConv(r) | 16.4 | 7.2 | 75 | 50.0 | 5.1 |
| +RFCBAMConv(r) | 16.4 | 7.2 | 75.1 | 50.1 | 3.9 |
| +RFCAConv(r) | 16.6 | 7.2 | 75.6 | 51.0 | 4.4 |


| Models | FLOPs (G) | Params (M) | mAP50 | mAP | time |
| --- | --- | --- | --- | --- | --- |
| YOLOv7-tiny | 13.2 | 6.1 | 76.4 | 50.2 | 5.0 |
| +CAMConv(r) | 13.2 | 6.1 | 76.3 | 50.3 | 5.4 |
| +CBAMConv(r) | 13.2 | 6.1 | 76.5 | 50.1 | 5.4 |
| +CAConv(r) | 13.2 | 6.1 | 76.6 | 50.5 | 5.4 |
| +RFAConv(r) | 13.6 | 6.1 | 76.7 | 50.6 | 7.5 |


| Models | FLOPs (G) | Params (M) | mAP50 | mAP | time |
| --- | --- | --- | --- | --- | --- |
| YOLOv8n | 8.1 | 3.0 | 74.0 | 53.5 | 3.0 |
| +CAMConv(r) | 8.1 | 3.0 | 73.8 | 52.8 | 3.1 |
| +CBAMConv(r) | 8.2 | 3.0 | 74.4 | 53.3 | 3.1 |
| +CAConv(r) | 8.2 | 3.0 | 74.5 | 53.8 | 2.9 |
| +RFAConv(r) | 8.4 | 3.1 | 74.7 | 54.0 | 3.2 |


| Models | FLOPs (G) | Params (M) | AP50 | AP75 | AP | APs | APm | APl | time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv5n | 4.5 | 1.8 | 45.6 | 28.9 | 27.5 | 13.5 | 31.5 | 35.9 | 4.4 |
| +CAMConv(r) | 4.5 | 1.8 | 45.6 | 28.3 | 27.4 | 13.8 | 31.4 | 35.8 | 5.2 |
| +CBAMConv(r) | 4.5 | 1.8 | 45.5 | 28.6 | 27.6 | 13.6 | 31.2 | 36.6 | 5.4 |
| +CAConv(r) | 4.5 | 1.8 | 46.2 | 29.2 | 28.1 | 14.3 | 32 | 36.6 | 4.8 |
| +RFAConv(r) | 4.7 | 1.9 | 47.3 | 30.6 | 29.0 | 14.8 | 33.4 | 37.4 | 5.3 |


| Models | FLOPs (G) | Params (M) | AP50 | AP75 | AP | APs | APm | APl | time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv7-tiny | 13.7 | 6.2 | 53.8 | 38.3 | 35.9 | 19.9 | 39.4 | 48.8 | 6.8 |
| +RFAConv(r) | 14.1 | 6.3 | 55.1 | 40.1 | 37.1 | 20.9 | 41.1 | 50.0 | 8.4 |


| Models | FLOPs (G) | Params (M) | AP50 | AP75 | AP | APs | APm | APl | time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv8n | 8.7 | 3.1 | 51.9 | 39.7 | 36.4 | 18.4 | 40.1 | 52 | 4.2 |
| +CAMConv(r) | 8.8 | 3.1 | 51.6 | 39.0 | 36.2 | 18.0 | 39.9 | 51.2 | 4.5 |
| +CBAMConv(r) | 8.8 | 3.1 | 51.5 | 39.6 | 36.3 | 18.3 | 40.1 | 51.5 | 4.6 |
| +CAConv(r) | 8.8 | 3.1 | 52.1 | 39.9 | 36.7 | 17.8 | 40.3 | 51.6 | 4.3 |
| +RFAConv(r) | 9.0 | 3.2 | 53.4 | 41.1 | 37.7 | 18.9 | 41.8 | 52.7 | 4.5 |
| +RFCAConv(r) | 9.1 | 3.2 | 53.9 | 41.7 | 38.2 | 19.7 | 42.3 | 53.5 | 4.7 |

@misc{zhang2023rfaconv,
title={RFAConv: Innovating Spatial Attention and Standard Convolutional Operation},
author={Xin Zhang and Chen Liu and Degang Yang and Tingting Song and Yichen Ye and Ke Li and Yingze Song},
year={2023},
eprint={2304.03198},
archivePrefix={arXiv},
primaryClass={cs.CV}
}