mobileYOLOv3

YOLOv3 with a MobileNetV3 backbone for text detection; pruned, quantized, and optimized for deployment on mobile devices, with explanations throughout. Primarily intended as a single resource for learning about YOLOv3 in an applied manner.
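
For orientation, the core idea fits in a few lines of PyTorch. The sketch below is illustrative only, not this repository's code: it assumes torchvision's pretrained MobileNetV3-Small, the names (MobileYOLOSketch, num_anchors, num_classes) are hypothetical, and it predicts at a single scale, whereas full YOLOv3 predicts at three.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small

class MobileYOLOSketch(nn.Module):
    """Hypothetical single-scale detector: MobileNetV3-Small features + 1x1 conv head."""
    def __init__(self, num_anchors=3, num_classes=1):  # one class: "text"
        super().__init__()
        self.backbone = mobilenet_v3_small(weights="DEFAULT").features
        # Per anchor: 4 box offsets (tx, ty, tw, th) + 1 objectness + class scores.
        self.head = nn.Conv2d(576, num_anchors * (5 + num_classes), kernel_size=1)

    def forward(self, x):
        return self.head(self.backbone(x))  # (B, anchors*(5+C), H/32, W/32)

model = MobileYOLOSketch().eval()
with torch.no_grad():
    preds = model(torch.randn(1, 3, 416, 416))
print(preds.shape)  # torch.Size([1, 18, 13, 13]) -> a 13x13 grid of predictions
```

The 1x1 head is what makes this YOLO-style: every cell of the output grid emits box offsets, an objectness score, and class scores for each anchor.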

Roadmap

  • Pretrained MobileNetV2 backbone
  • Introduce the YOLOv3 paradigm
  • Basic pruning and quantization integration
  • Training pipeline (for ICDAR 2015)
  • Switch backbone to MobileNetV3
  • Mixed Precision Training (a sketch follows this list)
  • Pruning and quantization (a second sketch follows this list)
  • Add textbook-style explanations for YOLOv3
  • Extended training pipeline (COCO-Text dataset, batch augmentation, etc.)
  • Live Image-Feed Inference
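
Mixed precision training typically means PyTorch's automatic mixed precision. A minimal sketch, assuming torch.cuda.amp; the model, loss, and data below are stand-ins, not this repository's training pipeline:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Conv2d(3, 18, kernel_size=1).to(device)   # stand-in for the detector
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for _ in range(3):  # stand-in for the real training loop
    images = torch.randn(2, 3, 64, 64, device=device)
    targets = torch.randn(2, 18, 64, 64, device=device)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = loss_fn(model(images), targets)  # forward pass runs in fp16 where safe
    scaler.scale(loss).backward()  # scale the loss so fp16 gradients don't underflow
    scaler.step(optimizer)         # unscales gradients, then takes the optimizer step
    scaler.update()                # adapts the scale factor for the next iteration
```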
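Pruning and quantization can be prototyped with PyTorch's built-in tooling. A minimal sketch, assuming torch.nn.utils.prune for magnitude pruning and eager-mode post-training static quantization; TinyDetector is a hypothetical stand-in for the real model, and the qnnpack backend targets ARM/mobile (use "fbgemm" on x86):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class TinyDetector(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8 boundary
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(16, 18, 1)
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv1(x))
        x = self.conv2(x)
        return self.dequant(x)

model = TinyDetector().eval()

# 1) Unstructured L1 pruning: zero the 30% smallest-magnitude weights per conv.
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        prune.l1_unstructured(m, name="weight", amount=0.3)
        prune.remove(m, "weight")  # bake the zeros into the weight tensor

# 2) Post-training static quantization: attach observers, calibrate, convert.
torch.backends.quantized.engine = "qnnpack"  # ARM/mobile; use "fbgemm" on x86
model.qconfig = torch.quantization.get_default_qconfig("qnnpack")
prepared = torch.quantization.prepare(model)
prepared(torch.randn(8, 3, 64, 64))          # calibration pass with sample data
int8_model = torch.quantization.convert(prepared)
print(int8_model)
```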
