The ever-accelerating progress of technology… gives the appearance of approaching some essential singularity. — John von Neumann, 1958
The Singularity Is Nearer: When We Merge with AI. — Ray Kurzweil, 2024
Université Côte d'Azur, CNRS, I3S, UMR 7271, France
✉ Corresponding Author
This paper introduces a novel automated filter pruning approach based on singular value-driven optimization. By observing and analyzing the distribution of singular values in the overparameterized model, we establish a robust connection between weight redundancy and these values, making them effective indicators for automated pruning. Automated structured pruning is formulated as a constrained combinatorial optimization problem spanning all layers, with the goal of maximizing the nuclear norm of the compact model. This problem is decomposed into two sub-problems: determining the pruning configuration, and assessing filter importance within each layer given its pruning ratio. We introduce two straightforward algorithms to solve these sub-problems, handling both the global relationship between layers and the inter-filter correlation within each layer. Thorough experiments across 8 architectures, 4 benchmark datasets, and 4 vision tasks demonstrate the efficacy of our framework.
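To make the nuclear-norm criterion concrete, the sketch below scores the filters of a single convolutional layer by how much removing each one reduces the layer's nuclear norm. The helper names (`layer_nuclear_norm`, `rank_filters`) and the greedy leave-one-out scoring are illustrative assumptions, not the paper's exact algorithms:

```python
import numpy as np

def layer_nuclear_norm(weights):
    """Nuclear norm (sum of singular values) of a conv layer.

    weights: array of shape (out_channels, in_channels, kH, kW);
    each filter is flattened to one row of a 2-D matrix.
    """
    W = weights.reshape(weights.shape[0], -1)
    return np.linalg.svd(W, compute_uv=False).sum()

def rank_filters(weights):
    """Rank filters by the drop in nuclear norm when each is removed.

    Returns filter indices sorted from most to least important,
    a simple proxy for per-filter importance within one layer.
    """
    base = layer_nuclear_norm(weights)
    scores = [base - layer_nuclear_norm(np.delete(weights, i, axis=0))
              for i in range(weights.shape[0])]
    return np.argsort(scores)[::-1]
```

Since deleting a row of a matrix can never increase its singular values, each score is non-negative, and filters with small scores are natural pruning candidates under this criterion.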
- 13.11.2024: 🎬 Lights, Camera, Action! 🔥 Presentation video out now! 🍿 Kick back and enjoy!
- 1.11.2024: Baselines and checkpoints are released 🤗. Get your hands 👋 dirty 💻!
- 31.10.2024: The manuscript has been submitted to Neural Networks.
- FasterRCNN for object detection
- MaskRCNN for instance segmentation
- KeypointRCNN for human keypoint detection
To demonstrate the practical benefits of SLIMING, we directly compared a baseline model and a SLIMING-compressed model on object detection. Both use the FasterRCNN_ResNet50_FPN architecture and run on an RTX 3060 GPU. The accompanying GIFs illustrate the gap: the baseline model runs at roughly 12 FPS, while the compressed model achieves about twice the throughput. This speedup highlights SLIMING's efficacy and scalability across diverse deployment scenarios.
Note: For replication of this experiment, please refer to detection/README.md.
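For a rough throughput number like the FPS figures above, a minimal timing harness might look as follows. The helper name `measure_fps` is an assumption for illustration; the actual experiment setup is in detection/README.md:

```python
import time

def measure_fps(model, frame, warmup=5, iters=50):
    """Rough frames-per-second estimate for a callable `model`.

    Runs a few warmup calls (to amortize lazy initialization),
    then times `iters` forward passes on the same input frame.
    """
    for _ in range(warmup):
        model(frame)
    start = time.perf_counter()
    for _ in range(iters):
        model(frame)
    elapsed = time.perf_counter() - start
    return iters / elapsed
```

With a GPU model, add a device synchronization (e.g. `torch.cuda.synchronize()`) before reading the clock, so queued kernels are included in the measurement.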
| Input | CR=0% | CR=50% | CR=64% | CR=78% |
|---|---|---|---|---|
The visualizations show that SLIMING retains crucial features across a diverse range of classes, and that it consistently captures and preserves essential information at different compression ratios (CRs). This robustness across compression levels makes SLIMING a versatile choice for network compression across diverse applications and datasets.
- Write detailed documentation.
- Upload compressed models.
- Clean code.
If the code and paper help your research, please kindly cite:
@misc{pham2024singular,
title={Singular values-driven automated filter pruning},
author={Pham, Van Tien and Zniyed, Yassine and Nguyen, Thanh Phuong},
howpublished={\url{https://sliming-ai.github.io/}},
year={2024}
}
This work was granted access to the HPC resources of IDRIS under the
allocation 2023-103147 made by GENCI.
The work of T.P. Nguyen is partially supported by ANR ASTRID ROV-Chasseur.