- Learning both Weights and Connections for Efficient Neural Network
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- 8-Bit Approximations for Parallelism in Deep Learning
- Neural Networks with Few Multiplications
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- Hardware-oriented Approximation of Convolutional Neural Networks
- Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- Deep Learning with Limited Numerical Precision
- Dynamic Network Surgery for Efficient DNNs
- Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks
- Variational Dropout Sparsifies Deep Neural Networks (https://github.com/ars-ashuha/variational-dropout-sparsifies-dnn)
- Soft Weight-Sharing for Neural Network Compression
- LCNN: Lookup-based Convolutional Neural Network
- Bayesian Compression for Deep Learning
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- How to Quantize Neural Networks with TensorFlow (https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/)
- pytorch-playground quantization examples (https://github.com/aaron-xichen/pytorch-playground#quantization)
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet
- Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions
- FD-MobileNet: Improved MobileNet with a Fast Downsampling Strategy (2018)
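A building block shared by many of the quantization papers above is mapping float32 weights onto low-bit integers with a learned or computed scale. A minimal sketch of symmetric per-tensor 8-bit quantization in NumPy (illustrative only; not the exact scheme of any single paper listed):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization to int8.

    Maps floats in [-max|w|, +max|w|] onto integers in [-127, 127].
    """
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-to-nearest error is at most half a quantization step.
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

Per-channel scales (one scale per output filter) usually recover more accuracy than the single per-tensor scale shown here, at negligible storage cost.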
To look at:
- https://github.com/dkozlov/awesome-knowledge-distillation
- https://github.com/ljk628/ML-Systems/blob/master/dl_cnn.md
- https://github.com/songhan/SqueezeNet-Deep-Compression
- https://github.com/jiaxiang-wu/quantized-cnn
- https://github.com/andyhahaha/Convolutional-Neural-Network-Compression-Survey
- https://github.com/Zhouaojun/Efficient-Deep-Learning
- https://github.com/NervanaSystems/distiller
- https://github.com/ZFTurbo/Keras-inference-time-optimizer
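Several of the repos above (e.g. distiller and SqueezeNet-Deep-Compression) implement magnitude-based pruning, the core of the first two papers in this list. A minimal NumPy sketch of one-shot magnitude pruning (simplified; the papers prune iteratively with retraining between steps):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights.

    Keeps the top (1 - sparsity) fraction of entries by |w| and
    returns the pruned tensor plus the boolean keep-mask.
    """
    k = int(w.size * sparsity)  # number of weights to drop
    if k == 0:
        return w.copy(), np.ones_like(w, dtype=bool)
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask, mask

w = np.random.randn(512, 512)
pruned, mask = magnitude_prune(w, sparsity=0.9)
```

In the iterative setting the surviving weights are retrained with the mask held fixed, then pruning is repeated at a higher sparsity; Dynamic Network Surgery additionally allows pruned connections to be restored.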