TY - GEN
T1 - Minimizing area and energy of deep learning hardware design using collective low precision and structured compression
AU - Yin, Shihui
AU - Srivastava, Gaurav
AU - Venkataramanaiah, Shreyas K.
AU - Chakrabarti, Chaitali
AU - Berisha, Visar
AU - Seo, Jae-sun
N1 - Funding Information:
This work was supported in part by Intel Labs, NSF grants 1652866, 1715443, and 1740225, and Office of Naval Research grants N000141410722 and N000141712826.
Publisher Copyright:
© 2017 IEEE.
PY - 2018/4/10
Y1 - 2018/4/10
N2 - Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which makes it challenging to implement them on power/area-constrained embedded platforms. To reduce the network size, several studies have investigated compression by introducing element-wise or row-/column-/block-wise sparsity via pruning and regularization. In addition, many recent works have focused on reducing the precision of activations and weights, with some reducing it down to a single bit. However, combining various sparsity structures with binarized or very-low-precision (2-3 bit) neural networks has not been comprehensively explored. In this work, we present design techniques for minimum-area/-energy DNN hardware with minimal degradation in accuracy. During training, both binarization/low-precision and structured sparsity are applied as constraints to find the smallest memory footprint for a given deep learning algorithm. The DNN model for the CIFAR-10 dataset with a 50X weight memory reduction exhibits accuracy comparable to that of its floating-point counterpart. Area, performance, and energy results of DNN hardware in 40nm CMOS are reported for the MNIST dataset. The optimized DNN that combines 8X structured compression and 3-bit weight precision shows 98.4% accuracy at 20nJ per classification.
AB - Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which makes it challenging to implement them on power/area-constrained embedded platforms. To reduce the network size, several studies have investigated compression by introducing element-wise or row-/column-/block-wise sparsity via pruning and regularization. In addition, many recent works have focused on reducing the precision of activations and weights, with some reducing it down to a single bit. However, combining various sparsity structures with binarized or very-low-precision (2-3 bit) neural networks has not been comprehensively explored. In this work, we present design techniques for minimum-area/-energy DNN hardware with minimal degradation in accuracy. During training, both binarization/low-precision and structured sparsity are applied as constraints to find the smallest memory footprint for a given deep learning algorithm. The DNN model for the CIFAR-10 dataset with a 50X weight memory reduction exhibits accuracy comparable to that of its floating-point counterpart. Area, performance, and energy results of DNN hardware in 40nm CMOS are reported for the MNIST dataset. The optimized DNN that combines 8X structured compression and 3-bit weight precision shows 98.4% accuracy at 20nJ per classification.
UR - http://www.scopus.com/inward/record.url?scp=85050956871&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050956871&partnerID=8YFLogxK
U2 - 10.1109/ACSSC.2017.8335696
DO - 10.1109/ACSSC.2017.8335696
M3 - Conference contribution
AN - SCOPUS:85050956871
T3 - Conference Record of 51st Asilomar Conference on Signals, Systems and Computers, ACSSC 2017
SP - 1907
EP - 1911
BT - Conference Record of 51st Asilomar Conference on Signals, Systems and Computers, ACSSC 2017
A2 - Matthews, Michael B.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 51st Asilomar Conference on Signals, Systems and Computers, ACSSC 2017
Y2 - 29 October 2017 through 1 November 2017
ER -
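
Note: the abstract pairs two training-time constraints, low-precision (down to 3-bit or binary) weights and structured sparsity. The sketch below is not from the paper; it is a minimal illustration of that combination, assuming PyTorch. The helper names (quantize_3bit, group_lasso, QuantLinear), the uniform 3-bit scheme with a straight-through estimator, and the column-wise grouping with strength lam are all illustrative assumptions chosen to match the abstract's description.

# Minimal sketch (not from the paper) of the two training-time constraints the
# abstract describes: 3-bit weight quantization and column-wise structured
# sparsity via a group-lasso regularizer. Assumes PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_3bit(w: torch.Tensor) -> torch.Tensor:
    """Uniform 3-bit quantization to [-1, 1] with a straight-through
    estimator so gradients pass through the rounding step."""
    w_c = torch.clamp(w, -1.0, 1.0)
    levels = 2 ** 3 - 1                       # 8 levels -> 7 quantization steps
    w_q = torch.round((w_c + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0
    return w_c + (w_q - w_c).detach()         # forward: w_q; backward: grad of w_c

def group_lasso(weight: torch.Tensor) -> torch.Tensor:
    """Column-wise group-lasso penalty: sum of L2 norms of each input column,
    which drives whole columns (input connections) to zero together."""
    return weight.norm(p=2, dim=0).sum()

class QuantLinear(nn.Linear):
    """Linear layer that applies 3-bit weight quantization in the forward pass."""
    def forward(self, x):
        return F.linear(x, quantize_3bit(self.weight), self.bias)

# Usage: add the structured-sparsity penalty to the task loss during training.
model = nn.Sequential(QuantLinear(784, 256), nn.ReLU(), QuantLinear(256, 10))
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
lam = 1e-4                                    # regularization strength (assumed)
loss = F.cross_entropy(model(x), y) + lam * sum(
    group_lasso(m.weight) for m in model if isinstance(m, QuantLinear))
loss.backward()

The group-lasso term zeroes out entire rows/columns rather than scattered elements, which is what makes the resulting compression "structured" and hardware-friendly; the straight-through estimator is a standard device (as in BinaryConnect-style training) for backpropagating through a non-differentiable rounding operation.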