TY - GEN
T1 - Small-world-based Structural Pruning for Efficient FPGA Inference of Deep Neural Networks
AU - Krishnan, Gokul
AU - Ma, Yufei
AU - Cao, Yu
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/11/3
Y1 - 2020/11/3
AB - DNN pruning approaches usually trim model parameters without exploiting the intrinsic graph properties of the network or the preferences of the target hardware. As a result, an FPGA accelerator may not directly benefit from such random pruning, and instead incurs additional cost in indexing and control modules. Inspired by the observation that the brain and many real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique that integrates the local clustering and global sparsity of the Small-World graph with the data locality of the FPGA dataflow. The proposed technique hierarchically trims the DNN into a sparse graph before training, following both the Small-World property and FPGA dataflow preferences, such as grouping non-zero and zero parameters so that data loads and the corresponding computation can be skipped. The pruned model is then trained on a given dataset and fine-tuned to achieve the best accuracy. We evaluate the proposed technique on multiple DNNs across different datasets. It achieves state-of-the-art sparsity ratios of up to 76% on CIFAR-10, 84% on CIFAR-100, and 76% on SVHN. Moreover, the generated sparse DNN achieves up to a 4× throughput improvement on an output-stationary FPGA architecture across different DNNs, with marginal hardware overhead.
KW - Deep Neural Network
KW - Graph Efficiency
KW - Pruning
KW - Small-World graph
KW - Sparse FPGA Accelerator
UR - http://www.scopus.com/inward/record.url?scp=85099213253&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099213253&partnerID=8YFLogxK
DO - 10.1109/ICSICT49897.2020.9278024
M3 - Conference contribution
AN - SCOPUS:85099213253
T3 - 2020 IEEE 15th International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2020 - Proceedings
BT - 2020 IEEE 15th International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2020 - Proceedings
A2 - Yu, Shaofeng
A2 - Zhu, Xiaona
A2 - Tang, Ting-Ao
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2020
Y2 - 3 November 2020 through 6 November 2020
ER -