TY - GEN
T1 - BNN Pruning: Pruning Binary Neural Network Guided by Weight Flipping Frequency
T2 - 21st International Symposium on Quality Electronic Design, ISQED 2020
AU - Li, Yixing
AU - Ren, Fengbo
N1 - Funding Information:
This work is supported by an NSF grant (IIS/CPS-1652038) and an unrestricted gift (CG#1319167) from Cisco Research Center. The computing infrastructure used in this work is supported by an NSF grant (CNS-1629888). The four GPUs used for this research were donated by the NVIDIA Corporation.
PY - 2020/3
Y1 - 2020/3
N2 - A binary neural network (BNN) is a compact form of neural network. Both the weights and activations in BNNs can be binary values, which leads to a significant reduction in both parameter size and computational complexity compared to their full-precision counterparts. Such reductions can directly translate into reduced memory footprint and computation cost in hardware, making BNNs highly suitable for a wide range of hardware accelerators. However, it is unclear whether and how a BNN can be further pruned for ultimate compactness. As both 0s and 1s are non-trivial in BNNs, it is not appropriate to adopt any existing pruning method for full-precision networks that interprets 0s as trivial. In this paper, we present a pruning method tailored to BNNs and illustrate that BNNs can be further pruned by using weight flipping frequency as an indicator of sensitivity to accuracy. Experiments performed on the binary versions of a 9-layer Network-in-Network (NIN) and AlexNet with the CIFAR-10 dataset show that the proposed BNN-pruning method can achieve a 20-40% reduction in binary operations with a 0.5-1.0% accuracy drop, which leads to a 15-40% runtime speedup on a Titan X GPU.
AB - A binary neural network (BNN) is a compact form of neural network. Both the weights and activations in BNNs can be binary values, which leads to a significant reduction in both parameter size and computational complexity compared to their full-precision counterparts. Such reductions can directly translate into reduced memory footprint and computation cost in hardware, making BNNs highly suitable for a wide range of hardware accelerators. However, it is unclear whether and how a BNN can be further pruned for ultimate compactness. As both 0s and 1s are non-trivial in BNNs, it is not appropriate to adopt any existing pruning method for full-precision networks that interprets 0s as trivial. In this paper, we present a pruning method tailored to BNNs and illustrate that BNNs can be further pruned by using weight flipping frequency as an indicator of sensitivity to accuracy. Experiments performed on the binary versions of a 9-layer Network-in-Network (NIN) and AlexNet with the CIFAR-10 dataset show that the proposed BNN-pruning method can achieve a 20-40% reduction in binary operations with a 0.5-1.0% accuracy drop, which leads to a 15-40% runtime speedup on a Titan X GPU.
KW - Neural network
KW - binary
KW - pruning
UR - http://www.scopus.com/inward/record.url?scp=85089952675&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089952675&partnerID=8YFLogxK
U2 - 10.1109/ISQED48828.2020.9136977
DO - 10.1109/ISQED48828.2020.9136977
M3 - Conference contribution
AN - SCOPUS:85089952675
T3 - Proceedings - International Symposium on Quality Electronic Design, ISQED
SP - 306
EP - 311
BT - Proceedings of the 21st International Symposium on Quality Electronic Design, ISQED 2020
PB - IEEE Computer Society
Y2 - 25 March 2020 through 26 March 2020
ER -