TY - GEN
T1 - BNN Pruning: Pruning Binary Neural Network Guided by Weight Flipping Frequency
T2 - 21st International Symposium on Quality Electronic Design, ISQED 2020
AU - Li, Yixing
AU - Ren, Fengbo
N1 - Funding Information:
This work is supported by an NSF grant (IIS/CPS-1652038) and an unrestricted gift (CG#1319167) from Cisco Research Center. The computing infrastructure used in this work is supported by an NSF grant (CNS-1629888). The four GPUs used for this research were donated by the NVIDIA Corporation.
PY - 2020/3
Y1 - 2020/3
N2 - A binary neural network (BNN) is a compact form of neural network. Both the weights and activations in BNNs can be binary values, which leads to a significant reduction in both parameter size and computational complexity compared to their full-precision counterparts. Such reductions can directly translate into reduced memory footprint and computation cost in hardware, making BNNs highly suitable for a wide range of hardware accelerators. However, it is unclear whether and how a BNN can be further pruned for ultimate compactness. As both 0s and 1s are non-trivial in BNNs, it is not appropriate to adopt any existing pruning method for full-precision networks that interprets 0s as trivial. In this paper, we present a pruning method tailored to BNNs and illustrate that BNNs can be further pruned by using weight flipping frequency as an indicator of sensitivity to accuracy. Experiments performed on the binary versions of a 9-layer Network-in-Network (NIN) and AlexNet with the CIFAR-10 dataset show that the proposed BNN-pruning method can achieve a 20-40% reduction in binary operations with a 0.5-1.0% accuracy drop, which leads to a 15-40% runtime speedup on a Titan X GPU.
AB - A binary neural network (BNN) is a compact form of neural network. Both the weights and activations in BNNs can be binary values, which leads to a significant reduction in both parameter size and computational complexity compared to their full-precision counterparts. Such reductions can directly translate into reduced memory footprint and computation cost in hardware, making BNNs highly suitable for a wide range of hardware accelerators. However, it is unclear whether and how a BNN can be further pruned for ultimate compactness. As both 0s and 1s are non-trivial in BNNs, it is not appropriate to adopt any existing pruning method for full-precision networks that interprets 0s as trivial. In this paper, we present a pruning method tailored to BNNs and illustrate that BNNs can be further pruned by using weight flipping frequency as an indicator of sensitivity to accuracy. Experiments performed on the binary versions of a 9-layer Network-in-Network (NIN) and AlexNet with the CIFAR-10 dataset show that the proposed BNN-pruning method can achieve a 20-40% reduction in binary operations with a 0.5-1.0% accuracy drop, which leads to a 15-40% runtime speedup on a Titan X GPU.
KW - Neural network
KW - binary
KW - pruning
UR - http://www.scopus.com/inward/record.url?scp=85089952675&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089952675&partnerID=8YFLogxK
U2 - 10.1109/ISQED48828.2020.9136977
DO - 10.1109/ISQED48828.2020.9136977
M3 - Conference contribution
AN - SCOPUS:85089952675
T3 - Proceedings - International Symposium on Quality Electronic Design, ISQED
SP - 306
EP - 311
BT - Proceedings of the 21st International Symposium on Quality Electronic Design, ISQED 2020
PB - IEEE Computer Society
Y2 - 25 March 2020 through 26 March 2020
ER -