Parallelizing SRAM arrays with customized bit-cell for binary neural networks

Rui Liu; Xiaochen Peng; Xiaoyu Sun; Win San Khwa; Xin Si; Jia Jing Chen; Jia Fang Li; Meng Fan Chang; Shimeng Yu

doi:10.1145/3195970.3196089

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

38 Scopus citations

Abstract

Recent advances in deep neural networks (DNNs) have shown that Binary Neural Networks (BNNs) can provide reasonable accuracy on various image datasets with a significant reduction in computation and memory cost. In this paper, we explore two BNNs: Hybrid BNN (HBNN) and XNOR-BNN, where the weights are binarized to +1/-1 while the neuron activations are binarized to 1/0 and +1/-1, respectively. Two SRAM bit-cell designs are proposed, namely, 6T SRAM for HBNN and customized 8T SRAM for XNOR-BNN. In our design, the high-precision multiply-and-accumulate (MAC) is replaced by bitwise multiplication for HBNN or XNOR for XNOR-BNN, plus bit-counting operations. To parallelize the weighted sum operation, we activate multiple word lines in the SRAM array simultaneously and digitize the analog voltage developed along the bit line with a multi-level sense amplifier (MLSA). In order to partition the large matrices in DNNs, we investigate the impact of the sensing bit-levels of the MLSA on the accuracy degradation for different sub-array sizes and propose using a nonlinear quantization technique to mitigate the accuracy degradation. With a 64×64 sub-array size and a 3-bit MLSA, the HBNN and XNOR-BNN architectures can limit the accuracy degradation to 2.37% and 0.88%, respectively, for a VGG-16-inspired network on the CIFAR-10 dataset. Design space exploration of SRAM-based synaptic architectures with the conventional row-by-row access scheme and our proposed parallel access scheme is also performed, showing significant benefits in area, latency and energy efficiency. Finally, we have successfully taped out and validated the proposed HBNN and XNOR-BNN designs in a TSMC 65 nm process with measured silicon data, achieving energy efficiency >100 TOPS/W for HBNN and >50 TOPS/W for XNOR-BNN.
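The arithmetic described in the abstract can be illustrated with a small software sketch. It is not the authors' implementation and does not model the analog bit-line sensing: the function names are hypothetical, the sub-array partitioning is reduced to slicing a vector into groups of 64 rows, and the MLSA is approximated by a uniform (linear) quantizer, whereas the paper proposes a nonlinear one whose levels are not given here.

```python
def xnor_popcount_dot(weights, activations):
    """Dot product of two +1/-1 vectors via XNOR plus bit-counting."""
    n = len(weights)
    # XNOR is true when both signs agree; counting those is the popcount.
    matches = sum(1 for w, a in zip(weights, activations) if (w > 0) == (a > 0))
    # Each match contributes +1 and each mismatch -1.
    return 2 * matches - n


def hbnn_dot(weights, activations):
    """Dot product with +1/-1 weights and 1/0 activations (HBNN-style)."""
    # Only rows whose activation is 1 contribute to the sum.
    pos = sum(1 for w, a in zip(weights, activations) if a == 1 and w > 0)
    neg = sum(1 for w, a in zip(weights, activations) if a == 1 and w < 0)
    return pos - neg


def quantize_linear(value, lo, hi, bits):
    """Snap value to one of 2**bits evenly spaced levels in [lo, hi].

    Assumption: uniform levels stand in for the multi-level sense amplifier.
    """
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    idx = round((value - lo) / step)
    idx = max(0, min(levels - 1, idx))
    return lo + idx * step


def partitioned_dot(weights, activations, rows=64, bits=3):
    """Sum quantized partial sums, one per group of `rows` inputs."""
    total = 0.0
    for start in range(0, len(weights), rows):
        w = weights[start:start + rows]
        a = activations[start:start + rows]
        partial = xnor_popcount_dot(w, a)
        # The partial sum of a +1/-1 group lies in [-len(w), len(w)].
        total += quantize_linear(partial, -len(w), len(w), bits)
    return total
```

Without quantization, `xnor_popcount_dot` matches the ordinary dot product exactly. In `partitioned_dot`, each partial sum is rounded to one of 8 levels before accumulation, which is the kind of error the abstract's accuracy-degradation figures refer to.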

Original language	English (US)
Title of host publication	Proceedings of the 55th Annual Design Automation Conference, DAC 2018
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Print)	9781450357005
DOIs	https://doi.org/10.1145/3195970.3196089
State	Published - Jun 24 2018
Event	55th Annual Design Automation Conference, DAC 2018 - San Francisco, United States Duration: Jun 24 2018 → Jun 29 2018

Publication series

Name	Proceedings - Design Automation Conference
Volume	Part F137710
ISSN (Print)	0738-100X

Other

Other	55th Annual Design Automation Conference, DAC 2018
Country/Territory	United States
City	San Francisco
Period	6/24/18 → 6/29/18

ASJC Scopus subject areas

Computer Science Applications
Control and Systems Engineering
Electrical and Electronic Engineering
Modeling and Simulation

Access to Document

10.1145/3195970.3196089

Cite this

Liu, R., Peng, X., Sun, X., Khwa, W. S., Si, X., Chen, J. J., Li, J. F., Chang, M. F., & Yu, S. (2018). Parallelizing SRAM arrays with customized bit-cell for binary neural networks. In Proceedings of the 55th Annual Design Automation Conference, DAC 2018, Article a21 (Proceedings - Design Automation Conference; Vol. Part F137710). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3195970.3196089



