A Flash-based Current-mode IC to Realize Quantized Neural Networks

Kyler R. Scott; Cheng Yen Lee; Sunil P. Khatri; Sarma Vrudhula

doi:10.23919/DATE54114.2022.9774539

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

This paper presents a mixed-signal architecture for implementing Quantized Neural Networks (QNNs) using flash transistors to achieve extremely high throughput with extremely low power, energy and memory requirements. Its low resource consumption makes our design especially suited for use in edge devices. The network weights are stored in-memory using flash transistors, and nodes perform operations in the analog current domain. Our design can be programmed with any QNN whose hyperparameters (the number of layers, filters, or filter size, etc.) do not exceed the maximum provisioned. Once the flash devices are programmed with a trained model and the IC is given an input, our architecture performs inference with zero access to off-chip memory. We demonstrate the robustness of our design under current-mode non-linearities arising from process and voltage variations. We test validation accuracy on the ImageNet dataset, and show that our IC suffers only 0.6% and 1.0% reduction in classification accuracy for Top-1 and Top-5 outputs, respectively. Our implementation results in a ~50× reduction in latency and energy when compared to a recently published mixed-signal ASIC implementation, with similar power characteristics. Our approach provides layer partitioning and node sharing possibilities, which allow us to trade off latency, power, and area amongst each other.
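
As a rough illustration of the idea in the abstract (weights held as a few discrete levels, a node summing per-weight contributions in the current domain, and tolerance to device variation), the following is a minimal behavioral sketch. It is not the authors' circuit or method: the function names `quantize` and `node_output`, the four-level weight grid, the sign activation, and the uniform `variation` model are all assumptions made for illustration only.

```python
import random


def quantize(w, levels=4, w_max=1.0):
    """Snap a real weight to one of `levels` evenly spaced values in [-w_max, w_max].

    Assumption: a uniform grid; the paper's actual quantization scheme is not
    described in the abstract.
    """
    step = 2 * w_max / (levels - 1)
    clipped = max(-w_max, min(w_max, w))
    return round((clipped + w_max) / step) * step - w_max


def node_output(inputs, weights, variation=0.0, rng=None):
    """Sum per-weight 'currents' and threshold the total with a sign activation.

    `variation` is a hypothetical relative spread standing in for process and
    voltage effects: each contribution is scaled by a factor drawn uniformly
    from [1 - variation, 1 + variation].
    """
    rng = rng or random.Random(0)
    total = 0.0
    for x, w in zip(inputs, weights):
        gain = 1.0 + rng.uniform(-variation, variation)
        total += x * w * gain  # one branch current added to the summing node
    return 1 if total >= 0 else -1


if __name__ == "__main__":
    weights = [quantize(w) for w in (0.9, 0.3, -0.4)]
    inputs = [1, -1, 1]
    print("quantized weights:", weights)
    print("ideal output:", node_output(inputs, weights))
    print("output with 10% variation:", node_output(inputs, weights, variation=0.1))
```

In this toy setting the node's decision is unchanged under a 10% spread because the ideal sum sits far enough from the threshold; the accuracy figures reported in the abstract come from the authors' own evaluation, not from a model like this one.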

Original language	English (US)
Title of host publication	Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
Editors	Cristiana Bolchini, Ingrid Verbauwhede, Ioana Vatajelu
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1029-1034
Number of pages	6
ISBN (Electronic)	9783981926361
DOIs	https://doi.org/10.23919/DATE54114.2022.9774539
State	Published - 2022
Event	2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022 - Virtual, Online, Belgium
Duration	Mar 14 2022 → Mar 23 2022

Publication series

Name	Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022

Conference

Conference	2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
Country/Territory	Belgium
City	Virtual, Online
Period	3/14/22 → 3/23/22

Keywords

Current-mode Circuits
Floating-gate Transistors
Quantized Neural Networks

ASJC Scopus subject areas

Artificial Intelligence
Computer Networks and Communications
Hardware and Architecture
Software
Safety, Risk, Reliability and Quality
Control and Optimization

Access to Document

10.23919/DATE54114.2022.9774539

Cite this

Scott, K. R., Lee, C. Y., Khatri, S. P., & Vrudhula, S. (2022). A Flash-based Current-mode IC to Realize Quantized Neural Networks. In C. Bolchini, I. Verbauwhede, & I. Vatajelu (Eds.), Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022 (pp. 1029-1034). (Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/DATE54114.2022.9774539

A Flash-based Current-mode IC to Realize Quantized Neural Networks. / Scott, Kyler R.; Lee, Cheng Yen; Khatri, Sunil P. et al.
Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022. ed. / Cristiana Bolchini; Ingrid Verbauwhede; Ioana Vatajelu. Institute of Electrical and Electronics Engineers Inc., 2022. p. 1029-1034 (Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022).

Scott, KR, Lee, CY, Khatri, SP & Vrudhula, S 2022, A Flash-based Current-mode IC to Realize Quantized Neural Networks. in C Bolchini, I Verbauwhede & I Vatajelu (eds), Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022. Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022, Institute of Electrical and Electronics Engineers Inc., pp. 1029-1034, 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022, Virtual, Online, Belgium, 3/14/22. https://doi.org/10.23919/DATE54114.2022.9774539

Scott KR, Lee CY, Khatri SP, Vrudhula S. A Flash-based Current-mode IC to Realize Quantized Neural Networks. In Bolchini C, Verbauwhede I, Vatajelu I, editors, Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022. Institute of Electrical and Electronics Engineers Inc. 2022. p. 1029-1034. (Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022). doi: 10.23919/DATE54114.2022.9774539

Scott, Kyler R. ; Lee, Cheng Yen ; Khatri, Sunil P. et al. / A Flash-based Current-mode IC to Realize Quantized Neural Networks. Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022. editor / Cristiana Bolchini ; Ingrid Verbauwhede ; Ioana Vatajelu. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 1029-1034 (Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022).

@inproceedings{1c3dd5fa5c384e04b308c8fa6e4b31ec,

title = "A Flash-based Current-mode IC to Realize Quantized Neural Networks",

abstract = "This paper presents a mixed-signal architecture for implementing Quantized Neural Networks (QNNs) using flash transistors to achieve extremely high throughput with extremely low power, energy and memory requirements. Its low resource consumption makes our design especially suited for use in edge devices. The network weights are stored in-memory using flash transistors, and nodes perform operations in the analog current domain. Our design can be programmed with any QNN whose hyperparameters (the number of layers, filters, or filter size, etc.) do not exceed the maximum provisioned. Once the flash devices are programmed with a trained model and the IC is given an input, our architecture performs inference with zero access to off-chip memory. We demonstrate the robustness of our design under current-mode non-linearities arising from process and voltage variations. We test validation accuracy on the ImageNet dataset, and show that our IC suffers only 0.6% and 1.0% reduction in classification accuracy for Top-1 and Top-5 outputs, respectively. Our implementation results in a $\sim 50\times$ reduction in latency and energy when compared to a recently published mixed-signal ASIC implementation, with similar power characteristics. Our approach provides layer partitioning and node sharing possibilities, which allow us to trade off latency, power, and area amongst each other.",

keywords = "Current-mode Circuits, Floating-gate Transistors, Quantized Neural Networks",

author = "Scott, {Kyler R.} and Lee, {Cheng Yen} and Khatri, {Sunil P.} and Sarma Vrudhula",

note = "Publisher Copyright: {\textcopyright} 2022 EDAA.; 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022 ; Conference date: 14-03-2022 Through 23-03-2022",

year = "2022",

doi = "10.23919/DATE54114.2022.9774539",

language = "English (US)",

series = "Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1029--1034",

editor = "Cristiana Bolchini and Ingrid Verbauwhede and Ioana Vatajelu",

booktitle = "Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022",

}

TY - GEN

T1 - A Flash-based Current-mode IC to Realize Quantized Neural Networks

AU - Scott, Kyler R.

AU - Lee, Cheng Yen

AU - Khatri, Sunil P.

AU - Vrudhula, Sarma

PY - 2022

Y1 - 2022

N2 - This paper presents a mixed-signal architecture for implementing Quantized Neural Networks (QNNs) using flash transistors to achieve extremely high throughput with extremely low power, energy and memory requirements. Its low resource consumption makes our design especially suited for use in edge devices. The network weights are stored in-memory using flash transistors, and nodes perform operations in the analog current domain. Our design can be programmed with any QNN whose hyperparameters (the number of layers, filters, or filter size, etc.) do not exceed the maximum provisioned. Once the flash devices are programmed with a trained model and the IC is given an input, our architecture performs inference with zero access to off-chip memory. We demonstrate the robustness of our design under current-mode non-linearities arising from process and voltage variations. We test validation accuracy on the ImageNet dataset, and show that our IC suffers only 0.6% and 1.0% reduction in classification accuracy for Top-1 and Top-5 outputs, respectively. Our implementation results in a ~50× reduction in latency and energy when compared to a recently published mixed-signal ASIC implementation, with similar power characteristics. Our approach provides layer partitioning and node sharing possibilities, which allow us to trade off latency, power, and area amongst each other.

AB - This paper presents a mixed-signal architecture for implementing Quantized Neural Networks (QNNs) using flash transistors to achieve extremely high throughput with extremely low power, energy and memory requirements. Its low resource consumption makes our design especially suited for use in edge devices. The network weights are stored in-memory using flash transistors, and nodes perform operations in the analog current domain. Our design can be programmed with any QNN whose hyperparameters (the number of layers, filters, or filter size, etc.) do not exceed the maximum provisioned. Once the flash devices are programmed with a trained model and the IC is given an input, our architecture performs inference with zero access to off-chip memory. We demonstrate the robustness of our design under current-mode non-linearities arising from process and voltage variations. We test validation accuracy on the ImageNet dataset, and show that our IC suffers only 0.6% and 1.0% reduction in classification accuracy for Top-1 and Top-5 outputs, respectively. Our implementation results in a ~50× reduction in latency and energy when compared to a recently published mixed-signal ASIC implementation, with similar power characteristics. Our approach provides layer partitioning and node sharing possibilities, which allow us to trade off latency, power, and area amongst each other.

KW - Current-mode Circuits

KW - Floating-gate Transistors

KW - Quantized Neural Networks

UR - http://www.scopus.com/inward/record.url?scp=85130806404&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85130806404&partnerID=8YFLogxK

U2 - 10.23919/DATE54114.2022.9774539

DO - 10.23919/DATE54114.2022.9774539

M3 - Conference contribution

AN - SCOPUS:85130806404

T3 - Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022

SP - 1029

EP - 1034

BT - Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022

A2 - Bolchini, Cristiana

A2 - Verbauwhede, Ingrid

A2 - Vatajelu, Ioana

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022

Y2 - 14 March 2022 through 23 March 2022

ER -
