TY - GEN
T1 - A Flash-based Current-mode IC to Realize Quantized Neural Networks
AU - Scott, Kyler R.
AU - Lee, Cheng Yen
AU - Khatri, Sunil P.
AU - Vrudhula, Sarma
N1 - Publisher Copyright: © 2022 EDAA.
PY - 2022
Y1 - 2022
N2 - This paper presents a mixed-signal architecture for implementing Quantized Neural Networks (QNNs) using flash transistors to achieve extremely high throughput with extremely low power, energy and memory requirements. Its low resource consumption makes our design especially suited for use in edge devices. The network weights are stored in-memory using flash transistors, and nodes perform operations in the analog current domain. Our design can be programmed with any QNN whose hyperparameters (the number of layers, filters, or filter size, etc.) do not exceed the maximum provisioned. Once the flash devices are programmed with a trained model and the IC is given an input, our architecture performs inference with zero access to off-chip memory. We demonstrate the robustness of our design under current-mode non-linearities arising from process and voltage variations. We test validation accuracy on the ImageNet dataset, and show that our IC suffers only 0.6% and 1.0% reduction in classification accuracy for Top-1 and Top-5 outputs, respectively. Our implementation results in a ~50× reduction in latency and energy when compared to a recently published mixed-signal ASIC implementation, with similar power characteristics. Our approach provides layer partitioning and node sharing possibilities, which allow us to trade off latency, power, and area amongst each other.
AB - This paper presents a mixed-signal architecture for implementing Quantized Neural Networks (QNNs) using flash transistors to achieve extremely high throughput with extremely low power, energy and memory requirements. Its low resource consumption makes our design especially suited for use in edge devices. The network weights are stored in-memory using flash transistors, and nodes perform operations in the analog current domain. Our design can be programmed with any QNN whose hyperparameters (the number of layers, filters, or filter size, etc.) do not exceed the maximum provisioned. Once the flash devices are programmed with a trained model and the IC is given an input, our architecture performs inference with zero access to off-chip memory. We demonstrate the robustness of our design under current-mode non-linearities arising from process and voltage variations. We test validation accuracy on the ImageNet dataset, and show that our IC suffers only 0.6% and 1.0% reduction in classification accuracy for Top-1 and Top-5 outputs, respectively. Our implementation results in a ~50× reduction in latency and energy when compared to a recently published mixed-signal ASIC implementation, with similar power characteristics. Our approach provides layer partitioning and node sharing possibilities, which allow us to trade off latency, power, and area amongst each other.
KW - Current-mode Circuits
KW - Floating-gate Transistors
KW - Quantized Neural Networks
UR - http://www.scopus.com/inward/record.url?scp=85130806404&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85130806404&partnerID=8YFLogxK
U2 - 10.23919/DATE54114.2022.9774539
DO - 10.23919/DATE54114.2022.9774539
M3 - Conference contribution
AN - SCOPUS:85130806404
T3 - Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
SP - 1029
EP - 1034
BT - Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
A2 - Bolchini, Cristiana
A2 - Verbauwhede, Ingrid
A2 - Vatajelu, Ioana
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
Y2 - 14 March 2022 through 23 March 2022
ER -