A Latency-Optimized Reconfigurable NoC for In-Memory Acceleration of DNNs

Sumit K. Mandal, Gokul Krishnan, Chaitali Chakrabarti, Jae Sun Seo, Yu Cao, Umit Y. Ogras

Research output: Contribution to journal › Article › peer-review

Abstract

In-memory computing reduces the latency and energy consumption of Deep Neural Networks (DNNs) by reducing the number of off-chip memory accesses. However, crossbar-based in-memory computing may significantly increase the volume of on-chip communication, since the weights and activations are stored on-chip. State-of-the-art interconnect methodologies for in-memory computing deploy either a bus-based network or a mesh-based Network-on-Chip (NoC). Our experiments show that up to 90% of the total inference latency of DNN hardware is spent on on-chip communication when a bus-based network is used. To reduce the communication latency, we propose a methodology that generates an NoC architecture, along with a scheduling technique, customized for each DNN. We prove mathematically that the generated NoC architecture and corresponding schedules achieve the minimum possible communication latency for a given DNN. Furthermore, we generalize the proposed solution for both edge computing and cloud computing. Experimental evaluations on a wide range of DNNs show that the proposed NoC architecture enables a 20%-80% reduction in communication latency with respect to state-of-the-art interconnect solutions.
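The latency gap the abstract describes can be illustrated with a toy model. The sketch below is not the paper's algorithm; it only contrasts a shared bus, on which all inter-layer transfers are serialized, with a custom NoC that gives each layer pair a dedicated link so independent transfers proceed in parallel. The per-layer traffic volumes and link bandwidths are hypothetical values chosen purely for illustration.

```python
# Toy comparison (not the paper's method): total on-chip communication
# latency for a shared bus vs. an NoC with one dedicated link per
# layer pair. All numbers below are hypothetical.

layer_volumes = [4096, 2048, 1024, 512]  # flits sent from layer i to layer i+1
bus_bw = 1   # flits per cycle on the single shared bus
link_bw = 1  # flits per cycle on each dedicated NoC link

# Shared bus: every transfer contends for the same medium, so the
# transfer times simply add up.
bus_latency = sum(v // bus_bw for v in layer_volumes)

# Dedicated links: transfers between different layer pairs overlap,
# so (in this simplified model) the largest transfer dominates.
noc_latency = max(v // link_bw for v in layer_volumes)

print(bus_latency)  # 7680 cycles
print(noc_latency)  # 4096 cycles
```

In this simplified model the NoC cuts communication latency by roughly 47%, consistent in spirit with the 20%-80% range reported in the abstract; the actual methodology additionally optimizes the schedule and proves a latency lower bound.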

Original language: English (US)
Article number: 9164917
Pages (from-to): 362-375
Number of pages: 14
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Volume: 10
Issue number: 3
State: Published - Sep 2020

Keywords

  • In-memory computing
  • deep neural networks
  • interconnect
  • network-on-chip
  • neural network accelerator

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
