TY - GEN
T1 - Interconnect-Centric Benchmarking of In-Memory Acceleration for DNNs
AU - Krishnan, Gokul
AU - Mandal, Sumit K.
AU - Chakrabarti, Chaitali
AU - Seo, Jae-Sun
AU - Ogras, Umit Y.
AU - Cao, Yu
N1 - Funding Information:
This work was supported in part by C-BRIC, one of the six centers in JUMP, a Semiconductor Research Corporation program sponsored by DARPA, NSF CAREER Award CNS-1651624, and the Semiconductor Research Corporation under task ID 3012.001.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/3/14
Y1 - 2021/3/14
N2 - In-memory computing (IMC) provides a dense and parallel structure for high-performance and energy-efficient acceleration of deep neural networks (DNNs). The increased computational density of IMC architectures results in increased on-chip communication costs, stressing the interconnect fabric. In this work, we develop a novel performance benchmark tool for IMC architectures that incorporates device, circuits, architecture, and interconnect under a single roof. The tool assesses the area, energy, and latency of the IMC accelerator. We analyze three interconnect cases to illustrate the versatility of the tool: (1) Point-to-point (P2P) and network-on-chip (NoC) based IMC architectures to demonstrate the criticality of the interconnect choice; (2) Area and energy optimization to improve IMC utilization and reduce on-chip interconnect cost; (3) Evaluation of a reconfigurable NoC to achieve minimum on-chip communication latency. Through these studies, we motivate the need for future work in the design of optimal on-chip and off-chip interconnect fabrics for IMC architectures.
AB - In-memory computing (IMC) provides a dense and parallel structure for high-performance and energy-efficient acceleration of deep neural networks (DNNs). The increased computational density of IMC architectures results in increased on-chip communication costs, stressing the interconnect fabric. In this work, we develop a novel performance benchmark tool for IMC architectures that incorporates device, circuits, architecture, and interconnect under a single roof. The tool assesses the area, energy, and latency of the IMC accelerator. We analyze three interconnect cases to illustrate the versatility of the tool: (1) Point-to-point (P2P) and network-on-chip (NoC) based IMC architectures to demonstrate the criticality of the interconnect choice; (2) Area and energy optimization to improve IMC utilization and reduce on-chip interconnect cost; (3) Evaluation of a reconfigurable NoC to achieve minimum on-chip communication latency. Through these studies, we motivate the need for future work in the design of optimal on-chip and off-chip interconnect fabrics for IMC architectures.
UR - http://www.scopus.com/inward/record.url?scp=85113204204&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113204204&partnerID=8YFLogxK
U2 - 10.1109/CSTIC52283.2021.9461480
DO - 10.1109/CSTIC52283.2021.9461480
M3 - Conference contribution
AN - SCOPUS:85113204204
T3 - China Semiconductor Technology International Conference 2021, CSTIC 2021
BT - China Semiconductor Technology International Conference 2021, CSTIC 2021
A2 - Claeys, Cor
A2 - Liang, Steve X.
A2 - Lin, Qinghuang
A2 - Huang, Ru
A2 - Wu, Hanming
A2 - Song, Peilin
A2 - Pang, Linyong
A2 - Zhang, Ying
A2 - Zhang, Beichao
A2 - Qu, Xinping
A2 - Zhuo, Cheng
A2 - Lung, Hsiang-Lan
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 China Semiconductor Technology International Conference, CSTIC 2021
Y2 - 14 March 2021 through 15 March 2021
ER -