TY - GEN
T1 - A 28nm 8-bit Floating-Point Tensor Core based CNN Training Processor with Dynamic Activation/Weight Sparsification
AU - Venkataramanaiah, Shreyas Kolala
AU - Meng, Jian
AU - Suh, Han Sok
AU - Yeo, Injune
AU - Saikia, Jyotishman
AU - Cherupally, Sai Kiran
AU - Zhang, Yichi
AU - Zhang, Zhiru
AU - Seo, Jae Sun
N1 - Funding Information:
This work is supported in part by NSF and JUMP C-BRIC, an SRC program sponsored by DARPA.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - We present an 8-bit floating-point (FP8) training processor that implements (1) highly parallel tensor cores (fused multiply-add trees) that maintain high utilization throughout the forward propagation (FP), backward propagation (BP), and weight update (WU) phases of the training process, (2) hardware-efficient channel gating for dynamic output activation sparsity, (3) dynamic weight sparsity based on group Lasso, and (4) gradient skipping based on FP prediction error. We develop a custom ISA to flexibly support different CNN topologies and training parameters. The 28nm prototype chip demonstrates large improvements in FLOPs reduction (7.3×), energy efficiency (16.4 TFLOPS/W), and overall training latency speedup (4.7×) for both supervised and self-supervised training tasks.
KW - Convolutional neural networks
KW - deep neural network training
KW - hardware accelerator
KW - structured sparsity
UR - http://www.scopus.com/inward/record.url?scp=85141481129&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141481129&partnerID=8YFLogxK
U2 - 10.1109/ESSCIRC55480.2022.9911359
DO - 10.1109/ESSCIRC55480.2022.9911359
M3 - Conference contribution
AN - SCOPUS:85141481129
T3 - ESSCIRC 2022 - IEEE 48th European Solid State Circuits Conference, Proceedings
SP - 89
EP - 92
BT - ESSCIRC 2022 - IEEE 48th European Solid State Circuits Conference, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 48th IEEE European Solid State Circuits Conference, ESSCIRC 2022
Y2 - 19 September 2022 through 22 September 2022
ER -