A 28nm 8-bit Floating-Point Tensor Core based CNN Training Processor with Dynamic Activation/Weight Sparsification

Shreyas Kolala Venkataramanaiah, Jian Meng, Han Sok Suh, Injune Yeo, Jyotishman Saikia, Sai Kiran Cherupally, Yichi Zhang, Zhiru Zhang, Jae Sun Seo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

We present an 8-bit floating-point (FP8) training processor which implements (1) highly parallel tensor cores (fused multiply-add trees) that maintain high utilization throughout forward propagation (FP), backward propagation (BP), and weight update (WU) phases of the training process, (2) hardware-efficient channel gating for dynamic output activation sparsity, (3) dynamic weight sparsity based on group Lasso, and (4) gradient skipping based on FP prediction error. We develop a custom ISA to flexibly support different CNN topologies and training parameters. The 28nm prototype chip demonstrates large improvements in FLOPs reduction (7.3×), energy efficiency (16.4 TFLOPS/W), and overall training latency speedup (4.7×), for both supervised and self-supervised training tasks.
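As a rough illustration of item (3), the sketch below shows the generic group Lasso idea in plain Python: the penalty sums the L2 norms of weight groups, so whole groups tend to shrink toward zero together and can then be dropped. It is not the processor's implementation. The grouping (one group per output channel), the function names, the `lam` coefficient, and the pruning threshold are all assumptions made for this example.

```python
import math

def group_norms(weights):
    """L2 norm of each group; a group here is one output channel's flattened weights (assumed grouping)."""
    return [math.sqrt(sum(w * w for w in group)) for group in weights]

def group_lasso_penalty(weights, lam):
    """Group Lasso regularization term: lam * sum of per-group L2 norms."""
    return lam * sum(group_norms(weights))

def prune_groups(weights, threshold):
    """Zero every group whose L2 norm is below the threshold (hypothetical pruning rule)."""
    return [
        group if norm >= threshold else [0.0] * len(group)
        for group, norm in zip(weights, group_norms(weights))
    ]

# Toy weights: three output channels with two weights each.
weights = [
    [3.0, 4.0],    # large group, kept
    [0.01, 0.02],  # near-zero group, pruned
    [0.0, 1.0],    # moderate group, kept
]
penalty = group_lasso_penalty(weights, lam=0.1)
pruned = prune_groups(weights, threshold=0.1)
```

Because entire groups are zeroed rather than scattered individual weights, the resulting sparsity is structured, which is generally what lets hardware skip the corresponding computation outright.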

Original language: English (US)
Title of host publication: ESSCIRC 2022 - IEEE 48th European Solid State Circuits Conference, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 89-92
Number of pages: 4
ISBN (Electronic): 9781665484947
State: Published - 2022
Event: 48th IEEE European Solid State Circuits Conference, ESSCIRC 2022 - Milan, Italy
Duration: Sep 19 2022 to Sep 22 2022

Publication series

Name: ESSCIRC 2022 - IEEE 48th European Solid State Circuits Conference, Proceedings

Conference

Conference: 48th IEEE European Solid State Circuits Conference, ESSCIRC 2022
Country/Territory: Italy
City: Milan
Period: 9/19/22 to 9/22/22

Keywords

  • Convolutional neural networks
  • deep neural network training
  • hardware accelerator
  • structured sparsity

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering
  • Electronic, Optical and Magnetic Materials
  • Instrumentation
  • Artificial Intelligence
