An energy-efficient deep convolutional neural network accelerator featuring conditional computing and low external memory access

Minkyu Kim, Jae Sun Seo

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

With their algorithmic success in many machine learning tasks and applications, deep convolutional neural networks (DCNNs) have been implemented with custom hardware in a number of prior works. However, such works have not fully exploited conditional/approximate computing to eliminate redundant computations in CNNs. This article presents a DCNN accelerator featuring a novel conditional computing scheme that synergistically combines precision cascading (PC) with zero skipping (ZS). To reduce the many redundant convolutions that are followed by max-pooling operations, we propose precision cascading, where the input features are divided into a number of low-precision groups and approximate convolutions with only the most significant bits (MSBs) are performed first. Based on this approximate computation, the full-precision convolution is performed only at the location of the maximum pooling output that is found. This way, the total number of bit-wise convolutions can be reduced by ∼2× with <0.8% degradation in ImageNet accuracy. PC provides the added benefit of increased sparsity per low-precision group, which we exploit with ZS to eliminate clock cycles and external memory accesses for zero inputs. The proposed conditional computing scheme has been implemented with a custom architecture in a 40-nm prototype chip, which achieves a peak energy efficiency of 24.97 TOPS/W at 0.6-V supply and a low external memory access of 0.0018 access/MAC with the VGG-16 CNN for ImageNet classification, and a peak energy efficiency of 28.51 TOPS/W at 0.9-V supply with FlowNet on the Flying Chairs data set.
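The PC+ZS idea can be illustrated with a minimal sketch. This is a hypothetical Python illustration, not the chip's datapath: the 4-bit MSB/LSB split, the flat activation windows, and the `pc_pool` helper are assumptions made for clarity. Each max-pooling candidate window is first evaluated with MSBs only (zero inputs skipped, as ZS would), and only the winning window gets a full-precision convolution.

```python
# Hypothetical sketch of precision cascading (PC) with zero skipping (ZS).
# Assumes 8-bit activations split into a 4-bit MSB group and 4-bit LSB group;
# each "window" is a flat list of activations for one max-pool candidate.

def dot(xs, ws):
    # Zero skipping: multiply-accumulate only for nonzero inputs,
    # mimicking the elimination of clock cycles for zero activations.
    return sum(x * w for x, w in zip(xs, ws) if x != 0)

def pc_pool(windows, weights, shift=4):
    """Precision cascading over one max-pool region: rank candidates with
    an MSB-only approximate convolution, then run the full-precision
    convolution only on the winner."""
    msb_windows = [[x >> shift for x in w] for w in windows]
    approx = [dot(m, weights) << shift for m in msb_windows]
    best = max(range(len(windows)), key=lambda i: approx[i])
    # Full-precision convolution only at the predicted max-pool location.
    return dot(windows[best], weights)

# Example: four 2-element candidate windows; the MSB pass picks window 1,
# and only that window is convolved at full precision.
result = pc_pool([[0, 16], [240, 16], [16, 0], [0, 0]], [1, 1])
```

Because the LSB groups are only processed for the single winning candidate, roughly half of the bit-wise convolution work is avoided, at the cost of occasionally mispredicting the max-pool winner when two candidates differ only in their LSBs.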

Original language: English (US)
Article number: 9229157
Pages (from-to): 803-813
Number of pages: 11
Journal: IEEE Journal of Solid-State Circuits
Volume: 56
Issue number: 3
State: Published - Mar 2021

Keywords

  • Application-specific integrated circuit (ASIC)
  • approximate computing
  • conditional computing
  • deep convolutional neural network (DCNN)
  • deep learning
  • energy-efficient accelerator

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
