Sparse BD-Net: A multiplication-less DNN with sparse binarized depth-wise separable convolution

Zhezhi He, Li Yang, Shaahin Angizi, Adnan Siraj Rakin, Deliang Fan

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

In this work, we propose a multiplication-less binarized depthwise-separable convolutional neural network, called BD-Net. BD-Net uses a binarized depthwise-separable convolution block as a drop-in replacement for conventional spatial convolution in deep convolutional neural networks (DNNs). In BD-Net, the computationally expensive convolution operations (i.e., multiply-and-accumulate) are converted into energy-efficient addition/subtraction operations. To further compress the model size while keeping addition/subtraction as the dominant computation, we propose a new sparse binarization method with a hardware-oriented structured sparsity pattern. To successfully train such a sparse BD-Net, we propose and leverage two techniques: (1) a modified group-lasso regularization whose group size matches the capacity of the basic computing core in the accelerator, and (2) a weight-penalty clipping technique that resolves the disharmony between weight binarization and lasso regularization. Experimental results show that the proposed sparse BD-Net achieves comparable or even better inference accuracy than the full-precision CNN baseline. Beyond that, a BD-Net-customized processing-in-memory accelerator is designed using SOT-MRAM, which offers high channel-expansion flexibility and computational parallelism. Through detailed analysis from both software and hardware perspectives, we provide intuitive design guidance for software/hardware co-design of DNN acceleration on mobile embedded systems. This journal article is an extended version of our paper published at ISVLSI 2018 [24].
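To illustrate the core idea described in the abstract, the following is a minimal NumPy sketch of (a) a convolution with weights binarized to {+1, -1}, so that every multiply-accumulate degenerates into an addition or subtraction, and (b) a group-lasso penalty whose group size is tied to a (hypothetical) computing-core capacity. The 1-D setting, function names, and group size are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def binarize(w):
    # Sign binarization: every weight becomes +1 or -1, so a dot
    # product with the input needs no true multiplications.
    return np.where(w >= 0, 1.0, -1.0)

def binarized_dw_conv1d(x, w_bin):
    # 1-D "valid" depthwise convolution (single channel for brevity)
    # with binary weights. Because w_bin is in {+1, -1}, each output
    # is a signed sum: inputs are only added or subtracted.
    k = len(w_bin)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        acc = 0.0
        for j in range(k):
            acc += x[i + j] if w_bin[j] > 0 else -x[i + j]
        out[i] = acc
    return out

def group_lasso(w, group_size):
    # Group-lasso penalty: sum of L2 norms over contiguous weight
    # groups. Here group_size stands in for the capacity of the
    # accelerator's basic computing core (an assumed value), so
    # zeroed-out groups map onto whole hardware units being skipped.
    groups = w.reshape(-1, group_size)
    return np.sum(np.linalg.norm(groups, axis=1))
```

For example, with `w_bin = [+1, -1]`, each output element is simply `x[i] - x[i+1]`, matching a standard cross-correlation with those binary taps while performing only additions and subtractions.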

Original language: English (US)
Article number: 15
Journal: ACM Journal on Emerging Technologies in Computing Systems
Volume: 16
Issue number: 2
DOIs
State: Published - Jan 29 2020

Keywords

  • Deep neural network
  • in-memory computing
  • model compression

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Electrical and Electronic Engineering
