TY - JOUR

T1 - Handling Stuck-at-Fault Defects Using Matrix Transformation for Robust Inference of DNNs

AU - Zhang, Baogang

AU - Uysal, Necati

AU - Fan, Deliang

AU - Ewetz, Rickard

N1 - Funding Information:
He is currently an Assistant Professor with the Electrical and Computer Engineering Department, University of Central Florida, Orlando, FL, USA. His research group is funded by the National Science Foundation and a UCF in-house grant. His current research interests include physical design and computer-aided design for emerging technologies.
Funding Information:
Manuscript received March 18, 2019; revised May 31, 2019 and September 12, 2019; accepted September 17, 2019. Date of publication September 30, 2019; date of current version September 18, 2020. This work was supported by NSF under Grant CNS-1908471. This article was recommended by Associate Editor A. Gamatie. (Corresponding author: Baogang Zhang.) B. Zhang, N. Uysal, and R. Ewetz are with the Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816 USA (e-mail: baogang.zhang@knights.ucf.edu, necati@knights.ucf.edu, rickard.ewetz@ucf.edu).

PY - 2020/10

Y1 - 2020/10

AB - Matrix-vector multiplication is the dominant computational workload in the inference phase of deep neural networks (DNNs). Memristor crossbar arrays (MCAs) can efficiently perform matrix-vector multiplication in the analog domain. A key challenge is that memristor devices may suffer from stuck-at-fault defects, which can severely degrade classification accuracy. Earlier studies have shown that the accuracy loss can be recovered by utilizing additional hardware or hardware-aware training. In this article, we propose a framework, called the MT framework, that handles stuck-at-faults using matrix transformations. The framework introduces a cost metric that captures the negative impact of the stuck-at-fault defects; the cost metric is then minimized by applying matrix transformations. A transformation T maps a weight matrix W into a new weight matrix W' = T(W). In particular, a row-flipping transformation, a permutation transformation, and a value-range transformation are proposed. The row-flipping transformation translates stuck-off (stuck-on) faults into stuck-on (stuck-off) faults. The permutation transformation maps small (large) weights to memristors that are stuck-off (stuck-on). The value-range transformation reduces the magnitude of the smallest and largest elements in the weight matrices, so that the stuck-at-faults introduce smaller errors. The experimental results demonstrate that the MT framework recovers 99% of the accuracy loss on both the MNIST and CIFAR-10 datasets without hardware-aware training. The accuracy improvements come at the expense of 8.19× and 9.23× overheads in power and area, respectively. Nevertheless, the overhead can be reduced by up to 50% by leveraging hardware-aware training.

KW - Analog computing

KW - deep neural networks (DNNs)

KW - memristors

KW - stuck-at-faults

KW - transformations

UR - http://www.scopus.com/inward/record.url?scp=85094100496&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85094100496&partnerID=8YFLogxK

U2 - 10.1109/TCAD.2019.2944582

DO - 10.1109/TCAD.2019.2944582

M3 - Article

AN - SCOPUS:85094100496

VL - 39

SP - 2448

EP - 2460

JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

SN - 0278-0070

IS - 10

M1 - 8852740

ER -