TY - JOUR
T1 - Handling Stuck-at-Fault Defects Using Matrix Transformation for Robust Inference of DNNs
AU - Zhang, Baogang
AU - Uysal, Necati
AU - Fan, Deliang
AU - Ewetz, Rickard
Funding Information:
Manuscript received March 18, 2019; revised May 31, 2019 and September 12, 2019; accepted September 17, 2019. Date of publication September 30, 2019; date of current version September 18, 2020. This work was supported by NSF under Grant CNS-1908471. This article was recommended by Associate Editor A. Gamatie. (Corresponding author: Baogang Zhang.) B. Zhang, N. Uysal, and R. Ewetz are with the Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816 USA (e-mail: baogang.zhang@knights.ucf.edu, necati@knights.ucf.edu, rickard.ewetz@ucf.edu).
PY - 2020/10
Y1 - 2020/10
AB - Matrix-vector multiplication is the dominant computational workload in the inference phase of deep neural networks (DNNs). Memristor crossbar arrays (MCAs) can efficiently perform matrix-vector multiplication in the analog domain. A key challenge is that memristor devices may suffer from stuck-at-fault defects, which can severely degrade the classification accuracy. Earlier studies have shown that the accuracy loss can be recovered by utilizing additional hardware or hardware-aware training. In this article, we propose a framework that handles stuck-at-faults using matrix transformations, called the MT framework. The framework introduces a cost metric that captures the negative impact of the stuck-at-fault defects. The cost metric is then minimized by applying matrix transformations T, where a transformation T changes a weight matrix W into a new weight matrix W' = T(W). In particular, a row-flipping transformation, a permutation transformation, and a value range transformation are proposed. The row-flipping transformation translates stuck-off (stuck-on) faults into stuck-on (stuck-off) faults. The permutation transformation maps small (large) weights to memristors that are stuck-off (stuck-on). The value range transformation reduces the magnitude of the smallest and largest elements in the weight matrices, so that the stuck-at-faults introduce smaller errors. The experimental results demonstrate that the MT framework is capable of recovering 99% of the accuracy loss on both the MNIST and CIFAR-10 datasets without utilizing hardware-aware training. The accuracy improvements come at the expense of 8.19x and 9.23x overheads in power and area, respectively. Nevertheless, the overhead can be reduced by up to 50% by leveraging hardware-aware training.
KW - Analog computing
KW - deep neural networks (DNNs)
KW - memristors
KW - stuck-at-faults
KW - transformations
UR - http://www.scopus.com/inward/record.url?scp=85094100496&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094100496&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2019.2944582
DO - 10.1109/TCAD.2019.2944582
M3 - Article
AN - SCOPUS:85094100496
VL - 39
SP - 2448
EP - 2460
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
SN - 0278-0070
IS - 10
M1 - 8852740
ER -
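
The three transformations named in the abstract can be illustrated with a short sketch. The Python fragment below is a hypothetical illustration, not the authors' implementation: it assumes a simplified fault model in which weights lie in [-1, 1], a stuck-off cell always reads as the minimum weight and a stuck-on cell as the maximum, and every name in it (cost, row_flip, best_row_permutation, value_range) is invented for this sketch.

import numpy as np
from itertools import permutations

# Assumptions for this sketch: weights in [-1, 1]; a stuck-off cell reads
# as W_MIN, a stuck-on cell as W_MAX; the squared-error "cost" stands in
# for the paper's cost metric.
W_MIN, W_MAX = -1.0, 1.0
OFF, ON, OK = 0, 1, 2  # fault state of each crossbar cell

def cost(W, F):
    # Squared error the stuck-at faults introduce for weight matrix W.
    read = np.where(F == OFF, W_MIN, np.where(F == ON, W_MAX, W))
    return float(np.sum((read - W) ** 2))

def row_flip(W, F):
    # Store -W[i] and negate that row's output in the periphery. A
    # stuck-off cell still reads W_MIN, but after the output negation it
    # acts as W_MAX for the original weights, i.e. the fault is translated
    # from stuck-off to stuck-on (and vice versa).
    out = W.copy()
    flipped = np.zeros(W.shape[0], dtype=bool)
    for i in range(W.shape[0]):
        if cost(-W[i], F[i]) < cost(W[i], F[i]):
            out[i], flipped[i] = -W[i], True
    return out, flipped

def best_row_permutation(W, F):
    # Brute-force row permutation (toy sizes only): place the rows so that
    # small weights land on stuck-off cells and large weights on stuck-on
    # cells, by minimizing the cost metric.
    best = min(permutations(range(W.shape[0])),
               key=lambda p: cost(W[list(p)], F))
    return W[list(best)], best

def value_range(W, clip=0.8):
    # Value range transformation (sketch): shrink the extreme weights so a
    # fault on any cell introduces a smaller error; the clipped residue
    # would have to be recovered elsewhere.
    return np.clip(W, -clip, clip)

# Toy example: one stuck-off and one stuck-on cell in a 3x3 weight matrix.
W = np.array([[0.9, -0.8, 0.1],
              [0.0, 0.2, -0.1],
              [0.7, 0.6, 0.5]])
F = np.full(W.shape, OK)
F[0, 0], F[1, 1] = OFF, ON
print("cost before:", cost(W, F))
Wf, _ = row_flip(W, F)
Wp, _ = best_row_permutation(W, F)
print("after row flip:", cost(Wf, F), "after permutation:", cost(Wp, F))

The sketch evaluates each transformation in isolation; the framework described in the article combines the transformations to minimize the cost metric.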