TY - JOUR
T1 - Efficient Error Detection for Matrix Multiplication With Systolic Arrays on FPGAs
AU - Libano, Fabiano
AU - Rech, Paolo
AU - Brunhaver, John
N1 - Funding Information:
This material is based on research sponsored by Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under Grant FA8650-18-2-7860. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. This work was supported by the European Union's Horizon 2020 research and innovation programme under Marie Sklodowska-Curie Grant 886202.
Publisher Copyright:
© 2023 IEEE.
PY - 2023/8/1
Y1 - 2023/8/1
N2 - Matrix multiplication has always been a cornerstone in computer science. In fact, linear algebra tools permeate a wide variety of applications: from weather forecasting, to financial market prediction, radio signal processing, computer vision, and more. Since many of the aforementioned applications typically impose strict performance and/or fault tolerance constraints, the demand for fast and reliable matrix multiplication (MxM) is at an all-time high. Typically, increased reliability is achieved through redundancy. However, coarse-grain duplication incurs an often prohibitive overhead, higher than 100%. Thanks to the peculiar characteristics of the MxM algorithm, more efficient algorithm-based hardening solutions have been designed to detect (and even correct) some types of errors with lower overhead. We show that, despite being more efficient, current solutions are still sub-optimal in certain scenarios, particularly when considering persistent faults in Field-Programmable Gate-Arrays (FPGAs). Based on a thorough analysis of the fault model, we propose an error detection technique for MxM that decreases both algorithmic and architectural costs by over a polynomial degree, when compared to existing algorithm-based strategies. Furthermore, we report arithmetic overheads at the application level to be under 1% for three state-of-the-art Convolutional Neural Networks (CNNs).
AB - Matrix multiplication has always been a cornerstone in computer science. In fact, linear algebra tools permeate a wide variety of applications: from weather forecasting, to financial market prediction, radio signal processing, computer vision, and more. Since many of the aforementioned applications typically impose strict performance and/or fault tolerance constraints, the demand for fast and reliable matrix multiplication (MxM) is at an all-time high. Typically, increased reliability is achieved through redundancy. However, coarse-grain duplication incurs an often prohibitive overhead, higher than 100%. Thanks to the peculiar characteristics of the MxM algorithm, more efficient algorithm-based hardening solutions have been designed to detect (and even correct) some types of errors with lower overhead. We show that, despite being more efficient, current solutions are still sub-optimal in certain scenarios, particularly when considering persistent faults in Field-Programmable Gate-Arrays (FPGAs). Based on a thorough analysis of the fault model, we propose an error detection technique for MxM that decreases both algorithmic and architectural costs by over a polynomial degree, when compared to existing algorithm-based strategies. Furthermore, we report arithmetic overheads at the application level to be under 1% for three state-of-the-art Convolutional Neural Networks (CNNs).
KW - Error detection
KW - FPGA
KW - matrix multiplication
KW - systolic array
UR - http://www.scopus.com/inward/record.url?scp=85149360157&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149360157&partnerID=8YFLogxK
U2 - 10.1109/TC.2023.3248282
DO - 10.1109/TC.2023.3248282
M3 - Article
AN - SCOPUS:85149360157
SN - 0018-9340
VL - 72
SP - 2390
EP - 2403
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 8
ER -