TY - GEN
T1 - Benchmark of RRAM based Architectures for Dot-Product Computation
AU - Peng, Xiaochen
AU - Yu, Shimeng
N1 - Funding Information:
ACKNOWLEDGEMENT This work is supported in part by NSF-CCF-1552687, and NSF/SRC E2CDA program with NSF-CCF-1740225 and SRC Contract 2018-NC-2762.
Publisher Copyright:
© 2018 IEEE.
PY - 2019/1/8
Y1 - 2019/1/8
N2 - Memory array architectures based on emerging non-volatile memory devices have been proposed for on-chip acceleration of dot-product computation in neural networks. As recent advances in machine learning have shown that precision reduction is a useful technique for reducing computation and memory storage, it is desirable to evaluate its hardware cost. In this paper, we use a circuit-level macro model, NeuroSim, to benchmark circuit-level performance metrics, such as chip area, latency, and dynamic energy, for the XNOR-RRAM and conventional 8-bit RRAM architectures. Both architectures are implemented to process the dot-product operation of a 512×512 synaptic matrix in sequential row-by-row and parallel read-out fashion separately. Based on RRAM models and a 32 nm CMOS PDK, the simulation results show that the energy efficiency of the parallel XNOR-RRAM architecture can reach 311 TOPS/W, at least ~15× and ~621× better than the parallel and sequential conventional 8-bit RRAM architectures, respectively.
AB - Memory array architectures based on emerging non-volatile memory devices have been proposed for on-chip acceleration of dot-product computation in neural networks. As recent advances in machine learning have shown that precision reduction is a useful technique for reducing computation and memory storage, it is desirable to evaluate its hardware cost. In this paper, we use a circuit-level macro model, NeuroSim, to benchmark circuit-level performance metrics, such as chip area, latency, and dynamic energy, for the XNOR-RRAM and conventional 8-bit RRAM architectures. Both architectures are implemented to process the dot-product operation of a 512×512 synaptic matrix in sequential row-by-row and parallel read-out fashion separately. Based on RRAM models and a 32 nm CMOS PDK, the simulation results show that the energy efficiency of the parallel XNOR-RRAM architecture can reach 311 TOPS/W, at least ~15× and ~621× better than the parallel and sequential conventional 8-bit RRAM architectures, respectively.
KW - hardware accelerator
KW - machine learning
KW - neuromorphic computing
KW - non-volatile memory
UR - http://www.scopus.com/inward/record.url?scp=85062234025&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062234025&partnerID=8YFLogxK
U2 - 10.1109/APCCAS.2018.8605606
DO - 10.1109/APCCAS.2018.8605606
M3 - Conference contribution
AN - SCOPUS:85062234025
T3 - 2018 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2018
SP - 378
EP - 381
BT - 2018 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2018
Y2 - 26 October 2018 through 30 October 2018
ER -