TY - GEN
T1 - An Energy-Efficient Reconfigurable LSTM Accelerator for Natural Language Processing
AU - Azari, Elham
AU - Vrudhula, Sarma
N1 - Funding Information:
This work was supported by the NSF I/UCRC Center for Embedded Systems and by NSF grant number 1361926.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - The Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) is known for its capability in modeling sequence learning tasks such as language modeling. However, due to the large number of model parameters and compute-intensive operations, existing FPGA implementations of LSTMs are not sufficiently energy-efficient, as they require large area and exhibit high power consumption. This work describes a substantially different hardware implementation of an LSTM that includes several architectural innovations to achieve high throughput and energy efficiency. The architectural innovations include (1) an improved design of an approximate multiplier (AM) and its integration with the compute-intensive units of the LSTM; (2) the design of control mechanisms to handle the variable-cycle (data-dependent) multiply operations; and (3) the incorporation of hierarchical pipelining at multiple levels of the design to maximize the overlap of the variable-cycle computations. In addition, this work applies a post-training, range-based, linear quantization to the parameters of the model to further improve performance and energy efficiency. A Python framework is also developed that allows for analysis and fine-tuning of the input parameters before mapping the design to hardware. This paper extensively explores the design trade-offs and demonstrates the advantages for one common application: language modeling. Implementation of the design on a Xilinx Zynq XC7Z030 FPGA shows maximum improvements over three recently published works of 27.86X, 7.69X and 11.06X in throughput, and 45.26X, 14.76X and 16.97X in energy efficiency, respectively.
AB - The Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) is known for its capability in modeling sequence learning tasks such as language modeling. However, due to the large number of model parameters and compute-intensive operations, existing FPGA implementations of LSTMs are not sufficiently energy-efficient, as they require large area and exhibit high power consumption. This work describes a substantially different hardware implementation of an LSTM that includes several architectural innovations to achieve high throughput and energy efficiency. The architectural innovations include (1) an improved design of an approximate multiplier (AM) and its integration with the compute-intensive units of the LSTM; (2) the design of control mechanisms to handle the variable-cycle (data-dependent) multiply operations; and (3) the incorporation of hierarchical pipelining at multiple levels of the design to maximize the overlap of the variable-cycle computations. In addition, this work applies a post-training, range-based, linear quantization to the parameters of the model to further improve performance and energy efficiency. A Python framework is also developed that allows for analysis and fine-tuning of the input parameters before mapping the design to hardware. This paper extensively explores the design trade-offs and demonstrates the advantages for one common application: language modeling. Implementation of the design on a Xilinx Zynq XC7Z030 FPGA shows maximum improvements over three recently published works of 27.86X, 7.69X and 11.06X in throughput, and 45.26X, 14.76X and 16.97X in energy efficiency, respectively.
KW - FPGA
KW - Hardware Acceleration
KW - Long Short-Term Memory
KW - NLP
KW - Recurrent Neural Network
UR - http://www.scopus.com/inward/record.url?scp=85081341216&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081341216&partnerID=8YFLogxK
U2 - 10.1109/BigData47090.2019.9006030
DO - 10.1109/BigData47090.2019.9006030
M3 - Conference contribution
AN - SCOPUS:85081341216
T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
SP - 4450
EP - 4459
BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
A2 - Baru, Chaitanya
A2 - Huan, Jun
A2 - Khan, Latifur
A2 - Hu, Xiaohua Tony
A2 - Ak, Ronay
A2 - Tian, Yuanyuan
A2 - Barga, Roger
A2 - Zaniolo, Carlo
A2 - Lee, Kisung
A2 - Ye, Yanfang Fanny
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Big Data, Big Data 2019
Y2 - 9 December 2019 through 12 December 2019
ER -