TY - JOUR
T1 - Hardware acceleration for RLNC
T2 - A case study based on the Xtensa processor with the Tensilica instruction-set extension
AU - Acevedo, Javier
AU - Scheffel, Robert
AU - Wunderlich, Simon
AU - Hasler, Mattis
AU - Pandi, Sreekrishna
AU - Cabrera, Juan
AU - Fitzek, Frank H.P.
AU - Fettweis, Gerhard
AU - Reisslein, Martin
N1 - Funding Information:
We are grateful to Sebastian Haas of the TU Dresden Vodafone Chair Mobile Communication Systems for assistance with the configuration of the Xtensa Xplorer.
5G Lab Germany, Deutsche Telekom Chair of Communication Networks, TU Dresden, 01062 Dresden, Germany; javier.acevedo@mailbox.tu-dresden.de (J.A.); simon.wunderlich@tu-dresden.de (S.W.); sreekrishna.pandi@tu-dresden.de (S.P.); juan.cabrera@tu-dresden.de (J.C.); frank.fitzek@tu-dresden.de (F.H.P.F.). 5G Lab Germany, Vodafone Chair Mobile Communication Systems, TU Dresden, 01062 Dresden, Germany; robert.scheffel1@tu-dresden.de (R.S.); mattis.hasler@tu-dresden.de (M.H.); gerhard.fettweis@tu-dresden.de (G.F.). School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA. Correspondence: reisslein@asu.edu; Tel.: +1-480-965-8593. This article is based upon work supported in part by the Free State of Saxony through funds from the European Commission for the Atto3-D Project and the German Research Foundation (DFG) within the Cluster of Excellence Center for Advancing Electronics Dresden (cfaed).
Publisher Copyright:
© 2018 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2018/9/8
Y1 - 2018/9/8
N2 - Random linear network coding (RLNC) can greatly aid data transmission in lossy wireless networks. However, RLNC requires computationally complex matrix multiplications and inversions in finite fields (Galois fields). These computations are highly demanding for energy-constrained mobile devices. The presented case study evaluates hardware acceleration strategies for RLNC in the context of the Tensilica Xtensa LX5 processor with the Tensilica instruction set extension (TIE). More specifically, we develop TIEs for multiply-accumulate (MAC) operations for accelerating matrix multiplications in Galois fields, single instruction multiple data (SIMD) instructions operating on consecutive memory locations, as well as the flexible-length instruction extension (FLIX). We evaluate the number of clock cycles required for RLNC encoding and decoding without and with the MAC, SIMD, and FLIX acceleration strategies. We also evaluate the RLNC encoding and decoding throughput and energy consumption for a range of RLNC generation and code word sizes. We find that for GF(2^8) and GF(2^16) RLNC encoding, the SIMD and FLIX acceleration strategies achieve speedups of approximately four hundred fold compared to a benchmark C code implementation without TIE. We also find that the unicore Xtensa LX5 with SIMD has seven to thirty times higher RLNC encoding and decoding throughput than the state-of-the-art ODROID XU3 system-on-a-chip (SoC) operating with a single core; the Xtensa LX5 with FLIX, in turn, increases the throughput by roughly 25% compared to utilizing only SIMD. Furthermore, the Xtensa LX5 with FLIX consumes roughly four orders of magnitude less energy than the ODROID XU3 SoC.
AB - Random linear network coding (RLNC) can greatly aid data transmission in lossy wireless networks. However, RLNC requires computationally complex matrix multiplications and inversions in finite fields (Galois fields). These computations are highly demanding for energy-constrained mobile devices. The presented case study evaluates hardware acceleration strategies for RLNC in the context of the Tensilica Xtensa LX5 processor with the Tensilica instruction set extension (TIE). More specifically, we develop TIEs for multiply-accumulate (MAC) operations for accelerating matrix multiplications in Galois fields, single instruction multiple data (SIMD) instructions operating on consecutive memory locations, as well as the flexible-length instruction extension (FLIX). We evaluate the number of clock cycles required for RLNC encoding and decoding without and with the MAC, SIMD, and FLIX acceleration strategies. We also evaluate the RLNC encoding and decoding throughput and energy consumption for a range of RLNC generation and code word sizes. We find that for GF(2^8) and GF(2^16) RLNC encoding, the SIMD and FLIX acceleration strategies achieve speedups of approximately four hundred fold compared to a benchmark C code implementation without TIE. We also find that the unicore Xtensa LX5 with SIMD has seven to thirty times higher RLNC encoding and decoding throughput than the state-of-the-art ODROID XU3 system-on-a-chip (SoC) operating with a single core; the Xtensa LX5 with FLIX, in turn, increases the throughput by roughly 25% compared to utilizing only SIMD. Furthermore, the Xtensa LX5 with FLIX consumes roughly four orders of magnitude less energy than the ODROID XU3 SoC.
KW - Application-specific instruction-set processor (ASIP)
KW - Flexible-length instruction extension (FLIX)
KW - Galois field
KW - Hardware acceleration
KW - Multiply-accumulate (MAC) operations
KW - Random linear network coding (RLNC)
KW - Single instruction multiple data (SIMD)
UR - http://www.scopus.com/inward/record.url?scp=85053630811&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053630811&partnerID=8YFLogxK
U2 - 10.3390/electronics7090180
DO - 10.3390/electronics7090180
M3 - Article
AN - SCOPUS:85053630811
SN - 2079-9292
VL - 7
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 9
M1 - 180
ER -