Hardware acceleration for RLNC: A case study based on the Xtensa processor with the Tensilica instruction-set extension

Javier Acevedo; Robert Scheffel; Simon Wunderlich; Mattis Hasler; Sreekrishna Pandi; Juan Cabrera; Frank H.P. Fitzek; Gerhard Fettweis; Martin Reisslein

doi:10.3390/electronics7090180

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Random linear network coding (RLNC) can greatly aid data transmission in lossy wireless networks. However, RLNC requires computationally complex matrix multiplications and inversions in finite fields (Galois fields). These computations are highly demanding for energy-constrained mobile devices. The presented case study evaluates hardware acceleration strategies for RLNC in the context of the Tensilica Xtensa LX5 processor with the Tensilica instruction set extension (TIE). More specifically, we develop TIEs for multiply-accumulate (MAC) operations that accelerate matrix multiplications in Galois fields, single instruction multiple data (SIMD) instructions that operate on consecutive memory locations, and the flexible-length instruction extension (FLIX). We evaluate the number of clock cycles required for RLNC encoding and decoding with and without the MAC, SIMD, and FLIX acceleration strategies. We also evaluate the RLNC encoding and decoding throughput and energy consumption for a range of RLNC generation and code word sizes. We find that for GF(2⁸) and GF(2¹⁶) RLNC encoding, the SIMD and FLIX acceleration strategies achieve approximately 400-fold speedups compared to a benchmark C code implementation without TIE. We also find that the single-core Xtensa LX5 with SIMD achieves seven to thirty times higher RLNC encoding and decoding throughput than the state-of-the-art ODROID XU3 system-on-a-chip (SoC) operating with a single core; the Xtensa LX5 with FLIX, in turn, increases the throughput by roughly 25% compared to using SIMD alone. Furthermore, the Xtensa LX5 with FLIX consumes roughly four orders of magnitude less energy than the ODROID XU3 SoC.
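
The multiply-accumulate (MAC) operation named in the abstract is the core of RLNC encoding: each coded symbol is a Galois-field linear combination of the source symbols in a generation. The C sketch below illustrates that scalar GF(2⁸) baseline, i.e. the kind of inner loop that MAC and SIMD instruction extensions are meant to replace. It is not the authors' code. The function names (gf256_mul, gf256_mac, rlnc_encode), the reduction polynomial 0x11D, and the contiguous symbol layout are hypothetical choices made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical choice: reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
 * The polynomial used in the paper is not stated in this record. */
#define GF256_POLY 0x11Du

/* Multiply two GF(2^8) elements with shift-and-add (carry-less) arithmetic. */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    unsigned acc = 0;
    unsigned x = a;
    while (b != 0) {
        if (b & 1u)
            acc ^= x;            /* addition in GF(2^8) is XOR */
        b >>= 1;
        x <<= 1;
        if (x & 0x100u)
            x ^= GF256_POLY;     /* reduce back into 8 bits */
    }
    return (uint8_t)acc;
}

/* Multiply-accumulate over one symbol: dst[i] = dst[i] + c * src[i] in GF(2^8).
 * A scalar loop like this is what a MAC or SIMD extension would accelerate. */
static void gf256_mac(uint8_t *dst, const uint8_t *src, uint8_t c, size_t len)
{
    if (c == 0)
        return;                  /* multiplying by zero contributes nothing */
    for (size_t i = 0; i < len; ++i)
        dst[i] ^= gf256_mul(c, src[i]);
}

/* Hypothetical encoder: builds one coded symbol from a generation of g source
 * symbols (each symbol_size bytes, stored back to back) and g coefficients. */
static void rlnc_encode(uint8_t *coded, const uint8_t *symbols,
                        const uint8_t *coeffs, size_t g, size_t symbol_size)
{
    assert(coded != NULL && symbols != NULL && coeffs != NULL);
    memset(coded, 0, symbol_size);
    for (size_t j = 0; j < g; ++j)
        gf256_mac(coded, symbols + j * symbol_size, coeffs[j], symbol_size);
}
```

For example, with source symbols {1, 2} and {3, 4} and coefficients {1, 2}, rlnc_encode yields the coded symbol {7, 10}: multiplying by 2 is a left shift when no reduction is needed, and addition is XOR (1 XOR 6 = 7, 2 XOR 8 = 10). In a real RLNC encoder the coefficients would be drawn at random and transmitted alongside the coded symbol.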

Original language	English (US)
Article number	180
Journal	Electronics (Switzerland)
Volume	7
Issue number	9
DOIs	https://doi.org/10.3390/electronics7090180
State	Published - Sep 8 2018

Keywords

Application-specific instruction-set processor (ASIP)
Flexible-length instruction extension (FLIX)
Galois field
Hardware acceleration
Multiply-accumulate (MAC) operations
Random linear network coding (RLNC)
Single instruction multiple data (SIMD)

ASJC Scopus subject areas

Control and Systems Engineering
Signal Processing
Hardware and Architecture
Computer Networks and Communications
Electrical and Electronic Engineering

Cite this

@article{48d768f334084b5da575570b3895cde8,

title = "Hardware acceleration for {RLNC}: A case study based on the {Xtensa} processor with the {Tensilica} instruction-set extension",

abstract = "Random linear network coding (RLNC) can greatly aid data transmission in lossy wireless networks. However, RLNC requires computationally complex matrix multiplications and inversions in finite fields (Galois fields). These computations are highly demanding for energy-constrained mobile devices. The presented case study evaluates hardware acceleration strategies for RLNC in the context of the Tensilica Xtensa LX5 processor with the Tensilica instruction set extension (TIE). More specifically, we develop TIEs for multiply-accumulate (MAC) operations for accelerating matrix multiplications in Galois fields, single instruction multiple data (SIMD) instructions operating on consecutive memory locations, as well as the flexible-length instruction extension (FLIX). We evaluate the number of clock cycles required for RLNC encoding and decoding without and with the MAC, SIMD, and FLIX acceleration strategies. We also evaluate the RLNC encoding and decoding throughput and energy consumption for a range of RLNC generation and code word sizes. We find that for GF($2^{8}$) and GF($2^{16}$) RLNC encoding, the SIMD and FLIX acceleration strategies achieve speedups of approximately four hundred fold compared to a benchmark C code implementation without TIE. We also find that the unicore Xtensa LX5 with SIMD has seven to thirty times higher RLNC encoding and decoding throughput than the state-of-the-art ODROID XU3 system-on-a-chip (SoC) operating with a single core; the Xtensa LX5 with FLIX, in turn, increases the throughput by roughly 25\% compared to utilizing only SIMD. Furthermore, the Xtensa LX5 with FLIX consumes roughly four orders of magnitude less energy than the ODROID XU3 SoC.",

keywords = "Application-specific instruction-set processor (ASIP), Flexible-length instruction extension (FLIX), Galois field, Hardware acceleration, Multiply-accumulate (MAC) operations, Random linear network coding (RLNC), Single instruction multiple data (SIMD)",

author = "Javier Acevedo and Robert Scheffel and Simon Wunderlich and Mattis Hasler and Sreekrishna Pandi and Juan Cabrera and Fitzek, {Frank H.P.} and Gerhard Fettweis and Martin Reisslein",

note = "Publisher Copyright: {\textcopyright} 2018 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2018",

month = sep,

day = "8",

doi = "10.3390/electronics7090180",

language = "English (US)",

volume = "7",

journal = "Electronics (Switzerland)",

issn = "2079-9292",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "9",

}

TY - JOUR

T1 - Hardware acceleration for RLNC: A case study based on the Xtensa processor with the Tensilica instruction-set extension

T2 - Electronics (Switzerland)

AU - Acevedo, Javier

AU - Scheffel, Robert

AU - Wunderlich, Simon

AU - Hasler, Mattis

AU - Pandi, Sreekrishna

AU - Cabrera, Juan

AU - Fitzek, Frank H.P.

AU - Fettweis, Gerhard

AU - Reisslein, Martin

PY - 2018/9/8

Y1 - 2018/9/8

N2 - Random linear network coding (RLNC) can greatly aid data transmission in lossy wireless networks. However, RLNC requires computationally complex matrix multiplications and inversions in finite fields (Galois fields). These computations are highly demanding for energy-constrained mobile devices. The presented case study evaluates hardware acceleration strategies for RLNC in the context of the Tensilica Xtensa LX5 processor with the Tensilica instruction set extension (TIE). More specifically, we develop TIEs for multiply-accumulate (MAC) operations for accelerating matrix multiplications in Galois fields, single instruction multiple data (SIMD) instructions operating on consecutive memory locations, as well as the flexible-length instruction extension (FLIX). We evaluate the number of clock cycles required for RLNC encoding and decoding without and with the MAC, SIMD, and FLIX acceleration strategies. We also evaluate the RLNC encoding and decoding throughput and energy consumption for a range of RLNC generation and code word sizes. We find that for GF(2^8) and GF(2^16) RLNC encoding, the SIMD and FLIX acceleration strategies achieve speedups of approximately four hundred fold compared to a benchmark C code implementation without TIE. We also find that the unicore Xtensa LX5 with SIMD has seven to thirty times higher RLNC encoding and decoding throughput than the state-of-the-art ODROID XU3 system-on-a-chip (SoC) operating with a single core; the Xtensa LX5 with FLIX, in turn, increases the throughput by roughly 25% compared to utilizing only SIMD. Furthermore, the Xtensa LX5 with FLIX consumes roughly four orders of magnitude less energy than the ODROID XU3 SoC.

AB - Random linear network coding (RLNC) can greatly aid data transmission in lossy wireless networks. However, RLNC requires computationally complex matrix multiplications and inversions in finite fields (Galois fields). These computations are highly demanding for energy-constrained mobile devices. The presented case study evaluates hardware acceleration strategies for RLNC in the context of the Tensilica Xtensa LX5 processor with the Tensilica instruction set extension (TIE). More specifically, we develop TIEs for multiply-accumulate (MAC) operations for accelerating matrix multiplications in Galois fields, single instruction multiple data (SIMD) instructions operating on consecutive memory locations, as well as the flexible-length instruction extension (FLIX). We evaluate the number of clock cycles required for RLNC encoding and decoding without and with the MAC, SIMD, and FLIX acceleration strategies. We also evaluate the RLNC encoding and decoding throughput and energy consumption for a range of RLNC generation and code word sizes. We find that for GF(2^8) and GF(2^16) RLNC encoding, the SIMD and FLIX acceleration strategies achieve speedups of approximately four hundred fold compared to a benchmark C code implementation without TIE. We also find that the unicore Xtensa LX5 with SIMD has seven to thirty times higher RLNC encoding and decoding throughput than the state-of-the-art ODROID XU3 system-on-a-chip (SoC) operating with a single core; the Xtensa LX5 with FLIX, in turn, increases the throughput by roughly 25% compared to utilizing only SIMD. Furthermore, the Xtensa LX5 with FLIX consumes roughly four orders of magnitude less energy than the ODROID XU3 SoC.

KW - Application-specific instruction-set processor (ASIP)

KW - Flexible-length instruction extension (FLIX)

KW - Galois field

KW - Hardware acceleration

KW - Multiply-accumulate (MAC) operations

KW - Random linear network coding (RLNC)

KW - Single instruction multiple data (SIMD)

UR - http://www.scopus.com/inward/record.url?scp=85053630811&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053630811&partnerID=8YFLogxK

U2 - 10.3390/electronics7090180

DO - 10.3390/electronics7090180

M3 - Article

AN - SCOPUS:85053630811

SN - 2079-9292

VL - 7

JO - Electronics (Switzerland)

JF - Electronics (Switzerland)

IS - 9

M1 - 180

ER -
