Network Coding in Heterogeneous Multicore IoT Nodes with DAG Scheduling of Parallel Matrix Block Operations

Simon Wunderlich; Juan A. Cabrera; Frank H.P. Fitzek; Martin Reisslein

doi:10.1109/JIOT.2017.2703813

Research output: Contribution to journal › Article › peer-review

47 Scopus citations

Abstract

Random linear network coding (RLNC) has the potential to improve the performance of current and future Internet of Things (IoT) communication systems, but it is computationally demanding due to matrix multiplications and inversions. Some single-core RLNC implementations already achieve sufficient coding speeds for contemporary multimedia streaming formats. However, advances in multimedia streaming formats and IoT applications will require the exploitation of heterogeneous multicore architectures, which are becoming common for a wide range of IoT nodes, including smartphones. In this paper, we introduce and evaluate efficient RLNC computing strategies for IoT node architectures, including the emerging heterogeneous big.LITTLE multicore architectures with multiple big (fast) cores and multiple LITTLE (slow) cores. In contrast to existing RLNC implementation strategies, we build on and adapt highly optimized dense matrix operations from the high-performance computing field to RLNC on heterogeneous multicore IoT nodes. Our approach includes the optimization of RLNC matrix operations through optimized operations on matrix blocks with single instruction, multiple data (SIMD) instructions. We schedule block operations on the heterogeneous cores through a directed acyclic graph (DAG) that avoids artificial synchronization points while respecting the data dependencies. We examine priority scheduling according to the number of outgoing dependencies of a task and the data locality of cached blocks. Our extensive measurements with several heterogeneous big.LITTLE multicore IoT node and smartphone processor boards demonstrate higher RLNC encoding and decoding throughputs than existing approaches. Moreover, our measurements indicate that the utilization of more cores decreases energy consumption, which is an important goal for IoT nodes.
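To illustrate the scheduling idea mentioned in the abstract, the following is a minimal single-threaded sketch of ordering DAG tasks by their number of outgoing dependencies. It is not the authors' scheduler: the function name `priority_order`, the `deps` dictionary format, and the task names are hypothetical, and the sketch ignores data locality, core heterogeneity, and actual parallel execution. It assumes every task appears as a key in `deps`.

```python
import heapq
from collections import defaultdict


def priority_order(deps):
    """Return a dependency-respecting order that prefers tasks with more successors.

    deps maps each task name to the list of tasks it depends on
    (hypothetical format; every task must appear as a key).
    """
    successors = defaultdict(list)
    remaining = {}
    for task, parents in deps.items():
        remaining[task] = len(parents)
        for parent in parents:
            successors[parent].append(task)

    # Heap entries are (-number of outgoing dependencies, task name).
    # The negative count makes the min-heap pop the task with the most
    # successors first; the task name breaks ties deterministically.
    ready = [(-len(successors[t]), t) for t, n in remaining.items() if n == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, task = heapq.heappop(ready)
        order.append(task)
        for child in successors[task]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (-len(successors[child]), child))

    if len(order) != len(deps):
        raise ValueError("dependency graph contains a cycle or an unknown task")
    return order


# Hypothetical block operations; the names are illustrative only.
example_deps = {
    "invert_A11": [],
    "mul_A12": ["invert_A11"],
    "mul_A21": ["invert_A11"],
    "update_A22": ["mul_A12", "mul_A21"],
    "scale_B1": [],
}
example_order = priority_order(example_deps)
```

In this sketch, `invert_A11` is chosen before `scale_B1` because two other tasks wait on it, so finishing it early makes more work available to otherwise idle cores.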

Original language	English (US)
Article number	7926320
Pages (from-to)	917-933
Number of pages	17
Journal	IEEE Internet of Things Journal
Volume	4
Issue number	4
DOIs	https://doi.org/10.1109/JIOT.2017.2703813
State	Published - Aug 2017

Keywords

Directed acyclic graph (DAG)
Internet of Things (IoT) node
heterogeneous multicore architecture
matrix inversion
matrix multiplication
parallel computing
random linear network coding (RLNC)
smartphone

ASJC Scopus subject areas

Signal Processing
Information Systems
Hardware and Architecture
Computer Science Applications
Computer Networks and Communications

Cite this

@article{cbed82ef704a417383c2a113577ace19,

title = "Network Coding in Heterogeneous Multicore IoT Nodes with DAG Scheduling of Parallel Matrix Block Operations",

abstract = "Random linear network coding (RLNC) has the potential to improve the performance of current and future Internet of Things (IoT) communication systems, but is computationally demanding due to matrix multiplications and inversions. Some single-core RLNC implementations achieve already sufficient coding speeds for contemporary multimedia streaming formats. However, advances in multimedia streaming formats and IoT applications will require the exploitation of heterogeneous multicore architectures, which are becoming common for a wide range of IoT nodes, including smartphones. In this paper, we introduce and evaluate efficient RLNC computing strategies for IoT node architectures, including the emerging heterogeneous big.LITTLE multicore architectures with multiple big (fast) cores and multiple LITTLE (slow) cores. In contrast to existing RLNC implementation strategies, we build on and adapt highly optimized dense matrix operations from the high performance computing field to RLNC on heterogeneous multicore IoT nodes. Our approach includes the optimization of RLNC matrix operations through optimized operations on matrix blocks with single instruction multiple data instructions. We schedule block operations on the heterogeneous cores through a directed acyclic graph that avoids artificial synchronization points while ensuring the data dependencies. We examine priority scheduling according to the number of outgoing dependencies of a task and data locality of cached blocks. Our extensive measurements with several heterogeneous big.LITTLE multicore IoT node and smartphone processor boards demonstrate higher RLNC encoding and decoding throughputs than existing approaches. Moreover, our measurements indicate that the utilization of more cores decreases energy consumption, which is an important goal for IoT nodes.",

keywords = "Directed acyclic graph (DAG), Internet of Things (IoT) node, heterogeneous multicore architecture, matrix inversion, matrix multiplication, parallel computing, random linear network coding (RLNC), smartphone",

author = "Wunderlich, Simon and Cabrera, {Juan A.} and Fitzek, {Frank H.P.} and Reisslein, Martin",

note = "Funding Information: Manuscript received January 30, 2017; revised April 24, 2017; accepted May 9, 2017. Date of publication May 11, 2017; date of current version August 9, 2017. This work was supported in part by the German Research Foundation (DFG) in the Collaborative Research Center 912 Highly Adaptive Energy-Efficient Computing (HAEC) and in part by a DRESDEN Senior Fellowship. A preliminary form of the multicore approach appeared in [1]. (Corresponding author: Frank H. P. Fitzek.) S. Wunderlich, J. A. Cabrera, and F. H. P. Fitzek are with the Deutsche Telekom Chair of Communication Networks, Technische Universit{\"a}t Dresden, 01062 Dresden, Germany (e-mail: simon.wunderlich@mailbox.tu-dresden.de; juan.cabrera@tu-dresden.de; frank.fitzek@tu-dresden.de). Publisher Copyright: {\textcopyright} 2014 IEEE.",

year = "2017",

month = aug,

doi = "10.1109/JIOT.2017.2703813",

language = "English (US)",

volume = "4",

pages = "917--933",

journal = "IEEE Internet of Things Journal",

issn = "2327-4662",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "4",

}

TY - JOUR

T1 - Network Coding in Heterogeneous Multicore IoT Nodes with DAG Scheduling of Parallel Matrix Block Operations

AU - Wunderlich, Simon

AU - Cabrera, Juan A.

AU - Fitzek, Frank H.P.

AU - Reisslein, Martin

N1 - Funding Information: Manuscript received January 30, 2017; revised April 24, 2017; accepted May 9, 2017. Date of publication May 11, 2017; date of current version August 9, 2017. This work was supported in part by the German Research Foundation (DFG) in the Collaborative Research Center 912 Highly Adaptive Energy-Efficient Computing (HAEC) and in part by a DRESDEN Senior Fellowship. A preliminary form of the multicore approach appeared in [1]. (Corresponding author: Frank H. P. Fitzek.) S. Wunderlich, J. A. Cabrera, and F. H. P. Fitzek are with the Deutsche Telekom Chair of Communication Networks, Technische Universität Dresden, 01062 Dresden, Germany (e-mail: simon.wunderlich@mailbox.tu-dresden.de; juan.cabrera@tu-dresden.de; frank.fitzek@tu-dresden.de). Publisher Copyright: © 2014 IEEE.

PY - 2017/8

Y1 - 2017/8

N2 - Random linear network coding (RLNC) has the potential to improve the performance of current and future Internet of Things (IoT) communication systems, but is computationally demanding due to matrix multiplications and inversions. Some single-core RLNC implementations achieve already sufficient coding speeds for contemporary multimedia streaming formats. However, advances in multimedia streaming formats and IoT applications will require the exploitation of heterogeneous multicore architectures, which are becoming common for a wide range of IoT nodes, including smartphones. In this paper, we introduce and evaluate efficient RLNC computing strategies for IoT node architectures, including the emerging heterogeneous big.LITTLE multicore architectures with multiple big (fast) cores and multiple LITTLE (slow) cores. In contrast to existing RLNC implementation strategies, we build on and adapt highly optimized dense matrix operations from the high performance computing field to RLNC on heterogeneous multicore IoT nodes. Our approach includes the optimization of RLNC matrix operations through optimized operations on matrix blocks with single instruction multiple data instructions. We schedule block operations on the heterogeneous cores through a directed acyclic graph that avoids artificial synchronization points while ensuring the data dependencies. We examine priority scheduling according to the number of outgoing dependencies of a task and data locality of cached blocks. Our extensive measurements with several heterogeneous big.LITTLE multicore IoT node and smartphone processor boards demonstrate higher RLNC encoding and decoding throughputs than existing approaches. Moreover, our measurements indicate that the utilization of more cores decreases energy consumption, which is an important goal for IoT nodes.

AB - Random linear network coding (RLNC) has the potential to improve the performance of current and future Internet of Things (IoT) communication systems, but is computationally demanding due to matrix multiplications and inversions. Some single-core RLNC implementations achieve already sufficient coding speeds for contemporary multimedia streaming formats. However, advances in multimedia streaming formats and IoT applications will require the exploitation of heterogeneous multicore architectures, which are becoming common for a wide range of IoT nodes, including smartphones. In this paper, we introduce and evaluate efficient RLNC computing strategies for IoT node architectures, including the emerging heterogeneous big.LITTLE multicore architectures with multiple big (fast) cores and multiple LITTLE (slow) cores. In contrast to existing RLNC implementation strategies, we build on and adapt highly optimized dense matrix operations from the high performance computing field to RLNC on heterogeneous multicore IoT nodes. Our approach includes the optimization of RLNC matrix operations through optimized operations on matrix blocks with single instruction multiple data instructions. We schedule block operations on the heterogeneous cores through a directed acyclic graph that avoids artificial synchronization points while ensuring the data dependencies. We examine priority scheduling according to the number of outgoing dependencies of a task and data locality of cached blocks. Our extensive measurements with several heterogeneous big.LITTLE multicore IoT node and smartphone processor boards demonstrate higher RLNC encoding and decoding throughputs than existing approaches. Moreover, our measurements indicate that the utilization of more cores decreases energy consumption, which is an important goal for IoT nodes.

KW - Directed acyclic graph (DAG)

KW - Internet of Things (IoT) node

KW - heterogeneous multicore architecture

KW - matrix inversion

KW - matrix multiplication

KW - parallel computing

KW - random linear network coding (RLNC)

KW - smartphone

UR - http://www.scopus.com/inward/record.url?scp=85029541631&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029541631&partnerID=8YFLogxK

U2 - 10.1109/JIOT.2017.2703813

DO - 10.1109/JIOT.2017.2703813

M3 - Article

AN - SCOPUS:85029541631

SN - 2327-4662

VL - 4

SP - 917

EP - 933

JO - IEEE Internet of Things Journal

JF - IEEE Internet of Things Journal

IS - 4

M1 - 7926320

ER -