TY - JOUR
T1 - Network Coding in Heterogeneous Multicore IoT Nodes with DAG Scheduling of Parallel Matrix Block Operations
AU - Wunderlich, Simon
AU - Cabrera, Juan A.
AU - Fitzek, Frank H.P.
AU - Reisslein, Martin
N1 - Funding Information:
Manuscript received January 30, 2017; revised April 24, 2017; accepted May 9, 2017. Date of publication May 11, 2017; date of current version August 9, 2017. This work was supported in part by the German Research Foundation (DFG) in the Collaborative Research Center 912 Highly Adaptive Energy-Efficient Computing (HAEC) and in part by a DRESDEN Senior Fellowship. A preliminary form of the multicore approach appeared in [1]. (Corresponding author: Frank H. P. Fitzek.) S. Wunderlich, J. A. Cabrera, and F. H. P. Fitzek are with the Deutsche Telekom Chair of Communication Networks, Technische Universität Dresden, 01062 Dresden, Germany (e-mail: simon.wunderlich@mailbox.tu-dresden.de; juan.cabrera@tu-dresden.de; frank.fitzek@tu-dresden.de).
Publisher Copyright:
© 2014 IEEE.
PY - 2017/8
Y1 - 2017/8
AB - Random linear network coding (RLNC) has the potential to improve the performance of current and future Internet of Things (IoT) communication systems, but is computationally demanding due to matrix multiplications and inversions. Some single-core RLNC implementations already achieve sufficient coding speeds for contemporary multimedia streaming formats. However, advances in multimedia streaming formats and IoT applications will require the exploitation of heterogeneous multicore architectures, which are becoming common for a wide range of IoT nodes, including smartphones. In this paper, we introduce and evaluate efficient RLNC computing strategies for IoT node architectures, including the emerging heterogeneous big.LITTLE multicore architectures with multiple big (fast) cores and multiple LITTLE (slow) cores. In contrast to existing RLNC implementation strategies, we build on and adapt highly optimized dense matrix operations from the high performance computing field to RLNC on heterogeneous multicore IoT nodes. Our approach includes the optimization of RLNC matrix operations through optimized operations on matrix blocks with single instruction multiple data instructions. We schedule block operations on the heterogeneous cores through a directed acyclic graph that avoids artificial synchronization points while ensuring the data dependencies. We examine priority scheduling according to the number of outgoing dependencies of a task and data locality of cached blocks. Our extensive measurements with several heterogeneous big.LITTLE multicore IoT node and smartphone processor boards demonstrate higher RLNC encoding and decoding throughputs than existing approaches. Moreover, our measurements indicate that the utilization of more cores decreases energy consumption, which is an important goal for IoT nodes.
KW - Directed acyclic graph (DAG)
KW - Internet of Things (IoT) node
KW - heterogeneous multicore architecture
KW - matrix inversion
KW - matrix multiplication
KW - parallel computing
KW - random linear network coding (RLNC)
KW - smartphone
UR - http://www.scopus.com/inward/record.url?scp=85029541631&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029541631&partnerID=8YFLogxK
U2 - 10.1109/JIOT.2017.2703813
DO - 10.1109/JIOT.2017.2703813
M3 - Article
AN - SCOPUS:85029541631
SN - 2327-4662
VL - 4
SP - 917
EP - 933
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 4
M1 - 7926320
ER -