Efficient code assignment techniques for local memory on software managed multicores

Jing Lu, Ke Bai, Aviral Shrivastava

Research output: Contribution to journalArticle

6 Citations (Scopus)

Abstract

Scaling the memory hierarchy is a major challenge when we scale the number of cores in a multicore processor. Software Managed Multicore (SMM) architectures come up as one of the promising solutions. In an SMM architecture, there are no caches, and each core has only a local scratchpad memory [Banakar et al. 2002]. As the local memory usually is small, large applications cannot be directly executed on it. Code and data of the task mapped to each core need to be managed between global memory and local memory. This article solves the problem of efficiently managing code on an SMM architecture. The primary requirement of generating efficient code assignments is a correct management cost model. In this article, we address this problem by proposing a cost calculation graph. In addition, we develop two heuristics CMSM (Code Mapping for Software Managed multicores) and CMSM-advanced that result in efficient code management execution on the local scratchpad memory. Experimental results collected after executing applications from the MiBench suite [Guthaus et al. 2001] demonstrate that merely by adopting the correct management cost calculation, even using previous code assignment schemes, we can improve performance by an average of 12%. Combining the correct management cost model and a more optimized code mapping algorithm together, our heuristics can reduce runtime in more than 80% of the cases, and by up to 20% on our set of benchmarks, compared to the state-of-the-art code assignment approach [Jung et al. 2010]. When compared with Instruction-level Parallelism (ILP) results, CMSM-advanced performs an average of 5% worse.We also simulate the benchmarks on a cache-based system, and find that the code management overhead on SMM core with our code management is much less than memory latency of a cache-based system.

Original languageEnglish (US)
Article number71
JournalACM Transactions on Embedded Computing Systems
Volume14
Issue number4
DOIs
StatePublished - Dec 1 2015

Fingerprint

Data storage equipment
Costs

Keywords

  • Code
  • Embedded systems
  • Instruction
  • Localmemory
  • Multicore processor
  • Scratchpad memory

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software

Cite this

Efficient code assignment techniques for local memory on software managed multicores. / Lu, Jing; Bai, Ke; Shrivastava, Aviral.

In: ACM Transactions on Embedded Computing Systems, Vol. 14, No. 4, 71, 01.12.2015.

Research output: Contribution to journalArticle

@article{387ecf468ea24b6ab1203252ff7c9d70,
title = "Efficient code assignment techniques for local memory on software managed multicores",
abstract = "Scaling the memory hierarchy is a major challenge when we scale the number of cores in a multicore processor. Software Managed Multicore (SMM) architectures come up as one of the promising solutions. In an SMM architecture, there are no caches, and each core has only a local scratchpad memory [Banakar et al. 2002]. As the local memory usually is small, large applications cannot be directly executed on it. Code and data of the task mapped to each core need to be managed between global memory and local memory. This article solves the problem of efficiently managing code on an SMM architecture. The primary requirement of generating efficient code assignments is a correct management cost model. In this article, we address this problem by proposing a cost calculation graph. In addition, we develop two heuristics CMSM (Code Mapping for Software Managed multicores) and CMSM-advanced that result in efficient code management execution on the local scratchpad memory. Experimental results collected after executing applications from the MiBench suite [Guthaus et al. 2001] demonstrate that merely by adopting the correct management cost calculation, even using previous code assignment schemes, we can improve performance by an average of 12{\%}. Combining the correct management cost model and a more optimized code mapping algorithm together, our heuristics can reduce runtime in more than 80{\%} of the cases, and by up to 20{\%} on our set of benchmarks, compared to the state-of-the-art code assignment approach [Jung et al. 2010]. When compared with Instruction-level Parallelism (ILP) results, CMSM-advanced performs an average of 5{\%} worse.We also simulate the benchmarks on a cache-based system, and find that the code management overhead on SMM core with our code management is much less than memory latency of a cache-based system.",
keywords = "Code, Embedded systems, Instruction, Localmemory, Multicore processor, Scratchpad memory",
author = "Jing Lu and Ke Bai and Aviral Shrivastava",
year = "2015",
month = "12",
day = "1",
doi = "10.1145/2738039",
language = "English (US)",
volume = "14",
journal = "ACM Transactions on Embedded Computing Systems",
issn = "1539-9087",
publisher = "Association for Computing Machinery (ACM)",
number = "4",

}

TY - JOUR

T1 - Efficient code assignment techniques for local memory on software managed multicores

AU - Lu, Jing

AU - Bai, Ke

AU - Shrivastava, Aviral

PY - 2015/12/1

Y1 - 2015/12/1

N2 - Scaling the memory hierarchy is a major challenge when we scale the number of cores in a multicore processor. Software Managed Multicore (SMM) architectures come up as one of the promising solutions. In an SMM architecture, there are no caches, and each core has only a local scratchpad memory [Banakar et al. 2002]. As the local memory usually is small, large applications cannot be directly executed on it. Code and data of the task mapped to each core need to be managed between global memory and local memory. This article solves the problem of efficiently managing code on an SMM architecture. The primary requirement of generating efficient code assignments is a correct management cost model. In this article, we address this problem by proposing a cost calculation graph. In addition, we develop two heuristics CMSM (Code Mapping for Software Managed multicores) and CMSM-advanced that result in efficient code management execution on the local scratchpad memory. Experimental results collected after executing applications from the MiBench suite [Guthaus et al. 2001] demonstrate that merely by adopting the correct management cost calculation, even using previous code assignment schemes, we can improve performance by an average of 12%. Combining the correct management cost model and a more optimized code mapping algorithm together, our heuristics can reduce runtime in more than 80% of the cases, and by up to 20% on our set of benchmarks, compared to the state-of-the-art code assignment approach [Jung et al. 2010]. When compared with Instruction-level Parallelism (ILP) results, CMSM-advanced performs an average of 5% worse.We also simulate the benchmarks on a cache-based system, and find that the code management overhead on SMM core with our code management is much less than memory latency of a cache-based system.

AB - Scaling the memory hierarchy is a major challenge when we scale the number of cores in a multicore processor. Software Managed Multicore (SMM) architectures come up as one of the promising solutions. In an SMM architecture, there are no caches, and each core has only a local scratchpad memory [Banakar et al. 2002]. As the local memory usually is small, large applications cannot be directly executed on it. Code and data of the task mapped to each core need to be managed between global memory and local memory. This article solves the problem of efficiently managing code on an SMM architecture. The primary requirement of generating efficient code assignments is a correct management cost model. In this article, we address this problem by proposing a cost calculation graph. In addition, we develop two heuristics CMSM (Code Mapping for Software Managed multicores) and CMSM-advanced that result in efficient code management execution on the local scratchpad memory. Experimental results collected after executing applications from the MiBench suite [Guthaus et al. 2001] demonstrate that merely by adopting the correct management cost calculation, even using previous code assignment schemes, we can improve performance by an average of 12%. Combining the correct management cost model and a more optimized code mapping algorithm together, our heuristics can reduce runtime in more than 80% of the cases, and by up to 20% on our set of benchmarks, compared to the state-of-the-art code assignment approach [Jung et al. 2010]. When compared with Instruction-level Parallelism (ILP) results, CMSM-advanced performs an average of 5% worse.We also simulate the benchmarks on a cache-based system, and find that the code management overhead on SMM core with our code management is much less than memory latency of a cache-based system.

KW - Code

KW - Embedded systems

KW - Instruction

KW - Localmemory

KW - Multicore processor

KW - Scratchpad memory

UR - http://www.scopus.com/inward/record.url?scp=84952909242&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84952909242&partnerID=8YFLogxK

U2 - 10.1145/2738039

DO - 10.1145/2738039

M3 - Article

VL - 14

JO - ACM Transactions on Embedded Computing Systems

JF - ACM Transactions on Embedded Computing Systems

SN - 1539-9087

IS - 4

M1 - 71

ER -