Reducing energy and increasing performance with traffic optimization in many-core systems

George B.P. Bezerra, Stephanie Forrest, Payman Zarkesh-Ha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

As the number of cores on a die continues to increase, it is necessary to optimize the traffic patterns of applications in order to minimize power consumption and maximize performance. We present a new approach for traffic optimization in many-core systems, which targets communication locality and load-balancing. Our approach works by mapping memory blocks to physical locations on the chip that are close to cores that access them, and by enforcing load balance by limiting the number of blocks mapped to each location. Communication locality reduces the average distance traveled by packets, which minimizes power and increases performance. Load-balancing avoids hotspots and improves cache utilization. Rather than treating every application in the same way, our method uses available information to produce mappings that are specially tuned for individual applications. Simulations performed on a 64-core system show a reduction in dynamic energy consumption of up to 81.6% and of 45.5% on average, and gains in performance of up to 13.2% on scientific benchmarks.
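The mapping idea in the abstract (place each memory block near the cores that access it, and cap the number of blocks per location) can be illustrated with a small greedy sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a 2D mesh with row-major tile numbering, Manhattan hop distance as the cost, one cache location per tile, and a per-block table of access counts. The names `hops`, `map_blocks`, `total_hops`, and `capacity` are hypothetical and chosen only for illustration.

```python
def hops(core, tile, width):
    """Manhattan hop distance between two tiles of a row-major 2D mesh (assumed topology)."""
    cx, cy = core % width, core // width
    tx, ty = tile % width, tile // width
    return abs(cx - tx) + abs(cy - ty)


def map_blocks(access_counts, num_tiles, width, capacity):
    """Greedily map each block to the cheapest tile that still has room.

    access_counts: {block: {core: number_of_accesses}} (assumed input format)
    capacity:      maximum number of blocks per tile (the load-balance limit)
    Returns {block: tile}.
    """
    load = [0] * num_tiles
    mapping = {}
    # Place the most heavily accessed blocks first so they get the best spots.
    order = sorted(access_counts, key=lambda b: -sum(access_counts[b].values()))
    for block in order:
        best_tile, best_cost = None, None
        for tile in range(num_tiles):
            if load[tile] >= capacity:
                continue  # tile is full: enforce load balance
            # Locality cost: access-weighted distance from every accessing core.
            cost = sum(n * hops(core, tile, width)
                       for core, n in access_counts[block].items())
            if best_cost is None or cost < best_cost:
                best_tile, best_cost = tile, cost
        if best_tile is None:
            raise ValueError("total capacity is smaller than the number of blocks")
        mapping[block] = best_tile
        load[best_tile] += 1
    return mapping


def total_hops(mapping, access_counts, width):
    """Access-weighted hop count of a mapping, a rough proxy for network traffic."""
    return sum(n * hops(core, mapping[block], width)
               for block, cores in access_counts.items()
               for core, n in cores.items())
```

Lowering `capacity` spreads blocks more evenly at the price of longer average distances, while raising it favors locality. That is the trade-off between communication locality and load balance that the abstract describes.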

Original language: English (US)
Title of host publication: 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011
DOIs: https://doi.org/10.1109/SLIP.2011.6135429
State: Published - Dec 1 2011
Externally published: Yes
Event: 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011 - San Diego, CA, United States
Duration: Jun 5 2011 → Jun 5 2011

Other

Other: 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011
Country: United States
City: San Diego, CA
Period: 6/5/11 → 6/5/11

Keywords

  • communication graph
  • communication locality
  • load-balancing
  • many-core
  • memory-block mapping
  • non-uniform cache access
  • Traffic optimization

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Applied Mathematics

Cite this

Bezerra, G. B. P., Forrest, S., & Zarkesh-Ha, P. (2011). Reducing energy and increasing performance with traffic optimization in many-core systems. In 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011 [6135429] https://doi.org/10.1109/SLIP.2011.6135429

Reducing energy and increasing performance with traffic optimization in many-core systems. / Bezerra, George B.P.; Forrest, Stephanie; Zarkesh-Ha, Payman.

2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011. 2011. 6135429.

Bezerra, GBP, Forrest, S & Zarkesh-Ha, P 2011, Reducing energy and increasing performance with traffic optimization in many-core systems. in 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011., 6135429, 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011, San Diego, CA, United States, 6/5/11. https://doi.org/10.1109/SLIP.2011.6135429

Bezerra GBP, Forrest S, Zarkesh-Ha P. Reducing energy and increasing performance with traffic optimization in many-core systems. In 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011. 2011. 6135429 https://doi.org/10.1109/SLIP.2011.6135429

Bezerra, George B.P. ; Forrest, Stephanie ; Zarkesh-Ha, Payman. / Reducing energy and increasing performance with traffic optimization in many-core systems. 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011. 2011.

@inproceedings{b7a3b78567b4449790aafd89916a5d68,
title = "Reducing energy and increasing performance with traffic optimization in many-core systems",
abstract = "As the number of cores on a die continues to increase, it is necessary to optimize the traffic patterns of applications in order to minimize power consumption and maximize performance. We present a new approach for traffic optimization in many-core systems, which targets communication locality and load-balancing. Our approach works by mapping memory blocks to physical locations on the chip that are close to cores that access them, and by enforcing load balance by limiting the number of blocks mapped to each location. Communication locality reduces the average distance traveled by packets, which minimizes power and increases performance. Load-balancing avoids hotspots and improves cache utilization. Rather than treating every application in the same way, our method uses available information to produce mappings that are specially tuned for individual applications. Simulations performed on a 64-core system show a reduction in dynamic energy consumption of up to 81.6{\%} and of 45.5{\%} on average, and gains in performance of up to 13.2{\%} on scientific benchmarks.",
keywords = "communication graph, communication locality, load-balancing, many-core, memory-block mapping, non-uniform cache access, Traffic optimization",
author = "Bezerra, George B.P. and Forrest, Stephanie and Zarkesh-Ha, Payman",
year = "2011",
month = "12",
day = "1",
doi = "10.1109/SLIP.2011.6135429",
language = "English (US)",
isbn = "9781457712401",
booktitle = "2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011",
}

TY  - GEN
T1  - Reducing energy and increasing performance with traffic optimization in many-core systems
AU  - Bezerra, George B.P.
AU  - Forrest, Stephanie
AU  - Zarkesh-Ha, Payman
PY  - 2011/12/1
Y1  - 2011/12/1
N2  - As the number of cores on a die continues to increase, it is necessary to optimize the traffic patterns of applications in order to minimize power consumption and maximize performance. We present a new approach for traffic optimization in many-core systems, which targets communication locality and load-balancing. Our approach works by mapping memory blocks to physical locations on the chip that are close to cores that access them, and by enforcing load balance by limiting the number of blocks mapped to each location. Communication locality reduces the average distance traveled by packets, which minimizes power and increases performance. Load-balancing avoids hotspots and improves cache utilization. Rather than treating every application in the same way, our method uses available information to produce mappings that are specially tuned for individual applications. Simulations performed on a 64-core system show a reduction in dynamic energy consumption of up to 81.6% and of 45.5% on average, and gains in performance of up to 13.2% on scientific benchmarks.
AB  - As the number of cores on a die continues to increase, it is necessary to optimize the traffic patterns of applications in order to minimize power consumption and maximize performance. We present a new approach for traffic optimization in many-core systems, which targets communication locality and load-balancing. Our approach works by mapping memory blocks to physical locations on the chip that are close to cores that access them, and by enforcing load balance by limiting the number of blocks mapped to each location. Communication locality reduces the average distance traveled by packets, which minimizes power and increases performance. Load-balancing avoids hotspots and improves cache utilization. Rather than treating every application in the same way, our method uses available information to produce mappings that are specially tuned for individual applications. Simulations performed on a 64-core system show a reduction in dynamic energy consumption of up to 81.6% and of 45.5% on average, and gains in performance of up to 13.2% on scientific benchmarks.
KW  - communication graph
KW  - communication locality
KW  - load-balancing
KW  - many-core
KW  - memory-block mapping
KW  - non-uniform cache access
KW  - Traffic optimization
UR  - http://www.scopus.com/inward/record.url?scp=84857205032&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84857205032&partnerID=8YFLogxK
U2  - 10.1109/SLIP.2011.6135429
DO  - 10.1109/SLIP.2011.6135429
M3  - Conference contribution
AN  - SCOPUS:84857205032
SN  - 9781457712401
BT  - 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011
ER  - 