TY - GEN
T1 - Reducing energy and increasing performance with traffic optimization in many-core systems
AU - Bezerra, George B.P.
AU - Forrest, Stephanie
AU - Zarkesh-Ha, Payman
PY - 2011
Y1 - 2011
AB - As the number of cores on a die continues to increase, it is necessary to optimize the traffic patterns of applications in order to minimize power consumption and maximize performance. We present a new approach for traffic optimization in many-core systems, which targets communication locality and load-balancing. Our approach works by mapping memory blocks to physical locations on the chip that are close to cores that access them, and by enforcing load balance by limiting the number of blocks mapped to each location. Communication locality reduces the average distance traveled by packets, which minimizes power and increases performance. Load-balancing avoids hotspots and improves cache utilization. Rather than treating every application in the same way, our method uses available information to produce mappings that are specially tuned for individual applications. Simulations performed on a 64-core system show a reduction in dynamic energy consumption of up to 81.6% and of 45.5% on average, and gains in performance of up to 13.2% on scientific benchmarks.
KW - traffic optimization
KW - communication graph
KW - communication locality
KW - load-balancing
KW - many-core
KW - memory-block mapping
KW - non-uniform cache access
UR - http://www.scopus.com/inward/record.url?scp=84857205032&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84857205032&partnerID=8YFLogxK
U2 - 10.1109/SLIP.2011.6135429
DO - 10.1109/SLIP.2011.6135429
M3 - Conference contribution
AN - SCOPUS:84857205032
SN - 9781457712401
T3 - International Workshop on System Level Interconnect Prediction, SLIP
BT - 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011
T2 - 2011 13th International Workshop on System Level Interconnect Prediction, SLIP 2011
Y2 - 5 June 2011
ER -