Understanding the future of energy efficiency in multi-module GPUs

Akhil Arunkumar, Evgeny Bolotin, David Nellans, Carole-Jean Wu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

As Moore's law slows down, GPUs must pivot towards multi-module designs to continue scaling performance at historical rates. Prior work on multi-module GPUs has focused on performance, while largely ignoring the issue of energy efficiency. In this work, we propose a new metric for GPU efficiency called EDP Scaling Efficiency that quantifies the effects of both strong performance scaling and overall energy efficiency in these designs. To enable this analysis, we develop a novel top-down GPU energy estimation framework that is accurate within 10% of a recent GPU design. Being decoupled from granular GPU microarchitectural details, the framework is appropriate for energy efficiency studies in future GPUs. Using this model in conjunction with performance simulation, we show that the dominating factor influencing the energy efficiency of GPUs over the next decade is GPUmodule (GPM) idle time. Furthermore, neither inter-module interconnect energy, nor GPM microarchitectural design is expected to play a key role in this regard. We demonstrate that multi-module GPUs are on a trajectory to become 2× less energy efficient than current monolithic designs; a significant issue for data centers which are already energy constrained. Finally, we show that architects must be willing to spend more (not less) energy to enable higher bandwidth inter-GPM connections, because counter-intuitively, this additional energy expenditure can reduce total GPU energy consumption by as much as 45%, providing a path to energy efficient strong scaling in the future.

Original languageEnglish (US)
Title of host publicationProceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages519-532
Number of pages14
ISBN (Electronic)9781728114446
DOIs
StatePublished - Mar 26 2019
Event25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019 - Washington, United States
Duration: Feb 16 2019Feb 20 2019

Publication series

NameProceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019

Conference

Conference25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
CountryUnited States
CityWashington
Period2/16/192/20/19

Fingerprint

Energy efficiency
Graphics processing unit
Energy utilization
Trajectories
Bandwidth

Keywords

  • Energy Efficiency
  • Energy Model
  • GPU
  • Moore's Law
  • Multi Chip Module
  • NUMA

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Arunkumar, A., Bolotin, E., Nellans, D., & Wu, C-J. (2019). Understanding the future of energy efficiency in multi-module GPUs. In Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019 (pp. 519-532). [8675192] (Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/HPCA.2019.00063

Understanding the future of energy efficiency in multi-module GPUs. / Arunkumar, Akhil; Bolotin, Evgeny; Nellans, David; Wu, Carole-Jean.

Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019. Institute of Electrical and Electronics Engineers Inc., 2019. p. 519-532 8675192 (Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Arunkumar, A, Bolotin, E, Nellans, D & Wu, C-J 2019, Understanding the future of energy efficiency in multi-module GPUs. in Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019., 8675192, Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019, Institute of Electrical and Electronics Engineers Inc., pp. 519-532, 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019, Washington, United States, 2/16/19. https://doi.org/10.1109/HPCA.2019.00063
Arunkumar A, Bolotin E, Nellans D, Wu C-J. Understanding the future of energy efficiency in multi-module GPUs. In Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 519-532. 8675192. (Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019). https://doi.org/10.1109/HPCA.2019.00063
Arunkumar, Akhil ; Bolotin, Evgeny ; Nellans, David ; Wu, Carole-Jean. / Understanding the future of energy efficiency in multi-module GPUs. Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 519-532 (Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019).
@inproceedings{171520c0d42c4d6d8cc5b99c2f144e7f,
title = "Understanding the future of energy efficiency in multi-module GPUs",
abstract = "As Moore's law slows down, GPUs must pivot towards multi-module designs to continue scaling performance at historical rates. Prior work on multi-module GPUs has focused on performance, while largely ignoring the issue of energy efficiency. In this work, we propose a new metric for GPU efficiency called EDP Scaling Efficiency that quantifies the effects of both strong performance scaling and overall energy efficiency in these designs. To enable this analysis, we develop a novel top-down GPU energy estimation framework that is accurate within 10{\%} of a recent GPU design. Being decoupled from granular GPU microarchitectural details, the framework is appropriate for energy efficiency studies in future GPUs. Using this model in conjunction with performance simulation, we show that the dominating factor influencing the energy efficiency of GPUs over the next decade is GPUmodule (GPM) idle time. Furthermore, neither inter-module interconnect energy, nor GPM microarchitectural design is expected to play a key role in this regard. We demonstrate that multi-module GPUs are on a trajectory to become 2× less energy efficient than current monolithic designs; a significant issue for data centers which are already energy constrained. Finally, we show that architects must be willing to spend more (not less) energy to enable higher bandwidth inter-GPM connections, because counter-intuitively, this additional energy expenditure can reduce total GPU energy consumption by as much as 45{\%}, providing a path to energy efficient strong scaling in the future.",
keywords = "Energy Efficiency, Energy Model, GPU, Moore's Law, Multi Chip Module, NUMA",
author = "Akhil Arunkumar and Evgeny Bolotin and David Nellans and Carole-Jean Wu",
year = "2019",
month = "3",
day = "26",
doi = "10.1109/HPCA.2019.00063",
language = "English (US)",
series = "Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "519--532",
booktitle = "Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019",

}

TY - GEN

T1 - Understanding the future of energy efficiency in multi-module GPUs

AU - Arunkumar, Akhil

AU - Bolotin, Evgeny

AU - Nellans, David

AU - Wu, Carole-Jean

PY - 2019/3/26

Y1 - 2019/3/26

N2 - As Moore's law slows down, GPUs must pivot towards multi-module designs to continue scaling performance at historical rates. Prior work on multi-module GPUs has focused on performance, while largely ignoring the issue of energy efficiency. In this work, we propose a new metric for GPU efficiency called EDP Scaling Efficiency that quantifies the effects of both strong performance scaling and overall energy efficiency in these designs. To enable this analysis, we develop a novel top-down GPU energy estimation framework that is accurate within 10% of a recent GPU design. Being decoupled from granular GPU microarchitectural details, the framework is appropriate for energy efficiency studies in future GPUs. Using this model in conjunction with performance simulation, we show that the dominating factor influencing the energy efficiency of GPUs over the next decade is GPUmodule (GPM) idle time. Furthermore, neither inter-module interconnect energy, nor GPM microarchitectural design is expected to play a key role in this regard. We demonstrate that multi-module GPUs are on a trajectory to become 2× less energy efficient than current monolithic designs; a significant issue for data centers which are already energy constrained. Finally, we show that architects must be willing to spend more (not less) energy to enable higher bandwidth inter-GPM connections, because counter-intuitively, this additional energy expenditure can reduce total GPU energy consumption by as much as 45%, providing a path to energy efficient strong scaling in the future.

AB - As Moore's law slows down, GPUs must pivot towards multi-module designs to continue scaling performance at historical rates. Prior work on multi-module GPUs has focused on performance, while largely ignoring the issue of energy efficiency. In this work, we propose a new metric for GPU efficiency called EDP Scaling Efficiency that quantifies the effects of both strong performance scaling and overall energy efficiency in these designs. To enable this analysis, we develop a novel top-down GPU energy estimation framework that is accurate within 10% of a recent GPU design. Being decoupled from granular GPU microarchitectural details, the framework is appropriate for energy efficiency studies in future GPUs. Using this model in conjunction with performance simulation, we show that the dominating factor influencing the energy efficiency of GPUs over the next decade is GPUmodule (GPM) idle time. Furthermore, neither inter-module interconnect energy, nor GPM microarchitectural design is expected to play a key role in this regard. We demonstrate that multi-module GPUs are on a trajectory to become 2× less energy efficient than current monolithic designs; a significant issue for data centers which are already energy constrained. Finally, we show that architects must be willing to spend more (not less) energy to enable higher bandwidth inter-GPM connections, because counter-intuitively, this additional energy expenditure can reduce total GPU energy consumption by as much as 45%, providing a path to energy efficient strong scaling in the future.

KW - Energy Efficiency

KW - Energy Model

KW - GPU

KW - Moore's Law

KW - Multi Chip Module

KW - NUMA

UR - http://www.scopus.com/inward/record.url?scp=85064192758&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064192758&partnerID=8YFLogxK

U2 - 10.1109/HPCA.2019.00063

DO - 10.1109/HPCA.2019.00063

M3 - Conference contribution

AN - SCOPUS:85064192758

T3 - Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019

SP - 519

EP - 532

BT - Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019

PB - Institute of Electrical and Electronics Engineers Inc.

ER -