Exploring DRAM organizations for energy-efficient and resilient exascale memories

Bharan Giridhar, Michael Cieslak, Deepankar Duggal, Ronald Dreslinski, Hsing Min Chen, Robert Patti, Betina Hold, Chaitali Chakrabarti, Trevor Mudge, David Blaauw

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

The power target for exascale supercomputing is 20MW, with about 30% budgeted for the memory subsystem. Commodity DRAMs will not satisfy this requirement. Additionally, the large number of memory chips (>10M) required will result in crippling failure rates. Although specialized DRAM memories have been reorganized to reduce power through 3D-stacking or row buffer resizing, their implications for fault tolerance have not been considered. We show that addressing reliability and energy is a co-optimization problem involving tradeoffs between error correction cost, access energy and refresh power: reducing the physical page size to decrease access energy increases the energy/area overhead of error resilience. Additionally, power can be reduced by optimizing bitline lengths. The proposed 3D-stacked memory uses a page size of 4kb and consumes 5.1pJ/bit based on simulations with NEK5000 benchmarks. Scaled to 100PB, the memory consumes 4.7MW at 100PB/s, which is well within the total power budget (20MW), while also being error-resilient.
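The figures quoted in the abstract can be cross-checked with a short back-of-envelope calculation. The sketch below is not from the paper: it assumes 1 PB = 10^15 bytes and treats 5.1pJ/bit as the energy spent per bit transferred. Under those assumptions the access power at 100PB/s comes out somewhat below the reported 4.7MW; the abstract does not break down the remainder, so the sketch only reports the gap.

```python
# Back-of-envelope check of the abstract's numbers.
# Values marked "abstract" are quoted from the text; everything else is an assumption.

TOTAL_BUDGET_W = 20e6        # 20 MW exascale power target (abstract)
MEMORY_SHARE = 0.30          # about 30% budgeted for memory (abstract)
ENERGY_PER_BIT_J = 5.1e-12   # 5.1 pJ/bit (abstract)
BANDWIDTH_B_PER_S = 100e15   # 100 PB/s, ASSUMING 1 PB = 10**15 bytes
REPORTED_POWER_W = 4.7e6     # 4.7 MW (abstract)


def memory_budget_w() -> float:
    """Power available to the memory subsystem under the 30% share."""
    return TOTAL_BUDGET_W * MEMORY_SHARE


def access_power_w() -> float:
    """Power implied by the per-bit energy at the stated bandwidth (8 bits per byte)."""
    return ENERGY_PER_BIT_J * BANDWIDTH_B_PER_S * 8


if __name__ == "__main__":
    budget = memory_budget_w()
    access = access_power_w()
    print(f"memory budget:      {budget / 1e6:.2f} MW")
    print(f"access power:       {access / 1e6:.2f} MW")
    print(f"reported total:     {REPORTED_POWER_W / 1e6:.2f} MW")
    print(f"unattributed gap:   {(REPORTED_POWER_W - access) / 1e6:.2f} MW")
    print(f"within 30% budget:  {REPORTED_POWER_W < budget}")
```

With these assumptions the reported 4.7MW sits below the roughly 6MW that a 30% share of 20MW would allow, which is a tighter bound than the 20MW total the abstract compares against.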

Original language: English (US)
Title of host publication: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Publisher: IEEE Computer Society
ISBN (Print): 9781450323789
DOIs: https://doi.org/10.1145/2503210.2503215
State: Published - 2013
Event: 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013 - Denver, CO, United States
Duration: Nov 17 2013 - Nov 22 2013

Other

Other: 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
Country: United States
City: Denver, CO
Period: 11/17/13 - 11/22/13

Fingerprint

Dynamic random access storage
Data storage equipment
Error correction
Fault tolerance
Costs

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Giridhar, B., Cieslak, M., Duggal, D., Dreslinski, R., Chen, H. M., Patti, R., ... Blaauw, D. (2013). Exploring DRAM organizations for energy-efficient and resilient exascale memories. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC [23] IEEE Computer Society. https://doi.org/10.1145/2503210.2503215

@inproceedings{52b09c40bd0e4dbbb3f6d3d9eed29b49,
title = "Exploring DRAM organizations for energy-efficient and resilient exascale memories",
abstract = "The power target for exascale supercomputing is 20MW, with about 30{\%} budgeted for the memory subsystem. Commodity DRAMs will not satisfy this requirement. Additionally, the large number of memory chips (>10M) required will result in crippling failure rates. Although specialized DRAM memories have been reorganized to reduce power through 3D-stacking or row buffer resizing, their implications for fault tolerance have not been considered. We show that addressing reliability and energy is a co-optimization problem involving tradeoffs between error correction cost, access energy and refresh power: reducing the physical page size to decrease access energy increases the energy/area overhead of error resilience. Additionally, power can be reduced by optimizing bitline lengths. The proposed 3D-stacked memory uses a page size of 4kb and consumes 5.1pJ/bit based on simulations with NEK5000 benchmarks. Scaled to 100PB, the memory consumes 4.7MW at 100PB/s, which is well within the total power budget (20MW), while also being error-resilient.",
author = "Bharan Giridhar and Michael Cieslak and Deepankar Duggal and Ronald Dreslinski and Chen, {Hsing Min} and Robert Patti and Betina Hold and Chaitali Chakrabarti and Trevor Mudge and David Blaauw",
year = "2013",
doi = "10.1145/2503210.2503215",
language = "English (US)",
isbn = "9781450323789",
booktitle = "International Conference for High Performance Computing, Networking, Storage and Analysis, SC",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Exploring DRAM organizations for energy-efficient and resilient exascale memories

AU - Giridhar, Bharan

AU - Cieslak, Michael

AU - Duggal, Deepankar

AU - Dreslinski, Ronald

AU - Chen, Hsing Min

AU - Patti, Robert

AU - Hold, Betina

AU - Chakrabarti, Chaitali

AU - Mudge, Trevor

AU - Blaauw, David

PY - 2013

Y1 - 2013

AB - The power target for exascale supercomputing is 20MW, with about 30% budgeted for the memory subsystem. Commodity DRAMs will not satisfy this requirement. Additionally, the large number of memory chips (>10M) required will result in crippling failure rates. Although specialized DRAM memories have been reorganized to reduce power through 3D-stacking or row buffer resizing, their implications for fault tolerance have not been considered. We show that addressing reliability and energy is a co-optimization problem involving tradeoffs between error correction cost, access energy and refresh power: reducing the physical page size to decrease access energy increases the energy/area overhead of error resilience. Additionally, power can be reduced by optimizing bitline lengths. The proposed 3D-stacked memory uses a page size of 4kb and consumes 5.1pJ/bit based on simulations with NEK5000 benchmarks. Scaled to 100PB, the memory consumes 4.7MW at 100PB/s, which is well within the total power budget (20MW), while also being error-resilient.

UR - http://www.scopus.com/inward/record.url?scp=84899667235&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84899667235&partnerID=8YFLogxK

U2 - 10.1145/2503210.2503215

DO - 10.1145/2503210.2503215

M3 - Conference contribution

AN - SCOPUS:84899667235

SN - 9781450323789

BT - International Conference for High Performance Computing, Networking, Storage and Analysis, SC

PB - IEEE Computer Society

ER -