Abstract

Most server-grade memory systems provide Chipkill-Correct error protection at the expense of power and/or performance overhead. In this paper, we present low-overhead schemes for improving the reliability of commodity DRAM systems with better power and IPC performance compared to Chipkill-Correct solutions. Specifically, we propose two erasure and error correction (E-ECC) schemes for x8 memory systems that have 12.5% storage overhead and do not require any change in the existing memory architecture. Both schemes have superior error performance due to the use of a strong ECC code, namely, RS(36,32) over GF(2^8). Scheme 1 activates 18 chips per access and has stronger reliability compared to Chipkill-Correct solutions. If the location of the faulty chip is known, Scheme 1 can correct an additional random error in a second chip. Scheme 2 trades off reliability for higher energy efficiency by activating only 9 chips per access. It cannot correct random errors due to a chip failure but can detect them with 99.9986% probability, and once a chip is marked faulty due to persistent errors, it can correct all errors due to that chip. Synthesis results in a 28nm node show that the RS(36,32) code results in a very low decoding latency that can be well hidden in commodity memory systems and, therefore, has minimal effect on the DRAM access latency. Evaluations based on SPEC CPU 2006 sequential and multi-programmed workloads show that, compared to Chipkill-Correct, the proposed Schemes 1 and 2 improve IPC by an average of 3.2% (maximum of 13.8%) and 4.8% (maximum of 31.8%) and reduce the power consumption by an average of 16.2% (maximum of 25%) and 26.8% (maximum of 36%), respectively.
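
The figures quoted in the abstract can be checked against the standard Reed-Solomon decoding bound, under which an RS(n, k) code corrects e symbol errors and s erasures whenever 2e + s <= n - k. The sketch below applies that bound to RS(36,32). It is an illustration, not the authors' decoder: the helper names are hypothetical, and the even spread of the 36 codeword symbols over the activated chips (2 per chip with 18 chips, 4 per chip with 9 chips) is an assumption inferred from the chip counts, not something the abstract states.

```python
# Sketch of the error/erasure budget of RS(36,32); helper names are hypothetical.
N, K = 36, 32          # codeword and data symbols, as given in the abstract
PARITY = N - K         # 4 check symbols


def storage_overhead(n: int = N, k: int = K) -> float:
    """Check symbols relative to data symbols."""
    return (n - k) / k


def correctable(errors: int, erasures: int, parity: int = PARITY) -> bool:
    """Standard RS bound: 2 * errors + erasures <= n - k."""
    return 2 * errors + erasures <= parity


def symbols_per_chip(chips: int, n: int = N) -> int:
    """ASSUMPTION: codeword symbols are spread evenly over the activated chips."""
    if n % chips != 0:
        raise ValueError("symbols do not divide evenly over the chips")
    return n // chips


def chip_failure_budget(chips: int) -> dict:
    """What a single failed chip costs, with and without a known location."""
    s = symbols_per_chip(chips)
    return {
        "symbols_per_chip": s,
        # Location unknown: every symbol from the chip counts as an error.
        "correct_unknown_chip": correctable(errors=s, erasures=0),
        # Location known: the same symbols count as erasures.
        "correct_known_chip": correctable(errors=0, erasures=s),
        # Known bad chip plus one random symbol error in another chip.
        "known_chip_plus_one_error": correctable(errors=1, erasures=s),
    }


if __name__ == "__main__":
    print(f"storage overhead: {storage_overhead():.1%}")
    for chips in (18, 9):
        print(chips, chip_failure_budget(chips))
```

Under the stated assumption, the 18-chip case fits a whole-chip failure within the budget and, once that chip is treated as an erasure, leaves room for one further symbol error. The 9-chip case exceeds the budget when the failed chip is unknown but fits it exactly once the chip is marked as an erasure. Both outcomes agree with the behavior the abstract describes for Schemes 1 and 2.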

Original language: English (US)
Title of host publication: MEMSYS 2015 - Proceedings of the 1st International Symposium on Memory Systems
Publisher: Association for Computing Machinery
Pages: 60-70
Number of pages: 11
ISBN (Electronic): 9781450336048
DOIs: https://doi.org/10.1145/2818950.2818961
State: Published - Oct 5 2015
Event: 1st International Symposium on Memory Systems, MEMSYS 2015 - Washington, United States
Duration: Aug 14 2015 → Aug 15 2015

Publication series

Name: ACM International Conference Proceeding Series
Volume: 05-08-October-2015

Other

Other: 1st International Symposium on Memory Systems, MEMSYS 2015
Country: United States
City: Washington
Period: 8/14/15 → 8/15/15

Keywords

  • Chipkill-correct
  • DRAM errors
  • DRAM memory system
  • Erasure and error correction
  • Reliability

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

  • Cite this

    Chen, H. M., Arunkumar, A., Wu, C-J., Mudge, T., & Chakrabarti, C. (2015). E-ECC: Low power erasure and error correction schemes for increasing reliability of commodity DRAM systems. In MEMSYS 2015 - Proceedings of the 1st International Symposium on Memory Systems (pp. 60-70). (ACM International Conference Proceeding Series; Vol. 05-08-October-2015). Association for Computing Machinery. https://doi.org/10.1145/2818950.2818961