Abstract

Most server-grade systems provide Chipkill-Correct error protection at the expense of power and performance. In this paper, we present a low-overhead solution for improving the reliability of commodity DRAM systems with no change to the existing memory architecture. Specifically, we propose five erasure and error correction (E-ECC) schemes that provide at least Chipkill-Correct protection for x4 (Schemes 1, 2 and 3), x8 (Scheme 4) and x16 (Scheme 5) DRAM systems. All schemes have superior error correction performance due to the use of strong symbol-based codes. Synthesis results at the 28 nm node show that the decoding latency of these codes is negligible compared to the DRAM access latency. In addition, we make use of erasure codes to extend the lifetime of the DRAM systems. Specifically, once a chip is marked faulty due to persistent errors, all E-ECC schemes correct erasures due to that faulty chip and also correct an additional random error in a second chip. Evaluation with SPEC2006 workloads shows that, compared to x4 Chipkill-Correct schemes, Scheme 5 has the highest IPC improvement (mean of 7 percent), while Scheme 4 has the largest power reduction (mean of 18 percent) and the largest increase in energy efficiency (mean of 25 percent).
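The combined erasure-plus-error capability described in the abstract can be illustrated with the standard coding-theory budget: a code with minimum distance d can correct e random symbol errors and f symbol erasures whenever 2e + f <= d - 1. The sketch below is only an illustration under stated assumptions. It assumes a maximum-distance-separable symbol code such as Reed-Solomon (d = n - k + 1), which the abstract does not name, and the (n, k) values are hypothetical rather than the parameters of Schemes 1 to 5.

```python
def mds_min_distance(n: int, k: int) -> int:
    """Minimum distance of an (n, k) MDS symbol code, e.g. Reed-Solomon.

    Assumption: the code is MDS, so d = n - k + 1. The abstract only says
    "strong symbol-based codes", so this is an illustrative choice.
    """
    return n - k + 1


def can_correct(n: int, k: int, errors: int, erasures: int) -> bool:
    """True if `errors` random symbol errors plus `erasures` erased symbols
    fit within the budget 2*errors + erasures <= d - 1."""
    return 2 * errors + erasures <= mds_min_distance(n, k) - 1


def min_parity_symbols(errors: int, erasures: int) -> int:
    """Fewest parity symbols (n - k) an MDS code needs for this budget."""
    return 2 * errors + erasures


# Hypothetical example: one erased symbol (the chip already marked faulty)
# plus one random symbol error in a second chip.
needed = min_parity_symbols(errors=1, erasures=1)
ok_with_three = can_correct(n=19, k=16, errors=1, erasures=1)
ok_with_two = can_correct(n=18, k=16, errors=1, erasures=1)
```

Under these assumptions, an erasure costs one parity symbol while a random error costs two, which is why marking a persistently faulty chip as an erasure leaves budget to correct a further error elsewhere. If a chip contributes more than one symbol to a codeword, the erasure count grows accordingly.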

Original language: English (US)
Article number: 7447716
Pages (from-to): 3766-3779
Number of pages: 14
Journal: IEEE Transactions on Computers
Volume: 65
Issue number: 12
State: Published - Dec 1 2016

Keywords

  • chipkill-correct
  • DRAM Memory system
  • erasure and error correction
  • error control coding (ECC)
  • reliability

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Hardware and Architecture
  • Computational Theory and Mathematics

Title: Using Low Cost Erasure and Error Correction Schemes to Improve Reliability of Commodity DRAM Systems