TY - GEN
T1 - E-ECC
T2 - 1st International Symposium on Memory Systems, MEMSYS 2015
AU - Chen, Hsing Min
AU - Arunkumar, Akhil
AU - Wu, Carole-Jean
AU - Mudge, Trevor
AU - Chakrabarti, Chaitali
PY - 2015/10/5
Y1 - 2015/10/5
N2 - Most server-grade memory systems provide Chipkill-Correct error protection at the expense of power and/or performance overhead. In this paper, we present low-overhead schemes for improving the reliability of commodity DRAM systems with better power and IPC performance compared to Chipkill-Correct solutions. Specifically, we propose two erasure and error correction (E-ECC) schemes for x8 memory systems that have 12.5% storage overhead and do not require any change in the existing memory architecture. Both schemes have superior error performance due to the use of a strong ECC code, namely, RS(36,32) over GF(2^8). Scheme 1 activates 18 chips per access and has stronger reliability compared to Chipkill-Correct solutions. If the location of the faulty chip is known, Scheme 1 can correct an additional random error in a second chip. Scheme 2 trades off reliability for higher energy efficiency by activating only 9 chips per access. It cannot correct random errors due to a chip failure but can detect them with 99.9986% probability, and once a chip is marked faulty due to persistent errors, it can correct all errors due to that chip. Synthesis results in a 28nm node show that the RS(36,32) code results in a very low decoding latency that can be well hidden in commodity memory systems and, therefore, has minimal effect on the DRAM access latency. Evaluations based on SPEC CPU 2006 sequential and multi-programmed workloads show that, compared to Chipkill-Correct, the proposed Schemes 1 and 2 improve IPC by an average of 3.2% (maximum of 13.8%) and 4.8% (maximum of 31.8%) and reduce the power consumption by an average of 16.2% (maximum of 25%) and 26.8% (maximum of 36%), respectively.
AB - Most server-grade memory systems provide Chipkill-Correct error protection at the expense of power and/or performance overhead. In this paper, we present low-overhead schemes for improving the reliability of commodity DRAM systems with better power and IPC performance compared to Chipkill-Correct solutions. Specifically, we propose two erasure and error correction (E-ECC) schemes for x8 memory systems that have 12.5% storage overhead and do not require any change in the existing memory architecture. Both schemes have superior error performance due to the use of a strong ECC code, namely, RS(36,32) over GF(2^8). Scheme 1 activates 18 chips per access and has stronger reliability compared to Chipkill-Correct solutions. If the location of the faulty chip is known, Scheme 1 can correct an additional random error in a second chip. Scheme 2 trades off reliability for higher energy efficiency by activating only 9 chips per access. It cannot correct random errors due to a chip failure but can detect them with 99.9986% probability, and once a chip is marked faulty due to persistent errors, it can correct all errors due to that chip. Synthesis results in a 28nm node show that the RS(36,32) code results in a very low decoding latency that can be well hidden in commodity memory systems and, therefore, has minimal effect on the DRAM access latency. Evaluations based on SPEC CPU 2006 sequential and multi-programmed workloads show that, compared to Chipkill-Correct, the proposed Schemes 1 and 2 improve IPC by an average of 3.2% (maximum of 13.8%) and 4.8% (maximum of 31.8%) and reduce the power consumption by an average of 16.2% (maximum of 25%) and 26.8% (maximum of 36%), respectively.
KW - Chipkill-correct
KW - DRAM errors
KW - DRAM memory system
KW - Erasure and error correction
KW - Reliability
UR - http://www.scopus.com/inward/record.url?scp=84959325256&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959325256&partnerID=8YFLogxK
U2 - 10.1145/2818950.2818961
DO - 10.1145/2818950.2818961
M3 - Conference contribution
AN - SCOPUS:84959325256
T3 - ACM International Conference Proceeding Series
SP - 60
EP - 70
BT - MEMSYS 2015 - Proceedings of the 1st International Symposium on Memory Systems
PB - Association for Computing Machinery
Y2 - 14 August 2015 through 15 August 2015
ER -