Within a decade, feature sizes of integrated circuits are expected to shrink from the present-day 45nm to 12nm, increasing soft error rates from once per year to once per day. The ITRS report recognizes reliability as one of the most important challenges for the next decade and points out that soft errors are the primary threat. Soft errors are transient faults, caused mostly by cosmic radiation, that can lead to incorrect results or total system failure. They have long been a concern for space-based applications, but they will soon start affecting terrestrial systems. Their impact on terrestrial systems can be both dire and sweeping, with targets including financial systems, health-care databases, the power grid, and communication infrastructure.

Although much work has been done on protecting computing systems from soft errors, there remains a clear need for protection schemes that are more efficient in power, performance, and area. This proposal builds upon existing hardware and microarchitectural schemes to provide more power-efficient protection from soft errors through better application analysis. Its underlying theme is that more power-efficient soft error protection can be achieved by analyzing applications more carefully and changing their execution to make the best use of existing protection mechanisms.

To illustrate how our work builds upon traditional approaches, consider the register file in the processor, a well-known and important site for soft errors. Traditional fault-tolerant schemes protect all registers with Error Correction Codes (ECC). For power-conscious yet sufficient protection, soft error researchers have proposed protecting only some of the registers. We take the next step and propose to make better use of the few protected registers for more effective and power-efficient protection of program variables. This can be done by mapping variables that have long lifetimes but are rarely accessed to the protected registers. Beyond helping existing microarchitectural techniques, stand-alone compiler techniques for power-efficient protection can also be imagined. For instance, simply spilling a value that will not be used for a long time into memory, and bringing it back just before it is needed, will protect it.
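The mapping idea above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the project's actual compiler pass: the `LiveRange` record, the `exposure_per_access` metric (lifetime divided by access count), the `spill_threshold` parameter, and the function `plan_protection` are all hypothetical names introduced here. The sketch also ignores register interference between overlapping live ranges and assumes that memory is itself protected, as the spilling example implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveRange:
    """Hypothetical summary of one program variable's live range."""
    name: str
    start: int     # instruction index where the value is defined
    end: int       # instruction index of its last use
    accesses: int  # number of reads and writes within the range


def exposure_per_access(r: LiveRange) -> float:
    """Assumed metric: how long the value sits exposed per access.

    Long-lived, rarely accessed values score high.
    """
    return (r.end - r.start) / max(r.accesses, 1)


def plan_protection(ranges, num_protected, spill_threshold):
    """Pick variables for protected registers, then spill candidates.

    Simplification: each protected register holds one variable for the
    whole program, so at most `num_protected` variables are chosen.
    """
    ranked = sorted(ranges, key=exposure_per_access, reverse=True)
    protected = [r.name for r in ranked[:num_protected]]
    # Remaining high-exposure values are candidates for spilling to
    # (assumed protected) memory until just before their next use.
    spilled = [
        r.name
        for r in ranked[num_protected:]
        if exposure_per_access(r) >= spill_threshold
    ]
    return protected, spilled


if __name__ == "__main__":
    example = [
        LiveRange("i", 0, 100, 50),     # loop counter: touched constantly
        LiveRange("cfg", 0, 1000, 2),   # long-lived, rarely accessed
        LiveRange("tmp", 10, 12, 2),    # short-lived temporary
        LiveRange("base", 5, 800, 4),   # long-lived, rarely accessed
    ]
    protected, spilled = plan_protection(example, 1, 100.0)
    print(protected)  # ['cfg']
    print(spilled)    # ['base']
```

With a single protected register, the longest-exposed value (`cfg`) takes it, the next long-lived value (`base`) becomes a spill candidate, and the frequently accessed or short-lived values stay in ordinary registers, where protecting them would cost the most power for the least benefit.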
Effective start/end date: 1/1/11 → 12/31/17
- National Science Foundation (NSF): $401,654.00