Failed disk recovery in double erasure RAID arrays

Kaushik Srinivasan, Charles Colbourn

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Reliability is a major concern in the design of large disk arrays. In this paper, we examine the effect of encountering more failures than the RAID array was initially designed to tolerate. Erasure codes are incorporated to enable system recovery from a specified number of disk erasures, and strive, beyond that threshold, to recover the system as frequently and as thoroughly as possible. Erasure codes for tolerating two disk failures are examined. For these double erasure codes, we establish a correspondence between system operation and acyclicity of its graph model. For the most compact double erasure code, the full 2-code, this underlies an efficient algorithm for computing the system operation probability (the probability that all disks are operating or recoverable). Even when the system has failed, some disks are nonetheless recoverable. We extend the graph model to determine the probability that d disks have failed, a of which are recoverable by solving one linear equation, b of which are further recoverable by solving systems of linear equations, and d - a - b of which cannot be recovered. These statistics are efficiently calculated for the full 2-code by developing a three-variable ordinary generating function whose coefficients give the specified values. Finally, examples are given to illustrate the probability that an individual disk can be recovered, even when the system is in a failed state.
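The correspondence between system operation and acyclicity can be illustrated with a small sketch. It is not the paper's algorithm; it assumes a simple XOR-parity model in which each check disk is a vertex, each data disk is an edge joining the two parity groups it belongs to, and a failed check disk is drawn as an edge to an extra vertex `INF`. That extra vertex, and the names `failed_edges`, `is_operational`, `classify`, and `is_bridge`, are hypothetical devices introduced here for illustration. Under these assumptions, the failed disks are all recoverable exactly when their edges form a forest. When they do not, disks peeled off through a single parity equation play the role of a, bridge edges that need several equations combined play the role of b, and edges lying on cycles are lost.

```python
from collections import defaultdict

INF = "inf"  # hypothetical extra vertex: the far end of a failed check disk's edge

def failed_edges(failed_data, failed_checks):
    """Data disk (i, j) -> edge i-j; check disk i -> edge i-INF (assumed model)."""
    return list(failed_data) + [(i, INF) for i in failed_checks]

def is_operational(edges):
    """True when the failed-disk edges contain no cycle (union-find test)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle
        parent[ru] = rv
    return True

def is_bridge(k, remaining, edges):
    """True when edge k's endpoints are disconnected once k is removed."""
    u, v = edges[k]
    adj = defaultdict(list)
    for j in remaining:
        if j != k:
            x, y = edges[j]
            adj[x].append(y)
            adj[y].append(x)
    seen, todo = {u}, [u]
    while todo:
        x = todo.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return v not in seen

def classify(edges):
    """Return (a, b, lost) for the failed disks given as edges."""
    remaining = set(range(len(edges)))
    incident = defaultdict(set)
    for k, (u, v) in enumerate(edges):
        incident[u].add(k)
        incident[v].add(k)
    # Peel: a real parity group with exactly one failed disk recovers that disk.
    stack = [v for v in incident if v != INF and len(incident[v]) == 1]
    a = 0
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue
        k = incident[v].pop()
        u, w = edges[k]
        other = w if u == v else u
        incident[other].discard(k)
        remaining.discard(k)
        a += 1
        if other != INF and len(incident[other]) == 1:
            stack.append(other)
    # Bridges left after peeling need several parity equations combined.
    b = sum(1 for k in remaining if is_bridge(k, remaining, edges))
    return a, b, len(remaining) - b
```

For example, `classify(failed_edges([(0, 1), (1, 2), (0, 2), (2, 3)], [0]))` treats five failed disks: data disk (2, 3) is peeled through parity group 3, check disk 0 is a bridge into the triangle, and the three data disks on the triangle cannot be recovered. The bridge test is quadratic in the number of failed disks, which is adequate for a sketch but not for large arrays.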

Original language: English (US)
Pages (from-to): 115-128
Number of pages: 14
Journal: Journal of Discrete Algorithms
Volume: 5
Issue number: 1
State: Published - Mar 2007

Keywords

  • Disk architecture
  • Erasure resilient code
  • Error-correcting code
  • Generating function
  • Parity check matrix

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Discrete Mathematics and Combinatorics
  • Computational Theory and Mathematics
