Region-based fault-tolerant distributed file storage system design in networks

Arunabha Sen, Anisha Mazumder, Sujogya Banerjee, Arun Das, Chenyang Zhou, Shahrzad Shirazipourazad

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Distributed storage of data files in different nodes of a network enhances its fault tolerance capability by offering protection against node and link failures. Reliability is often achieved through redundancy in one of the following two ways: (i) storage of multiple copies of the entire file at different locations (nodes) or (ii) storage of file segments (not entire files) at different node locations. In the (N, K) file distribution scheme, N file segments from a file F are created in such a way that it is possible to reconstruct the entire file just by accessing any K ≤ N segments. For the reconstruction scheme to work, it is essential that the K segments of the file are stored in nodes that are connected in the network. However, in the event of node/link failures, the network might become disconnected (i.e., split into several connected components). We focus on node failures that are spatially correlated or region based. Such failures are often encountered in disaster situations or natural calamities where only the nodes in the disaster zone are affected. The first goal of this research is to design a least cost file storage scheme that ensures that, no matter which region is destroyed and how the network fragments as a result, a largest connected component of the residual network will have enough file segments with which to reconstruct the entire file. If the least cost to ensure this objective is within the allocated budget, the storage design will be all region fault-tolerant (ARFT). If the least cost exceeds the allocated budget, an ARFT file storage system design is impossible. The second goal of this research is to design file storage schemes that will be maximum region fault-tolerant within the allocated budget. The third goal of this research is to investigate the impact of the coding parameters N and K on storage requirements for ensuring all region or maximum region fault-tolerant design. We provide approximation algorithms for the problems, evaluate their performance through simulation using two real networks, and compare their results to the optimal solutions obtained using an Integer Linear Program. The simulation results demonstrate that the approximation algorithms almost always produce near optimal results in a fraction of the time needed to find the optimal solution.
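
The ARFT condition described in the abstract can be illustrated with a small feasibility check. The sketch below is not the authors' algorithm: it only tests whether a given placement survives every region failure and does not search for a least cost placement. The function names (`components`, `survives_region`, `is_arft`), the adjacency-dictionary representation, and the five-node path network are all hypothetical choices made for illustration. It also assumes that when several components tie for largest, it is enough for one of them to hold at least K segments.

```python
from collections import deque


def components(adj, removed):
    """Connected components of the network after deleting the nodes in `removed` (BFS)."""
    seen = set(removed)
    comps = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def survives_region(adj, segments, region, k):
    """True if a largest residual component still holds at least k segments.

    Assumption (not from the source): ties for largest are resolved in favor
    of the placement, i.e. one qualifying largest component is enough.
    """
    comps = components(adj, region)
    if not comps:
        return False
    largest = max(len(c) for c in comps)
    return any(
        sum(segments.get(v, 0) for v in c) >= k
        for c in comps
        if len(c) == largest
    )


def is_arft(adj, segments, regions, k):
    """True if the placement tolerates the failure of every single region."""
    return all(survives_region(adj, segments, r, k) for r in regions)


# Hypothetical toy network: a path a-b-c-d-e with one segment stored per node (N = 5).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
segments = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1}
regions = [{"a", "b"}, {"c"}, {"d", "e"}]

print(is_arft(adj, segments, regions, k=2))
print(is_arft(adj, segments, regions, k=3))
```

In this toy instance, losing the region {c} splits the path into two components of two nodes each, so the placement tolerates all regions for K = 2 but not for K = 3. The total number of stored segments, `sum(segments.values())`, plays the role of the storage cost that would be compared against a budget.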

Original language: English (US)
Pages (from-to): 380-395
Number of pages: 16
Journal: Networks
Volume: 66
Issue number: 4
State: Published - Dec 1 2015

Keywords

  • (N, K) coding
  • budget constraints
  • connected components
  • distributed data storage
  • fault tolerance
  • region-based faults

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
