Architecture of a distributed storage that combines file system, memory and computation in a single layer

Jia Zou, Arun Iyengar, Chris Jermaine

Research output: Contribution to journal › Article › peer-review

Abstract

Storage and memory systems for modern data analytics are heavily layered: shared persistent data, cached data, and non-shared execution data are managed in separate systems, such as a distributed file system like HDFS, an in-memory file system like Alluxio, and a computation framework like Spark. Such layering introduces significant performance and management costs. In this paper, we propose a single system called Pangea that manages all data, both intermediate and long-lived, together with its buffering/caching, page replacement, data placement optimization, and failure recovery, in one monolithic distributed storage system without any layering. We present a detailed performance evaluation of Pangea and show that its performance compares favorably with several widely used layered systems such as Spark.

Original language: English (US)
Pages (from-to): 1049-1073
Number of pages: 25
Journal: VLDB Journal
Volume: 29
Issue number: 5
State: Published - Sep 1 2020

Keywords

  • Big Data analytics
  • Distributed system
  • Heterogeneous replication
  • Monolithic storage

ASJC Scopus subject areas

  • Information Systems
  • Hardware and Architecture
