DA placement

A dual-aware data placement in a deduplicated and erasure-coded storage system

Mingzhu Deng, Ming Zhao, Fang Liu, Zhiguang Chen, Nong Xiao

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Abstract

Modern storage systems increasingly combine deduplication with erasure coding to obtain both higher storage efficiency and economical data reliability. However, a naive combination suffers from the “read imbalance problem”, in which parallel data accesses are limited by throttled storage nodes. The problem stems from uneven data placement: a placement policy that is unaware of deduplication and erasure coding lets each of them disturb the order of data on the nodes. This paper presents the design and implementation of Dual-Aware (DA) placement, which makes a combined storage system deduplication-aware and erasure-coding-aware at the same time. For deduplication awareness, DA records the node number of each unique piece of data for quick lookup and dynamically tracks the nodes already used by each write request, so that it can skip unsuitable placement locations. For erasure-coding awareness, DA serializes the placement of parity blocks within a stripe and across stripes, which keeps data and parity separate and preserves data sequentiality at the boundaries between adjacent stripes. DA further improves load balancing by using the deduplication level of a piece of data as an intuitive predictor of its future accesses. Overall, DA improves system performance with little memory or computation cost. Extensive experiments with real-world traces and synthesized workloads show that DA achieves better read performance. For example, under the CAFTL traces on a default 12-node cluster with RS(8,4), DA leads the baseline rolling placement (BA) and random placement (RA) by average latency margins of 30.86% and 29.63%, respectively.
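The read imbalance problem and the deduplication-aware part of DA can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' implementation: the class names, the 4-node cluster, the chunk fingerprints, and the rule of advancing a round-robin pointer past nodes already used by the current request are all assumptions, and the rolling baseline is only assumed to resemble BA. Parity placement and the deduplication-level load balancing are omitted. A node holding several chunks of one request must serve them one after another, so `max_load` is used as a rough proxy for read imbalance.

```python
from collections import Counter
from typing import Dict, List


class RollingPlacement:
    """Rolling (round-robin) baseline, assumed here to resemble BA."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        self.next_node = 0
        self.location: Dict[str, int] = {}  # chunk fingerprint -> node number

    def write(self, request: List[str]) -> List[int]:
        """Place a write request; return the node that serves each chunk."""
        nodes = []
        for fp in request:
            if fp not in self.location:  # unique chunk: store a new copy
                self.location[fp] = self.next_node
                self.next_node = (self.next_node + 1) % self.num_nodes
            nodes.append(self.location[fp])  # duplicate: reuse the stored copy
        return nodes


class DedupAwarePlacement(RollingPlacement):
    """Hypothetical dedup-aware variant: skip nodes this request already uses."""

    def write(self, request: List[str]) -> List[int]:
        # Nodes that already hold duplicate chunks of this request count as used.
        used = {self.location[fp] for fp in request if fp in self.location}
        nodes = []
        for fp in request:
            if fp not in self.location:
                # Advance the rolling pointer past used nodes (at most one full lap).
                for _ in range(self.num_nodes):
                    if self.next_node not in used:
                        break
                    self.next_node = (self.next_node + 1) % self.num_nodes
                self.location[fp] = self.next_node
                used.add(self.next_node)
                self.next_node = (self.next_node + 1) % self.num_nodes
            nodes.append(self.location[fp])
        return nodes


def max_load(nodes: List[int]) -> int:
    """Chunks served by the busiest node when the request is read back in parallel."""
    return max(Counter(nodes).values())


if __name__ == "__main__":
    for policy in (RollingPlacement(4), DedupAwarePlacement(4)):
        policy.write(["a", "b", "c", "d"])  # first request: all chunks unique
        nodes = policy.write(["a", "e", "f", "g"])  # "a" is a duplicate on node 0
        print(type(policy).__name__, nodes, "max load:", max_load(nodes))
    # RollingPlacement [0, 0, 1, 2] max load: 2
    # DedupAwarePlacement [0, 1, 2, 3] max load: 1
```

With the rolling baseline, the new chunk "e" lands on node 0, which already holds the duplicate "a", so reading the second request back makes node 0 serve two chunks. The deduplication-aware variant skips node 0 and spreads the request across all four nodes.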

Original language: English (US)
Title of host publication: Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings
Editors: Jaideep Vaidya, Jin Li
Publisher: Springer Verlag
Pages: 358-377
Number of pages: 20
ISBN (Print): 9783030050504
DOIs: https://doi.org/10.1007/978-3-030-05051-1_25
State: Published - Jan 1 2018
Event: 18th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2018 - Guangzhou, China
Duration: Nov 15 2018 - Nov 17 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11334 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2018
Country: China
City: Guangzhou
Period: 11/15/18 - 11/17/18

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Deng, M., Zhao, M., Liu, F., Chen, Z., & Xiao, N. (2018). DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system. In J. Vaidya, & J. Li (Eds.), Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings (pp. 358-377). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11334 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-05051-1_25

DA placement : A dual-aware data placement in a deduplicated and erasure-coded storage system. / Deng, Mingzhu; Zhao, Ming; Liu, Fang; Chen, Zhiguang; Xiao, Nong.

Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings. ed. / Jaideep Vaidya; Jin Li. Springer Verlag, 2018. p. 358-377 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11334 LNCS).

Deng, M, Zhao, M, Liu, F, Chen, Z & Xiao, N 2018, DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system. in J Vaidya & J Li (eds), Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11334 LNCS, Springer Verlag, pp. 358-377, 18th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2018, Guangzhou, China, 11/15/18. https://doi.org/10.1007/978-3-030-05051-1_25

Deng M, Zhao M, Liu F, Chen Z, Xiao N. DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system. In Vaidya J, Li J, editors, Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings. Springer Verlag. 2018. p. 358-377. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-030-05051-1_25

Deng, Mingzhu ; Zhao, Ming ; Liu, Fang ; Chen, Zhiguang ; Xiao, Nong. / DA placement : A dual-aware data placement in a deduplicated and erasure-coded storage system. Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings. editor / Jaideep Vaidya ; Jin Li. Springer Verlag, 2018. pp. 358-377 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{4871125ffed14d98ae4c033045804436,
title = "DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system",
author = "Mingzhu Deng and Ming Zhao and Fang Liu and Zhiguang Chen and Nong Xiao",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-030-05051-1_25",
language = "English (US)",
isbn = "9783030050504",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "358--377",
editor = "Jaideep Vaidya and Jin Li",
booktitle = "Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings",

}

TY - GEN

T1 - DA placement

T2 - A dual-aware data placement in a deduplicated and erasure-coded storage system

AU - Deng, Mingzhu

AU - Zhao, Ming

AU - Liu, Fang

AU - Chen, Zhiguang

AU - Xiao, Nong

PY - 2018/1/1

Y1 - 2018/1/1

UR - http://www.scopus.com/inward/record.url?scp=85058627477&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058627477&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-05051-1_25

DO - 10.1007/978-3-030-05051-1_25

M3 - Conference contribution

SN - 9783030050504

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 358

EP - 377

BT - Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings

A2 - Vaidya, Jaideep

A2 - Li, Jin

PB - Springer Verlag

ER -