DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system

Mingzhu Deng; Ming Zhao; Fang Liu; Zhiguang Chen; Nong Xiao

doi:10.1007/978-3-030-05051-1_25

Computing and Augmented Intelligence, School of (IAFSE-SCAI)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern storage systems increasingly combine deduplication with erasure coding to gain both storage efficiency and economical data reliability. However, a naive combination suffers from the “read imbalance problem”, in which parallel data accesses are held back by throttled storage nodes. The problem stems from uneven data placement: the placement policy is unaware that both deduplication and erasure coding are in use, and each of them, if left unattended, alters the order of data. This paper presents the systematic design and implementation of a Dual-Aware (DA) placement for such a combined storage system, achieving deduplication awareness and erasure-coding awareness at the same time. DA records the node number of each unique data chunk so that it can be referenced quickly, and it dynamically tracks the nodes used by each write request. This deduplication awareness lets DA skip unfavorable placement locations. In addition, DA serializes the placement of parity blocks within a stripe and across stripes. This erasure-coding awareness keeps data and parity separate and preserves data sequentiality across adjacent stripes. DA further extends load balancing through a novel use of the deduplication level, which serves as an intuitive predictor of future accesses to a piece of data. Overall, DA improves system performance with little memory or computation overhead. Extensive experiments with both real-world traces and synthesized workloads show that DA achieves better read performance. For example, under the CAFTL traces on a default 12-node cluster with RS(8,4), DA achieves an average latency advantage of 30.86% and 29.63% over the baseline rolling placement (BA) and random placement (RA), respectively.
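The two awareness ideas in the abstract can be illustrated with a toy model. This is a hedged sketch, not the authors' implementation: the names `RollingPlacement`, `DedupAwarePlacement`, `stripe_layout` and `max_node_load`, the one-block-per-node stripe model, and the round-robin cursor are all assumptions made for illustration. It approximates a request's read latency by the number of blocks its busiest node must serve, and it does not model DA's use of the deduplication level for load balancing.

```python
from collections import Counter

NUM_NODES = 12  # default cluster size in the abstract: RS(8,4) on 12 nodes


def max_node_load(nodes):
    """Blocks the busiest node must serve (toy proxy for the parallel-read bottleneck)."""
    return max(Counter(nodes).values())


class RollingPlacement:
    """Hypothetical baseline: new unique chunks go to nodes in round-robin order,
    ignoring where the request's duplicate chunks already live."""

    def __init__(self, num_nodes=NUM_NODES):
        self.num_nodes = num_nodes
        self.index = {}  # fingerprint -> node holding the single stored copy
        self.cursor = 0  # next node in rolling order

    def _initial_used(self, fingerprints):
        return set()  # the baseline does not look at duplicates

    def _place_new(self, used):
        node = self.cursor
        self.cursor = (self.cursor + 1) % self.num_nodes
        return node

    def write(self, fingerprints):
        """Store one write request; return the node that serves each chunk."""
        used = self._initial_used(fingerprints)
        layout = []
        for fp in fingerprints:
            if fp not in self.index:  # unique chunk: pick a node for it
                self.index[fp] = self._place_new(used)
            node = self.index[fp]  # duplicate chunks stay where they are
            used.add(node)
            layout.append(node)
        return layout


class DedupAwarePlacement(RollingPlacement):
    """Hypothetical dedup-aware variant: remember each unique chunk's node and
    skip nodes that this request already uses."""

    def _initial_used(self, fingerprints):
        # Nodes already holding this request's duplicate chunks.
        return {self.index[fp] for fp in fingerprints if fp in self.index}

    def _place_new(self, used):
        if len(used) >= self.num_nodes:  # every node used: begin a new round
            used.clear()
        while self.cursor in used:  # terminates: some node is not in `used`
            self.cursor = (self.cursor + 1) % self.num_nodes
        return super()._place_new(used)


def stripe_layout(stripe, k=8, m=4):
    """Hypothetical EC-aware layout: parity sits contiguously after the stripe's
    data, and the next stripe's data starts on the node after that data ends."""
    n = k + m
    start = (stripe * k) % n
    data = [(start + i) % n for i in range(k)]
    parity = [(start + k + j) % n for j in range(m)]
    return data, parity


if __name__ == "__main__":
    request = ["a0", "a1", "a2", "a3", "b0", "b1", "b2", "b3"]
    for policy in (RollingPlacement(), DedupAwarePlacement()):
        policy.write([f"a{i}" for i in range(12)])  # earlier request: 12 unique chunks
        layout = policy.write(request)  # 4 duplicates + 4 new chunks
        print(type(policy).__name__, layout, max_node_load(layout))
    # RollingPlacement [0, 1, 2, 3, 0, 1, 2, 3] 2
    # DedupAwarePlacement [0, 1, 2, 3, 4, 5, 6, 7] 1

    data0, _ = stripe_layout(0)
    data1, _ = stripe_layout(1)
    print(max_node_load((data0 + data1)[:12]))  # 12 consecutive data blocks
    # 1
```

In this toy run, the rolling baseline puts the four new chunks on the same nodes that already hold the request's duplicates, so four nodes must each serve two blocks; the dedup-aware variant spreads the eight chunks over eight distinct nodes. For the stripes, keeping parity on fixed nodes 8 to 11 would instead send 12 consecutive data blocks to nodes 0 to 7 and then 0 to 3, doubling the load on four nodes.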

Original language	English (US)
Title of host publication	Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings
Editors	Jaideep Vaidya, Jin Li
Publisher	Springer Verlag
Pages	358-377
Number of pages	20
ISBN (Print)	9783030050504
DOIs	https://doi.org/10.1007/978-3-030-05051-1_25
State	Published - 2018
Event	18th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2018 - Guangzhou, China
Duration	Nov 15 2018 → Nov 17 2018

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11334 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	18th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2018
Country/Territory	China
City	Guangzhou
Period	11/15/18 → 11/17/18

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Access to Document

10.1007/978-3-030-05051-1_25

Cite this

Deng, M., Zhao, M., Liu, F., Chen, Z., & Xiao, N. (2018). DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system. In J. Vaidya, & J. Li (Eds.), Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings (pp. 358-377). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11334 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-05051-1_25

DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system. / Deng, Mingzhu; Zhao, Ming; Liu, Fang et al.
Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings. ed. / Jaideep Vaidya; Jin Li. Springer Verlag, 2018. p. 358-377 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11334 LNCS).

Deng, M, Zhao, M, Liu, F, Chen, Z & Xiao, N 2018, DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system. in J Vaidya & J Li (eds), Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11334 LNCS, Springer Verlag, pp. 358-377, 18th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2018, Guangzhou, China, 11/15/18. https://doi.org/10.1007/978-3-030-05051-1_25

Deng M, Zhao M, Liu F, Chen Z, Xiao N. DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system. In Vaidya J, Li J, editors, Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings. Springer Verlag. 2018. p. 358-377. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-05051-1_25

Deng, Mingzhu ; Zhao, Ming ; Liu, Fang et al. / DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system. Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings. editor / Jaideep Vaidya ; Jin Li. Springer Verlag, 2018. pp. 358-377 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{4871125ffed14d98ae4c033045804436,

title = "DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system",

author = "Mingzhu Deng and Ming Zhao and Fang Liu and Zhiguang Chen and Nong Xiao",

note = "Funding Information: Acknowledgment. We would like to greatly appreciate the anonymous reviewers for their insightful comments. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61433019, U1435217, and the National High Technology Research and Development Program of China under Grant No. 2016YFB1000302. Publisher Copyright: {\textcopyright} Springer Nature Switzerland AG 2018.; 18th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2018 ; Conference date: 15-11-2018 Through 17-11-2018",

year = "2018",

doi = "10.1007/978-3-030-05051-1_25",

language = "English (US)",

isbn = "9783030050504",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "358--377",

editor = "Jaideep Vaidya and Jin Li",

booktitle = "Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings",

}

TY - GEN

T1 - DA placement: A dual-aware data placement in a deduplicated and erasure-coded storage system

T2 - 18th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2018

AU - Deng, Mingzhu

AU - Zhao, Ming

AU - Liu, Fang

AU - Chen, Zhiguang

AU - Xiao, Nong

N1 - Funding Information: Acknowledgment. We would like to greatly appreciate the anonymous reviewers for their insightful comments. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61433019, U1435217, and the National High Technology Research and Development Program of China under Grant No. 2016YFB1000302. Publisher Copyright: © Springer Nature Switzerland AG 2018.

PY - 2018

Y1 - 2018

UR - http://www.scopus.com/inward/record.url?scp=85058627477&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058627477&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-05051-1_25

DO - 10.1007/978-3-030-05051-1_25

M3 - Conference contribution

AN - SCOPUS:85058627477

SN - 9783030050504

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 358

EP - 377

BT - Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings

A2 - Vaidya, Jaideep

A2 - Li, Jin

PB - Springer Verlag

Y2 - 15 November 2018 through 17 November 2018

ER -