TY - GEN
T1 - DA placement
T2 - 18th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2018
AU - Deng, Mingzhu
AU - Zhao, Ming
AU - Liu, Fang
AU - Chen, Zhiguang
AU - Xiao, Nong
N1 - Funding Information:
Acknowledgment. We thank the anonymous reviewers for their insightful comments. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61433019 and U1435217, and by the National High Technology Research and Development Program of China under Grant No. 2016YFB1000302.
Publisher Copyright:
© Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
AB - Incorporating both deduplication and erasure coding is preferred in modern storage systems for enhanced storage efficiency and economical data reliability. However, a simple combination suffers from the “read imbalance problem”, in which parallel data accesses are limited by throttled storage nodes. The problem stems from uneven data placement that is unaware of deduplication and erasure coding, each of which alters the order of data if left unattended. This paper presents the design and implementation of Dual-Aware (DA) placement, which achieves deduplication awareness and erasure-coding awareness at the same time in a combined storage system. DA records the node number of each unique piece of data for quick reference and dynamically tracks the nodes already used by each write request. This deduplication awareness lets DA skip inconvenient placement locations. DA also serializes the placement of parity blocks within a stripe and across stripes. This erasure-coding awareness keeps data and parity separate and maintains data sequentiality at stripe borders. Additionally, DA extends to further load balancing by using the deduplication level, which intuitively predicts future accesses to a piece of data. In short, DA boosts system performance at little memory or computation cost. Extensive experiments using both real-world traces and synthesized workloads show that DA achieves better read performance. For example, DA leads the baseline rolling placement (BA) and random placement (RA) by average latency margins of 30.86% and 29.63%, respectively, under CAFTL traces on a default cluster of 12 nodes with RS(8,4).
UR - http://www.scopus.com/inward/record.url?scp=85058627477&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058627477&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-05051-1_25
DO - 10.1007/978-3-030-05051-1_25
M3 - Conference contribution
AN - SCOPUS:85058627477
SN - 9783030050504
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 358
EP - 377
BT - Algorithms and Architectures for Parallel Processing - 18th International Conference, ICA3PP 2018, Proceedings
A2 - Vaidya, Jaideep
A2 - Li, Jin
PB - Springer Verlag
Y2 - 15 November 2018 through 17 November 2018
ER -