TY - GEN
T1 - BigCache for big-data systems
AU - Roger, Michel Angelo
AU - Xu, Yiqi
AU - Zhao, Ming
PY - 2015/1/7
Y1 - 2015/1/7
N2 - Big-data systems are increasingly used in many disciplines for important tasks such as knowledge discovery and decision making by processing large volumes of data. Big-data systems rely on hard-disk drive (HDD) based storage to provide the necessary capacity. However, as big-data applications grow rapidly more diverse and demanding, HDD storage becomes insufficient to satisfy their performance requirements. Emerging solid-state drives (SSDs) promise great IO performance that can be exploited by big-data applications, but they still face serious limitations in capacity, cost, and endurance and therefore must be strategically incorporated into big-data systems. This paper presents BigCache, an SSD-based distributed caching layer for big-data systems. It is designed to be seamlessly integrated with existing big-data systems and transparently accelerate IOs for diverse big-data applications. The management of the distributed SSD caches in BigCache is coordinated with the job management of big-data systems in order to support cache-locality-driven job scheduling. BigCache is prototyped in Hadoop to provide caching upon HDFS for MapReduce applications. It is evaluated using typical MapReduce applications, and the results show that BigCache reduces the runtime of WordCount by 38% and the runtime of TeraSort by 52%. The results also show that BigCache is able to achieve significant speedup by caching only partial input for the benchmarks, owing to its ability to cache partial input and its replacement policy that recognizes application access patterns.
AB - Big-data systems are increasingly used in many disciplines for important tasks such as knowledge discovery and decision making by processing large volumes of data. Big-data systems rely on hard-disk drive (HDD) based storage to provide the necessary capacity. However, as big-data applications grow rapidly more diverse and demanding, HDD storage becomes insufficient to satisfy their performance requirements. Emerging solid-state drives (SSDs) promise great IO performance that can be exploited by big-data applications, but they still face serious limitations in capacity, cost, and endurance and therefore must be strategically incorporated into big-data systems. This paper presents BigCache, an SSD-based distributed caching layer for big-data systems. It is designed to be seamlessly integrated with existing big-data systems and transparently accelerate IOs for diverse big-data applications. The management of the distributed SSD caches in BigCache is coordinated with the job management of big-data systems in order to support cache-locality-driven job scheduling. BigCache is prototyped in Hadoop to provide caching upon HDFS for MapReduce applications. It is evaluated using typical MapReduce applications, and the results show that BigCache reduces the runtime of WordCount by 38% and the runtime of TeraSort by 52%. The results also show that BigCache is able to achieve significant speedup by caching only partial input for the benchmarks, owing to its ability to cache partial input and its replacement policy that recognizes application access patterns.
UR - http://www.scopus.com/inward/record.url?scp=84921811460&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84921811460&partnerID=8YFLogxK
U2 - 10.1109/BigData.2014.7004231
DO - 10.1109/BigData.2014.7004231
M3 - Conference contribution
AN - SCOPUS:84921811460
T3 - Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
SP - 189
EP - 194
BT - Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
A2 - Chang, Wo
A2 - Huan, Jun
A2 - Cercone, Nick
A2 - Pyne, Saumyadipta
A2 - Honavar, Vasant
A2 - Lin, Jimmy
A2 - Hu, Xiaohua Tony
A2 - Aggarwal, Charu
A2 - Mobasher, Bamshad
A2 - Pei, Jian
A2 - Nambiar, Raghunath
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
Y2 - 27 October 2014 through 30 October 2014
ER -