PlinyCompute: A platform for high-performance, distributed, data-intensive tool development

Jia Zou; R. Matthew Barnett; Tania Lorido-Botran; Shangyu Luo; Carlos Monroy; Sourav Sikdar; Kia Teymourian; Binhang Yuan; Chris Jermaine

doi:10.1145/3183713.3196933

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

This paper describes PlinyCompute, a system for the development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. In the small, however, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and an associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM) and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach (declarative in the large, trusting the programmer's ability to utilize the PC object model efficiently in the small) results in a system that is ideal for the development of reusable, data-intensive tools and libraries.

Original language	English (US)
Title of host publication	SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data
Editors	Gautam Das, Christopher Jermaine, Ahmed Eldawy, Philip Bernstein
Publisher	Association for Computing Machinery
Pages	1189-1204
Number of pages	16
ISBN (Electronic)	9781450317436
DOIs	https://doi.org/10.1145/3183713.3196933
State	Published - May 27 2018
Externally published	Yes
Event	44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018 - Houston, United States; Duration: Jun 10 2018 → Jun 15 2018

Publication series

Name	Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print)	0730-8078

Conference

Conference	44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018
Country/Territory	United States
City	Houston
Period	6/10/18 → 6/15/18

Keywords

Distributed computing
Object model
Query compilation

ASJC Scopus subject areas

Software
Information Systems

Cite this

Zou, J., Barnett, R. M., Lorido-Botran, T., Luo, S., Monroy, C., Sikdar, S., Teymourian, K., Yuan, B., & Jermaine, C. (2018). PlinyCompute: A platform for high-performance, distributed, data-intensive tool development. In G. Das, C. Jermaine, A. Eldawy, & P. Bernstein (Eds.), SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data (pp. 1189-1204). (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery. https://doi.org/10.1145/3183713.3196933

@inproceedings{75b8718e878e4be4b92169addff3c947,
title = "PlinyCompute: A platform for high-performance, distributed, data-intensive tool development",
abstract = "This paper describes PlinyCompute, a system for the development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. In the small, however, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the {"}PC object model{"}) and an associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM) and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach (declarative in the large, trusting the programmer's ability to utilize the PC object model efficiently in the small) results in a system that is ideal for the development of reusable, data-intensive tools and libraries.",
keywords = "Distributed computing, Object model, Query compilation",
author = "Jia Zou and Barnett, {R. Matthew} and Tania Lorido-Botran and Shangyu Luo and Carlos Monroy and Sourav Sikdar and Kia Teymourian and Binhang Yuan and Chris Jermaine",
note = "Publisher Copyright: {\textcopyright} 2018 Association for Computing Machinery.; 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018 ; Conference date: 10-06-2018 Through 15-06-2018",
year = "2018",
month = may,
day = "27",
doi = "10.1145/3183713.3196933",
language = "English (US)",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
publisher = "Association for Computing Machinery",
pages = "1189--1204",
editor = "Gautam Das and Christopher Jermaine and Ahmed Eldawy and Philip Bernstein",
booktitle = "SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data",
}

TY - GEN
T1 - PlinyCompute
T2 - 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018
AU - Zou, Jia
AU - Barnett, R. Matthew
AU - Lorido-Botran, Tania
AU - Luo, Shangyu
AU - Monroy, Carlos
AU - Sikdar, Sourav
AU - Teymourian, Kia
AU - Yuan, Binhang
AU - Jermaine, Chris
PY - 2018/5/27
Y1 - 2018/5/27
N2 - This paper describes PlinyCompute, a system for the development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. In the small, however, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and an associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM) and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach (declarative in the large, trusting the programmer's ability to utilize the PC object model efficiently in the small) results in a system that is ideal for the development of reusable, data-intensive tools and libraries.
KW - Distributed computing
KW - Object model
KW - Query compilation
UR - http://www.scopus.com/inward/record.url?scp=85048760137&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048760137&partnerID=8YFLogxK
U2 - 10.1145/3183713.3196933
DO - 10.1145/3183713.3196933
M3 - Conference contribution
AN - SCOPUS:85048760137
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1189
EP - 1204
BT - SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data
A2 - Das, Gautam
A2 - Jermaine, Christopher
A2 - Eldawy, Ahmed
A2 - Bernstein, Philip
PB - Association for Computing Machinery
Y2 - 10 June 2018 through 15 June 2018
ER -
