PlinyCompute: A platform for high-performance, distributed, data-intensive tool development

Jia Zou, R. Matthew Barnett, Tania Lorido-Botran, Shangyu Luo, Carlos Monroy, Sourav Sikdar, Kia Teymourian, Binhang Yuan, Chris Jermaine

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM), and hence must at least partially cede performance-critical concerns such as memory management (including layout and allocation/deallocation) and virtual method/function dispatch to the JVM. This hybrid approach (declarative in the large, trusting the programmer's ability to utilize the PC object model efficiently in the small) results in a system that is ideal for the development of reusable, data-intensive tools and libraries.
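
As a rough illustration of why control over memory layout matters for the "in the small" half of this design, the sketch below packs fixed-size records into one contiguous byte buffer and refers to them by offset, so the whole buffer can be written to disk or sent over a network as a single byte copy rather than serialized object by object. This is not PlinyCompute's API, which the abstract does not show; the names `Page`, `append`, `read`, `to_bytes`, and `from_bytes`, and the record layout, are hypothetical, and Python is used only for brevity.

```python
import struct

# Hypothetical fixed record layout: little-endian 64-bit int id, 64-bit float score.
RECORD = struct.Struct("<qd")


class Page:
    """A contiguous block of bytes holding records back to back (illustrative only)."""

    def __init__(self, capacity_bytes):
        self.buf = bytearray(capacity_bytes)
        self.used = 0  # bump pointer: index of the next free byte

    def append(self, rec_id, score):
        """Write a record in place and return its byte offset within the page."""
        if self.used + RECORD.size > len(self.buf):
            raise MemoryError("page full")
        offset = self.used
        RECORD.pack_into(self.buf, offset, rec_id, score)
        self.used += RECORD.size
        return offset

    def read(self, offset):
        """Decode the record stored at the given byte offset."""
        return RECORD.unpack_from(self.buf, offset)

    def to_bytes(self):
        """Return the occupied part of the page as one flat byte string."""
        return bytes(self.buf[: self.used])

    @classmethod
    def from_bytes(cls, data):
        """Rebuild a page from bytes produced by to_bytes()."""
        page = cls(len(data))
        page.buf[:] = data
        page.used = len(data)
        return page


page = Page(capacity_bytes=64)
first = page.append(1, 0.5)
second = page.append(2, 1.25)

# Shipping the page is a single byte copy; offsets stay valid on the receiver.
received = Page.from_bytes(page.to_bytes())
```

Because records are addressed by offset rather than by machine pointer, `received.read(second)` decodes the same record that was written on the sending side. A JVM-hosted system generally cannot promise this kind of layout, which is the contrast the abstract draws.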

Original language: English (US)
Title of host publication: SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data
Editors: Gautam Das, Christopher Jermaine, Ahmed Eldawy, Philip Bernstein
Publisher: Association for Computing Machinery
Pages: 1189-1204
Number of pages: 16
ISBN (Electronic): 9781450317436
DOIs: https://doi.org/10.1145/3183713.3196933
State: Published - May 27 2018
Externally published: Yes
Event: 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018 - Houston, United States
Duration: Jun 10 2018 to Jun 15 2018

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018
Country: United States
City: Houston
Period: 6/10/18 to 6/15/18

Fingerprint

Data storage equipment
Distributed computer systems
Application programming interfaces (API)
Data structures
Virtual machine
Big data

Keywords

  • Distributed computing
  • Object model
  • Query compilation

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Zou, J., Barnett, R. M., Lorido-Botran, T., Luo, S., Monroy, C., Sikdar, S., ... Jermaine, C. (2018). PlinyCompute: A platform for high-performance, distributed, data-intensive tool development. In G. Das, C. Jermaine, A. Eldawy, & P. Bernstein (Eds.), SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data (pp. 1189-1204). (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery. https://doi.org/10.1145/3183713.3196933

PlinyCompute: A platform for high-performance, distributed, data-intensive tool development. / Zou, Jia; Barnett, R. Matthew; Lorido-Botran, Tania; Luo, Shangyu; Monroy, Carlos; Sikdar, Sourav; Teymourian, Kia; Yuan, Binhang; Jermaine, Chris.

SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data. ed. / Gautam Das; Christopher Jermaine; Ahmed Eldawy; Philip Bernstein. Association for Computing Machinery, 2018. p. 1189-1204 (Proceedings of the ACM SIGMOD International Conference on Management of Data).

Zou, J, Barnett, RM, Lorido-Botran, T, Luo, S, Monroy, C, Sikdar, S, Teymourian, K, Yuan, B & Jermaine, C 2018, PlinyCompute: A platform for high-performance, distributed, data-intensive tool development. in G Das, C Jermaine, A Eldawy & P Bernstein (eds), SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data. Proceedings of the ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery, pp. 1189-1204, 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018, Houston, United States, 6/10/18. https://doi.org/10.1145/3183713.3196933
Zou J, Barnett RM, Lorido-Botran T, Luo S, Monroy C, Sikdar S et al. PlinyCompute: A platform for high-performance, distributed, data-intensive tool development. In Das G, Jermaine C, Eldawy A, Bernstein P, editors, SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data. Association for Computing Machinery. 2018. p. 1189-1204. (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/3183713.3196933
Zou, Jia; Barnett, R. Matthew; Lorido-Botran, Tania; Luo, Shangyu; Monroy, Carlos; Sikdar, Sourav; Teymourian, Kia; Yuan, Binhang; Jermaine, Chris. / PlinyCompute: A platform for high-performance, distributed, data-intensive tool development. SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data. editor / Gautam Das; Christopher Jermaine; Ahmed Eldawy; Philip Bernstein. Association for Computing Machinery, 2018. pp. 1189-1204 (Proceedings of the ACM SIGMOD International Conference on Management of Data).
@inproceedings{75b8718e878e4be4b92169addff3c947,
title = "PlinyCompute: A platform for high-performance, distributed, data-intensive tool development",
abstract = "This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the {"}PC object model{"}) and associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM), and hence must at least partially cede performance-critical concerns such as memory management (including layout and allocation/deallocation) and virtual method/function dispatch to the JVM. This hybrid approach (declarative in the large, trusting the programmer's ability to utilize the PC object model efficiently in the small) results in a system that is ideal for the development of reusable, data-intensive tools and libraries.",
keywords = "Distributed computing, Object model, Query compilation",
author = "Jia Zou and Barnett, {R. Matthew} and Tania Lorido-Botran and Shangyu Luo and Carlos Monroy and Sourav Sikdar and Kia Teymourian and Binhang Yuan and Chris Jermaine",
year = "2018",
month = "5",
day = "27",
doi = "10.1145/3183713.3196933",
language = "English (US)",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
publisher = "Association for Computing Machinery",
pages = "1189--1204",
editor = "Gautam Das and Christopher Jermaine and Ahmed Eldawy and Philip Bernstein",
booktitle = "SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data",

}

TY - GEN

T1 - PlinyCompute

T2 - A platform for high-performance, distributed, data-intensive tool development

AU - Zou, Jia

AU - Barnett, R. Matthew

AU - Lorido-Botran, Tania

AU - Luo, Shangyu

AU - Monroy, Carlos

AU - Sikdar, Sourav

AU - Teymourian, Kia

AU - Yuan, Binhang

AU - Jermaine, Chris

PY - 2018/5/27

Y1 - 2018/5/27

AB - This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM), and hence must at least partially cede performance-critical concerns such as memory management (including layout and allocation/deallocation) and virtual method/function dispatch to the JVM. This hybrid approach (declarative in the large, trusting the programmer's ability to utilize the PC object model efficiently in the small) results in a system that is ideal for the development of reusable, data-intensive tools and libraries.

KW - Distributed computing

KW - Object model

KW - Query compilation

UR - http://www.scopus.com/inward/record.url?scp=85048760137&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048760137&partnerID=8YFLogxK

U2 - 10.1145/3183713.3196933

DO - 10.1145/3183713.3196933

M3 - Conference contribution

AN - SCOPUS:85048760137

T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data

SP - 1189

EP - 1204

BT - SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data

A2 - Das, Gautam

A2 - Jermaine, Christopher

A2 - Eldawy, Ahmed

A2 - Bernstein, Philip

PB - Association for Computing Machinery

ER -