Distributed SPARQL over Big RDF Data: A Comparative Analysis Using Presto and MapReduce

Mulugeta Mammo; Srividya Bansal

doi:10.1109/BigDataCongress.2015.15

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. Initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent, and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. This paper presents a comparative analysis of the performance of Presto (a distributed SQL query engine) against Apache Hive in processing big RDF data. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand.
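To make the SPARQL-to-SQL step concrete, here is a minimal Python sketch of how a basic graph pattern might be turned into self-joins over a single triples table. It is an illustration only, not the authors' compiler, which the abstract says was built with Flex and Bison. The table name `triples`, its columns `s`, `p`, `o`, the function `translate_bgp`, and the `ub:` terms in the example are all assumptions introduced here; the abstract does not describe the storage schema.

```python
def translate_bgp(select_vars, patterns, table="triples"):
    """Translate a SPARQL basic graph pattern into a SQL string.

    Assumes (hypothetically) that all triples live in one table with
    columns s, p, o. Terms that start with '?' are variables; every
    other term is treated as a constant.
    """
    columns = ("s", "p", "o")
    bindings = {}     # variable -> first column reference that binds it
    conditions = []   # WHERE clauses: constant filters and join equalities
    from_items = []   # one aliased copy of the table per triple pattern

    for i, triple in enumerate(patterns):
        alias = f"t{i}"
        from_items.append(f"{table} AS {alias}")
        for col, term in zip(columns, triple):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in bindings:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {bindings[term]}")
                else:
                    bindings[term] = ref
            else:
                # A constant becomes an equality filter (quotes escaped).
                escaped = term.replace("'", "''")
                conditions.append(f"{ref} = '{escaped}'")

    select_list = ", ".join(f"{bindings[v]} AS {v[1:]}" for v in select_vars)
    sql = f"SELECT {select_list} FROM {', '.join(from_items)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Hypothetical query: SELECT ?x ?c WHERE { ?x rdf:type ub:Student . ?x ub:takesCourse ?c }
example_patterns = [
    ("?x", "rdf:type", "ub:Student"),
    ("?x", "ub:takesCourse", "?c"),
]
example_sql = translate_bgp(["?x", "?c"], example_patterns)
print(example_sql)
```

Each triple pattern contributes one aliased copy of the table, and a variable shared between patterns becomes an equality between columns. The sketch handles only conjunctive patterns; OPTIONAL, FILTER, UNION, and prefix expansion are out of scope, and the generated SQL has not been checked against Presto or Hive.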

Original language	English (US)
Title of host publication	Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015
Editors	Latifur Khan, Carminati Barbara
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	33-40
Number of pages	8
ISBN (Electronic)	9781467372787
DOIs	https://doi.org/10.1109/BigDataCongress.2015.15
State	Published - Aug 17 2015
Event	4th IEEE International Congress on Big Data, BigData Congress 2015 - New York City, United States; Duration: Jun 27 2015 → Jul 2 2015

Publication series

Name	Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015

Other

Other	4th IEEE International Congress on Big Data, BigData Congress 2015
Country/Territory	United States
City	New York City
Period	6/27/15 → 7/2/15

Keywords

Big Data processing
Database Performance
Evaluation
Querying
Semantic Web data

ASJC Scopus subject areas

Computer Networks and Communications
Computer Science Applications
Information Systems

Access to Document

10.1109/BigDataCongress.2015.15

Cite this

Mammo, M., & Bansal, S. (2015). Distributed SPARQL over Big RDF Data: A Comparative Analysis Using Presto and MapReduce. In L. Khan, & C. Barbara (Eds.), Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015 (pp. 33-40). Article 7207199 (Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigDataCongress.2015.15

Distributed SPARQL over Big RDF Data: A Comparative Analysis Using Presto and MapReduce. / Mammo, Mulugeta; Bansal, Srividya.
Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015. ed. / Latifur Khan; Carminati Barbara. Institute of Electrical and Electronics Engineers Inc., 2015. p. 33-40 7207199 (Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015).

Mammo, M & Bansal, S 2015, Distributed SPARQL over Big RDF Data: A Comparative Analysis Using Presto and MapReduce. in L Khan & C Barbara (eds), Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015., 7207199, Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015, Institute of Electrical and Electronics Engineers Inc., pp. 33-40, 4th IEEE International Congress on Big Data, BigData Congress 2015, New York City, United States, 6/27/15. https://doi.org/10.1109/BigDataCongress.2015.15

Mammo M, Bansal S. Distributed SPARQL over Big RDF Data: A Comparative Analysis Using Presto and MapReduce. In Khan L, Barbara C, editors, Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 33-40. 7207199. (Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015). doi: 10.1109/BigDataCongress.2015.15

Mammo, Mulugeta ; Bansal, Srividya. / Distributed SPARQL over Big RDF Data : A Comparative Analysis Using Presto and MapReduce. Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015. editor / Latifur Khan ; Carminati Barbara. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 33-40 (Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015).

@inproceedings{9037d27e8f18494181d06c5051a2e1e5,

title = "Distributed SPARQL over Big RDF Data: A Comparative Analysis Using Presto and MapReduce",

abstract = "The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. Initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent, and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. This paper presents a comparative analysis of the performance of Presto (a distributed SQL query engine) against Apache Hive in processing big RDF data. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand.",

keywords = "Big Data processing, Database Performance, Evaluation, Querying, Semantic Web data",

author = "Mulugeta Mammo and Srividya Bansal",

note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; 4th IEEE International Congress on Big Data, BigData Congress 2015 ; Conference date: 27-06-2015 Through 02-07-2015",

year = "2015",

month = aug,

day = "17",

doi = "10.1109/BigDataCongress.2015.15",

language = "English (US)",

series = "Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "33--40",

editor = "Latifur Khan and Carminati Barbara",

booktitle = "Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015",

}

TY - GEN

T1 - Distributed SPARQL over Big RDF Data

T2 - 4th IEEE International Congress on Big Data, BigData Congress 2015

AU - Mammo, Mulugeta

AU - Bansal, Srividya

PY - 2015/8/17

Y1 - 2015/8/17

N2 - The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. Initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent, and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. This paper presents a comparative analysis of the performance of Presto (a distributed SQL query engine) against Apache Hive in processing big RDF data. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand.

AB - The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. Initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent, and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. This paper presents a comparative analysis of the performance of Presto (a distributed SQL query engine) against Apache Hive in processing big RDF data. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand.

KW - Big Data processing

KW - Database Performance

KW - Evaluation

KW - Querying

KW - Semantic Web data

UR - http://www.scopus.com/inward/record.url?scp=84959475325&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959475325&partnerID=8YFLogxK

U2 - 10.1109/BigDataCongress.2015.15

DO - 10.1109/BigDataCongress.2015.15

M3 - Conference contribution

AN - SCOPUS:84959475325

T3 - Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015

SP - 33

EP - 40

BT - Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015

A2 - Khan, Latifur

A2 - Barbara, Carminati

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 27 June 2015 through 2 July 2015

ER -
