Distributed SPARQL over Big RDF Data: A Comparative Analysis Using Presto and MapReduce

Mulugeta Mammo, Srividya Bansal

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Scopus citations

Abstract

Processing large volumes of RDF data requires an efficient storage and query processing engine that scales well with the volume of data. Initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent, and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. This paper presents a comparative analysis of the performance of Presto (a distributed SQL query engine) against Apache Hive in processing big RDF data. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand.

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015
Editors: Latifur Khan, Carminati Barbara
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 33-40
Number of pages: 8
ISBN (Electronic): 9781467372787
State: Published - Aug 17 2015
Event: 4th IEEE International Congress on Big Data, BigData Congress 2015 - New York City, United States
Duration: Jun 27 2015 to Jul 2 2015

Publication series

Name: Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015

Other

Other: 4th IEEE International Congress on Big Data, BigData Congress 2015
Country/Territory: United States
City: New York City
Period: 6/27/15 to 7/2/15

Keywords

  • Big Data processing
  • Database Performance
  • Evaluation
  • Querying
  • Semantic Web data

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
