RDF data storage techniques for efficient SPARQL query processing using distributed computation engines

Mahmudul Hassan, Srividya Bansal

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Scopus citations

Abstract

The rapidly growing amount of linked open data demands semantic RDF services that are efficient, scalable, and distributed, with high availability for reuse and fault tolerance. To meet this demand, the Big Data processing infrastructure Hadoop has been adopted for RDF data management systems. In this paper, we introduce two distributed RDF data stores, VPExp and 3CStore, based on the existing vertical partitioning (VP) approach. In the VPExp approach, we propose splitting predicates based on the explicit type information of an object. The 3CStore scheme is designed as a 3-column store comprising a subset of triples from the VP table, chosen based on different join correlations, to reduce the number of join operations when SPARQL queries are executed as SQL in a distributed system. We evaluate these two RDF data storage approaches by comparing them with the vertical partitioning approach and the state-of-the-art RDF management system S2RDF. We also evaluate the query performance of these systems when built on two popular distributed computation engines, Spark and Drill.
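As an illustration only, the following minimal Python sketch shows the general idea of vertical partitioning (one two-column table per predicate) and a type-based split in the spirit of VPExp. The function names, the `rdf:type` string constant, and the rule of keying each table by the object's declared type are assumptions made for this sketch; the abstract does not specify the paper's actual storage layout, and this is not the authors' implementation.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # assumed shorthand for the rdf:type predicate


def vertical_partition(triples):
    """Group triples into one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def split_by_object_type(triples):
    """Hypothetical VPExp-like split: one table per (predicate, object type).

    Assumes each entity has at most one explicit rdf:type declaration;
    objects without a declared type fall under the key (predicate, None).
    """
    declared = {s: o for s, p, o in triples if p == RDF_TYPE}
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[(p, declared.get(o))].append((s, o))
    return dict(tables)


if __name__ == "__main__":
    # Made-up example data, not taken from the paper.
    example = [
        ("alice", RDF_TYPE, "Person"),
        ("acme", RDF_TYPE, "Company"),
        ("bob", "likes", "alice"),
        ("bob", "likes", "acme"),
    ]
    print(vertical_partition(example))
    print(split_by_object_type(example))
```

Under these assumptions, a query that constrains the type of an object would only need to scan the smaller `(predicate, type)` table rather than the full predicate table.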

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science, IRI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 323-330
Number of pages: 8
ISBN (Print): 9781538626597
State: Published - Aug 2 2018
Event: 19th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2018 - Salt Lake City, United States
Duration: Jul 7 2018 to Jul 9 2018

Keywords

  • Drill
  • Hadoop
  • In-memory processing engine
  • Information reuse
  • RDF data storage
  • Semantic web
  • Spark
  • SPARQL Querying

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
  • Artificial Intelligence
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Public Administration
