Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark

Mahmudul Hassan, Srividya Bansal

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

2 Scopus citations

Abstract

The rapid growth of semantic data in the form of Resource Description Framework (RDF) triples demands efficient, scalable, and distributed storage and parallel processing strategies, along with high availability and fault tolerance, for its management and reuse. Existing work on distributed RDF data management systems leaves three open issues that are not addressed together. The first is querying efficiency. The second is that solutions are optimized for certain types of query patterns and do not necessarily work well for all of them. The third is reducing pre-processing and data loading times. To address these issues, we propose a relational partitioning scheme for RDF data called Subset Property Table (SPT), which further partitions the existing Property Table approach into subsets of tables to minimize query input and join operations. We combine SPT with another existing model, Vertical Partitioning (VP), for storing RDF datasets and demonstrate that the combined (SPT + VP) approach outperforms state-of-the-art systems based on an in-memory processing engine in a distributed environment.
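
As a rough illustration of the storage layouts named in the abstract, the sketch below builds a Vertical Partitioning layout (one two-column table per predicate) and a property table split into smaller subset tables from a handful of toy triples. It uses plain Python dictionaries rather than Spark, and every name in it (the triples, the function names, the "person" and "org" groups) is hypothetical. In particular, the predicate groups are supplied by hand; the abstract does not say how the paper chooses its subsets, so this should be read as an assumption-laden sketch and not as the authors' method.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples; all values are made up.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("alice", "worksAt", "acme"),
    ("bob", "name", "Bob"),
    ("bob", "worksAt", "acme"),
    ("acme", "label", "Acme Corp"),
]


def vertical_partitioning(triples):
    """Build one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def property_table(triples, predicates):
    """Build one row per subject with one column per listed predicate.

    Missing values become None. Assumes single-valued predicates.
    """
    rows = defaultdict(dict)
    for s, p, o in triples:
        if p in predicates:
            rows[s][p] = o
    return {s: {p: cols.get(p) for p in predicates} for s, cols in rows.items()}


def subset_property_tables(triples, groups):
    """Build one smaller property table per predicate group.

    ASSUMPTION: groups are given by hand for illustration only.
    """
    return {name: property_table(triples, preds) for name, preds in groups.items()}


vp = vertical_partitioning(TRIPLES)
spt = subset_property_tables(
    TRIPLES,
    {"person": ["name", "age", "worksAt"], "org": ["label"]},
)

# A star-shaped lookup (names of subjects that work at "acme") reads a single
# subset table and needs no join; with VP alone it would join two tables.
names = sorted(
    row["name"] for row in spt["person"].values() if row["worksAt"] == "acme"
)
print(names)
```

In this toy setup the subset table answers the star-shaped lookup from one small table, while the per-predicate VP tables remain available for queries that touch a single predicate, which is one plausible reading of why the two layouts might be combined.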

Original language: English (US)
Title of host publication: Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 24-31
Number of pages: 8
ISBN (Electronic): 9781538667835
DOIs: https://doi.org/10.1109/ICOSC.2019.8665614
State: Published - Mar 11 2019
Event: 13th IEEE International Conference on Semantic Computing, ICSC 2019 - Newport Beach, United States
Duration: Jan 30 2019 to Feb 1 2019

Publication series

Name: Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019

Conference

Conference: 13th IEEE International Conference on Semantic Computing, ICSC 2019
Country: United States
City: Newport Beach
Period: 1/30/19 to 2/1/19

Keywords

  • Data Partitioning
  • Resource Description Framework
  • Semantic Web
  • Spark
  • SPARQL Querying

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software

Cite this

Hassan, M., & Bansal, S. (2019). Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark. In Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019 (pp. 24-31). [8665614] (Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICOSC.2019.8665614