S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data

Mahmudul Hassan, Srividya K. Bansal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The proliferation of the Semantic Web in the form of Resource Description Framework (RDF) data demands efficient, scalable, distributed storage along with a highly available and fault-tolerant parallel processing strategy. More precisely, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems that improves SPARQL query performance regardless of query pattern shape while minimizing preprocessing time. In this context, we propose a new relational partitioning scheme for RDF data called Property Table Partitioning (PTP), which further partitions the existing Property Table into multiple tables based on distinct properties (each comprising all subjects with non-null values for those properties) in order to minimize input data and join operations. In this paper, we introduce a distributed RDF data management system called S3QLRDF, which is built on top of Spark and uses SQL to execute SPARQL queries over the PTP schema. We perform an extensive experimental evaluation with respect to preprocessing costs and query performance, using the Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) datasets with up to 1.4 billion triples. Our results demonstrate that S3QLRDF outperforms state-of-the-art distributed RDF management systems.
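The partitioning idea described in the abstract can be illustrated with a minimal, in-memory sketch. This is not the authors' implementation (S3QLRDF runs on Spark and issues SQL). The toy triples, the function names, and the single-valued-property simplification are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, property, object); not taken from the paper.
TRIPLES = [
    ("alice", "worksFor", "deptA"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("carol", "worksFor", "deptB"),
]


def build_property_table(triples):
    """Wide table: one row per subject, one column per property (None = null)."""
    properties = sorted({p for _, p, _ in triples})
    rows = defaultdict(lambda: dict.fromkeys(properties))
    for s, p, o in triples:
        rows[s][p] = o  # assumption: every property is single-valued
    return dict(rows)


def partition_by_property(property_table):
    """One narrow table per property, keeping only subjects with a non-null value."""
    partitions = defaultdict(dict)
    for subject, row in property_table.items():
        for prop, value in row.items():
            if value is not None:
                partitions[prop][subject] = value
    return dict(partitions)


def star_query(partitions, props):
    """Subjects that have all requested properties, reading only those partitions."""
    tables = [partitions.get(p, {}) for p in props]
    subjects = set(tables[0]).intersection(*tables[1:])
    return {s: tuple(t[s] for t in tables) for s in sorted(subjects)}


property_table = build_property_table(TRIPLES)
partitions = partition_by_property(property_table)
result = star_query(partitions, ["name", "worksFor"])
print(result)
```

In this sketch, a query over `name` and `worksFor` touches only those two partitions and never scans null cells, which is the sense in which the scheme aims to reduce input data and join work.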

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 133-140
Number of pages: 8
ISBN (Electronic): 9781728187778
DOIs
State: Published - Oct 2020
Event: 2020 IEEE International Conference on Smart Data Services, SMDS 2020 - Virtual, Beijing, China
Duration: Oct 18 2020 → Oct 24 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020

Conference

Conference: 2020 IEEE International Conference on Smart Data Services, SMDS 2020
Country/Territory: China
City: Virtual, Beijing
Period: 10/18/20 → 10/24/20

Keywords

  • Resource Description Framework, Semantic Web, SPARQL Querying, Data Partitioning, Spark.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
