S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data

Mahmudul Hassan; Srividya K. Bansal

doi:10.1109/SMDS49396.2020.00023

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

The proliferation of the semantic web in the form of Resource Description Framework (RDF) data demands efficient, scalable, distributed storage along with a highly available, fault-tolerant parallel processing strategy. More precisely, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems, one that improves SPARQL query performance regardless of the query's pattern shape while minimizing preprocessing time. In this context, we propose a new relational partitioning scheme for RDF data called Property Table Partitioning (PTP), which further partitions the existing Property Table into multiple tables based on distinct properties (each comprising all subjects with non-null values for those distinct properties) in order to minimize input data and join operations. In this paper, we introduce a distributed RDF data management system called S3QLRDF, which is built on top of Spark and uses SQL to execute SPARQL queries over the PTP schema. We perform an extensive experimental evaluation with respect to preprocessing costs and query performance, using the Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) datasets with up to 1.4 billion triples. Our results demonstrate that S3QLRDF outperforms state-of-the-art distributed RDF management systems.
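
The partitioning idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it uses Python's built-in sqlite3 in place of Spark SQL, the toy triples and the table names (property_table, ptp_<property>) are hypothetical, each property is assumed to be single-valued, and it assumes that each partition keeps every column of the property table for the subjects whose value for that property is non-null. Under those assumptions, a star-shaped SPARQL pattern can be answered from one small partition without a join.

```python
import sqlite3

# Toy RDF triples (subject, property, object). All names are made up.
TRIPLES = [
    ("alice", "advisor", "bob"),
    ("alice", "takesCourse", "db101"),
    ("carol", "takesCourse", "db101"),
    ("bob", "teacherOf", "db101"),
]


def build_property_table(conn, triples):
    """Build a wide property table: one row per subject, one column per property."""
    props = sorted({p for _, p, _ in triples})
    cols = ", ".join(f'"{p}" TEXT' for p in props)
    conn.execute(f"CREATE TABLE property_table (s TEXT PRIMARY KEY, {cols})")
    for s, p, o in triples:
        conn.execute("INSERT OR IGNORE INTO property_table (s) VALUES (?)", (s,))
        conn.execute(f'UPDATE property_table SET "{p}" = ? WHERE s = ?', (o, s))
    return props


def partition_property_table(conn, props):
    """Assumed PTP step: one table per property, holding only the
    subjects whose value for that property is non-null."""
    for p in props:
        conn.execute(
            f'CREATE TABLE "ptp_{p}" AS '
            f'SELECT * FROM property_table WHERE "{p}" IS NOT NULL'
        )


def star_query(conn):
    """SQL for the SPARQL pattern { ?s :advisor ?a . ?s :takesCourse ?c }.
    Only the advisor partition is scanned, and no join is needed."""
    sql = (
        "SELECT s, advisor, takesCourse FROM ptp_advisor "
        "WHERE takesCourse IS NOT NULL"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
props = build_property_table(conn, TRIPLES)
partition_property_table(conn, props)
result = star_query(conn)
print(result)
```

In this toy setup the query reads a single-row partition instead of the three-row property table, which is the kind of input reduction the abstract describes. How the actual system chooses which partition to scan, and how it handles multi-valued properties, is not stated in the abstract and is left out here.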

Original language	English (US)
Title of host publication	Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	133-140
Number of pages	8
ISBN (Electronic)	9781728187778
DOIs	https://doi.org/10.1109/SMDS49396.2020.00023
State	Published - Oct 2020
Event	2020 IEEE International Conference on Smart Data Services, SMDS 2020 - Virtual, Beijing, China
Duration	Oct 18 2020 → Oct 24 2020

Publication series

Name	Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020

Conference

Conference	2020 IEEE International Conference on Smart Data Services, SMDS 2020
Country/Territory	China
City	Virtual, Beijing
Period	10/18/20 → 10/24/20

Keywords

Resource Description Framework, Semantic Web, SPARQL Querying, Data Partitioning, Spark.

ASJC Scopus subject areas

Artificial Intelligence
Computer Science Applications
Information Systems
Information Systems and Management

Cite this

Hassan, M., & Bansal, S. K. (2020). S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data. In Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020 (pp. 133-140). Article 9288498 (Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SMDS49396.2020.00023

S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data. / Hassan, Mahmudul; Bansal, Srividya K.
Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020. Institute of Electrical and Electronics Engineers Inc., 2020. p. 133-140 9288498 (Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020).

Hassan, M & Bansal, SK 2020, S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data. in Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020, 9288498, Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020, Institute of Electrical and Electronics Engineers Inc., pp. 133-140, 2020 IEEE International Conference on Smart Data Services, SMDS 2020, Virtual, Beijing, China, 10/18/20. https://doi.org/10.1109/SMDS49396.2020.00023

Hassan M, Bansal SK. S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data. In Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020. Institute of Electrical and Electronics Engineers Inc. 2020. p. 133-140. 9288498. (Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020). doi: 10.1109/SMDS49396.2020.00023

Hassan, Mahmudul; Bansal, Srividya K. / S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data. Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 133-140 (Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020).

@inproceedings{1209337b14f7445fa0fcf4c171181c96,

title = "S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data",

abstract = "The proliferation of the semantic web in the form of Resource Description Framework (RDF) demands an efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. More precisely, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems to improve SPARQL query performance regardless of its pattern shape with minimized pre-processing time. In this context, we propose a new relational partitioning scheme called Property Table Partitioning (PTP) for RDF data, that further partitions existing Property Table into multiple tables based on distinct properties (comprising of all subjects with non-null values for those distinct properties) in order to minimize input data and join operations. In this paper, we introduce a distributed RDF data management system called S3QLRDF, which is built on top of Spark and utilizes SQL to execute SPARQL queries over PTP schema. We perform an extensive experimental evaluation with respect to preprocessing costs and query performance, using Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) datasets with up to 1.4 billion triples. Our results demonstrate that S3QLRDF outperforms state-of-the-art distributed RDF management systems.",

keywords = "Resource Description Framework, Semantic Web, SPARQL Querying, Data Partitioning, Spark.",

author = "Hassan, Mahmudul and Bansal, {Srividya K.}",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE.; 2020 IEEE International Conference on Smart Data Services, SMDS 2020 ; Conference date: 18-10-2020 Through 24-10-2020",

year = "2020",

month = oct,

doi = "10.1109/SMDS49396.2020.00023",

language = "English (US)",

series = "Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "133--140",

booktitle = "Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020",

}

TY - GEN

T1 - S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data

T2 - 2020 IEEE International Conference on Smart Data Services, SMDS 2020

AU - Hassan, Mahmudul

AU - Bansal, Srividya K.

PY - 2020/10

Y1 - 2020/10

AB - The proliferation of the semantic web in the form of Resource Description Framework (RDF) demands an efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. More precisely, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems to improve SPARQL query performance regardless of its pattern shape with minimized pre-processing time. In this context, we propose a new relational partitioning scheme called Property Table Partitioning (PTP) for RDF data, that further partitions existing Property Table into multiple tables based on distinct properties (comprising of all subjects with non-null values for those distinct properties) in order to minimize input data and join operations. In this paper, we introduce a distributed RDF data management system called S3QLRDF, which is built on top of Spark and utilizes SQL to execute SPARQL queries over PTP schema. We perform an extensive experimental evaluation with respect to preprocessing costs and query performance, using Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) datasets with up to 1.4 billion triples. Our results demonstrate that S3QLRDF outperforms state-of-the-art distributed RDF management systems.

KW - Resource Description Framework, Semantic Web, SPARQL Querying, Data Partitioning, Spark.

UR - http://www.scopus.com/inward/record.url?scp=85099258897&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85099258897&partnerID=8YFLogxK

U2 - 10.1109/SMDS49396.2020.00023

DO - 10.1109/SMDS49396.2020.00023

M3 - Conference contribution

AN - SCOPUS:85099258897

T3 - Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020

SP - 133

EP - 140

BT - Proceedings - 2020 IEEE International Conference on Smart Data Services, SMDS 2020

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 18 October 2020 through 24 October 2020

ER -
