Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark

Mahmudul Hassan; Srividya Bansal

doi:10.1109/ICOSC.2019.8665614

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Scopus citations

Abstract

The rapid growth of semantic data in the form of Resource Description Framework (RDF) triples demands efficient, scalable, and distributed storage and parallel processing strategies, along with high availability and fault tolerance, for its management and reuse. Three open issues with distributed RDF data management systems are not well addressed together in existing work. The first is querying efficiency; the second is that solutions are optimized for certain types of query patterns and do not necessarily work well for all types; and the third is reducing pre-processing and data loading times. To address these issues, we propose a relational partitioning scheme for RDF data called Subset Property Table (SPT), which further partitions the existing Property Table approach into subsets of tables to minimize query input and join operations. We combine SPT with another existing model, Vertical Partitioning (VP), for storing RDF datasets and demonstrate that our proposed combined (SPT + VP) approach outperforms state-of-the-art systems based on an in-memory processing engine in a distributed environment.
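
The two storage layouts named in the abstract can be illustrated with a minimal sketch in plain Python (no Spark required). This is not the authors' implementation: the abstract does not say how predicates are assigned to subsets, so the `groups` argument, the function names, and the toy triples below are all assumptions made for illustration only.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all values are made up.
TRIPLES = [
    ("s1", "name", "Alice"),
    ("s1", "age", "30"),
    ("s1", "knows", "s2"),
    ("s2", "name", "Bob"),
    ("s2", "knows", "s1"),
]


def vertical_partitioning(triples):
    """Vertical Partitioning: one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def property_table(triples, predicates):
    """Property table: one row per subject, one column per listed predicate.

    Keeps only the first value seen for a (subject, predicate) pair, so this
    sketch suits single-valued predicates only.
    """
    rows = defaultdict(dict)
    for s, p, o in triples:
        if p in predicates:
            rows[s].setdefault(p, o)
    # Missing predicates become None, like NULL cells in a relational table.
    return {s: {p: cols.get(p) for p in predicates} for s, cols in rows.items()}


def subset_property_tables(triples, groups):
    """Hypothetical subset split: one smaller property table per predicate group."""
    return {name: property_table(triples, preds) for name, preds in groups.items()}


vp = vertical_partitioning(TRIPLES)
spt = subset_property_tables(TRIPLES, {"person": ["name", "age"]})
```

In this sketch a query touching only `name` and `age` would read the small `person` table instead of a wide table covering every predicate, while the multi-valued `knows` predicate is served from its own vertically partitioned table. That is the intuition behind combining the two layouts; the grouping criteria used in the paper may differ.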

Original language	English (US)
Title of host publication	Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	24-31
Number of pages	8
ISBN (Electronic)	9781538667835
DOIs	https://doi.org/10.1109/ICOSC.2019.8665614
State	Published - Mar 11 2019
Event	13th IEEE International Conference on Semantic Computing, ICSC 2019 - Newport Beach, United States Duration: Jan 30 2019 → Feb 1 2019

Publication series

Name	Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019

Conference

Conference	13th IEEE International Conference on Semantic Computing, ICSC 2019
Country/Territory	United States
City	Newport Beach
Period	1/30/19 → 2/1/19

Keywords

Data Partitioning
Resource Description Framework
Semantic Web
Spark
SPARQL Querying

ASJC Scopus subject areas

Artificial Intelligence
Software

Access to Document

10.1109/ICOSC.2019.8665614

Cite this

Hassan, M., & Bansal, S. (2019). Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark. In Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019 (pp. 24-31). Article 8665614 (Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICOSC.2019.8665614

Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark. / Hassan, Mahmudul; Bansal, Srividya.
Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019. Institute of Electrical and Electronics Engineers Inc., 2019. p. 24-31 8665614 (Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019).

Hassan, M & Bansal, S 2019, Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark. in Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019., 8665614, Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019, Institute of Electrical and Electronics Engineers Inc., pp. 24-31, 13th IEEE International Conference on Semantic Computing, ICSC 2019, Newport Beach, United States, 1/30/19. https://doi.org/10.1109/ICOSC.2019.8665614

Hassan M, Bansal S. Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark. In Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 24-31. 8665614. (Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019). doi: 10.1109/ICOSC.2019.8665614

@inproceedings{a50293ff95894b809c65b654e5934b4e,
  title = "Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark",
  abstract = "The rapid growth of semantic data in the form of Resource Description Framework (RDF) triples demands an efficient, scalable, and distributed storage and parallel processing strategies along with high availability and fault tolerance for its management and reuse. There are three open issues with distributed RDF data management systems that are not well addressed altogether in existing work. First is the querying efficiency, second, solutions are optimized for certain types of query patterns and don't necessarily work well for all types of query patterns, and the third is concerned with reducing pre-processing and data loading times. To address these issues, we propose a relational partitioning scheme called Subset Property Table (SPT) for RDF data that further partitions the existing Property Table approach into subsets of tables to minimize query input and join operation. We combine SPT with another existing model Vertical Partitioning (VP) for storing RDF datasets and demonstrate that our proposed combined (SPT + VP) approach outperforms state-of-the-art systems based on in-memory processing engine in a distributed environment.",
  keywords = "Data Partitioning, Resource Description Framework, Semantic Web, Spark, SPARQL Querying",
  author = "Mahmudul Hassan and Srividya Bansal",
  year = "2019",
  month = mar,
  day = "11",
  doi = "10.1109/ICOSC.2019.8665614",
  language = "English (US)",
  series = "Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "24--31",
  booktitle = "Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019",
  note = "13th IEEE International Conference on Semantic Computing, ICSC 2019 ; Conference date: 30-01-2019 Through 01-02-2019",
}

TY  - GEN
T1  - Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark
AU  - Hassan, Mahmudul
AU  - Bansal, Srividya
PY  - 2019/3/11
Y1  - 2019/3/11
N2  - The rapid growth of semantic data in the form of Resource Description Framework (RDF) triples demands an efficient, scalable, and distributed storage and parallel processing strategies along with high availability and fault tolerance for its management and reuse. There are three open issues with distributed RDF data management systems that are not well addressed altogether in existing work. First is the querying efficiency, second, solutions are optimized for certain types of query patterns and don't necessarily work well for all types of query patterns, and the third is concerned with reducing pre-processing and data loading times. To address these issues, we propose a relational partitioning scheme called Subset Property Table (SPT) for RDF data that further partitions the existing Property Table approach into subsets of tables to minimize query input and join operation. We combine SPT with another existing model Vertical Partitioning (VP) for storing RDF datasets and demonstrate that our proposed combined (SPT + VP) approach outperforms state-of-the-art systems based on in-memory processing engine in a distributed environment.
AB  - The rapid growth of semantic data in the form of Resource Description Framework (RDF) triples demands an efficient, scalable, and distributed storage and parallel processing strategies along with high availability and fault tolerance for its management and reuse. There are three open issues with distributed RDF data management systems that are not well addressed altogether in existing work. First is the querying efficiency, second, solutions are optimized for certain types of query patterns and don't necessarily work well for all types of query patterns, and the third is concerned with reducing pre-processing and data loading times. To address these issues, we propose a relational partitioning scheme called Subset Property Table (SPT) for RDF data that further partitions the existing Property Table approach into subsets of tables to minimize query input and join operation. We combine SPT with another existing model Vertical Partitioning (VP) for storing RDF datasets and demonstrate that our proposed combined (SPT + VP) approach outperforms state-of-the-art systems based on in-memory processing engine in a distributed environment.
KW  - Data Partitioning
KW  - Resource Description Framework
KW  - Semantic Web
KW  - Spark
KW  - SPARQL Querying
UR  - http://www.scopus.com/inward/record.url?scp=85064139924&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85064139924&partnerID=8YFLogxK
U2  - 10.1109/ICOSC.2019.8665614
DO  - 10.1109/ICOSC.2019.8665614
M3  - Conference contribution
AN  - SCOPUS:85064139924
T3  - Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019
SP  - 24
EP  - 31
BT  - Proceedings - 13th IEEE International Conference on Semantic Computing, ICSC 2019
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 13th IEEE International Conference on Semantic Computing, ICSC 2019
Y2  - 30 January 2019 through 1 February 2019
ER  -
