Efficient processing of skyline-join queries over multiple data sources

Mithila Nagendra; Kasim Candan

doi:10.1145/2699483

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Efficient processing of skyline queries has been an area of growing interest. Many of the earlier skyline techniques assumed that the skyline query is applied to a single data table. Naturally, these algorithms were not suitable for many applications in which the skyline query may involve attributes belonging to multiple data sources. In other words, if the data used in the skyline query are stored in multiple tables, then join operations would be required before the skyline can be searched. The task of computing skylines on multiple data sources has been termed the skyline-join problem, and various skyline-join algorithms have been proposed. However, the current proposals suffer from several drawbacks: they often need to scan the input tables exhaustively in order to obtain the set of skyline-join results; moreover, the pruning techniques employed to eliminate the tuples are largely based on expensive pairwise tuple-to-tuple comparisons. In this article, we aim to address these shortcomings by proposing two novel skyline-join algorithms, namely skyline-sensitive join (S²J) and symmetric skyline-sensitive join (S³J), to process skyline queries over two data sources. Our approaches compute the results using a novel layer/region pruning technique (LR-pruning) that prunes the join space in blocks as opposed to individual data points, thereby avoiding excessive pairwise point-to-point dominance checks. Furthermore, the S³J algorithm utilizes an early stopping condition in order to successfully compute the skyline results by accessing only a subset of the input tables. In addition to S²J and S³J, we also propose the S²J-M and S³J-M algorithms. These algorithms extend S²J's and S³J's two-way skyline-join ability to efficiently process skyline-join queries over more than two data sources. S²J-M and S³J-M leverage the extended concept of LR-pruning, called M-way LR-pruning, to compute multi-way skyline-joins in which more than two data sources are integrated during skyline processing. We report extensive experimental results that confirm the advantages of the proposed algorithms over state-of-the-art skyline-join techniques.
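
For orientation, the skyline-join problem described in the abstract can be sketched as the naive baseline it argues against: join the two tables in full, then discard every joined tuple that is dominated by another through pairwise comparisons. This is not the article's S²J or S³J algorithm and uses no LR-pruning. The function names, the `(key, attributes)` tuple layout, the equality join on a shared key, and the "smaller is better" convention are all assumptions made for illustration.

```python
from itertools import product


def dominates(a, b):
    """Return True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are preferred on every attribute.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def naive_skyline_join(left, right):
    """Join-then-skyline baseline over two tables (hypothetical helper).

    Each table is a list of (join_key, attribute_tuple) rows. The tables are
    equi-joined on join_key, the attribute tuples are concatenated, and the
    skyline is found with exhaustive pairwise dominance checks.
    """
    # Full join: every pair of rows with matching keys is materialized.
    joined = [
        (lkey, lattrs + rattrs)
        for (lkey, lattrs), (rkey, rattrs) in product(left, right)
        if lkey == rkey
    ]
    # Quadratic dominance filtering over the whole join result.
    return [
        (key, attrs)
        for key, attrs in joined
        if not any(dominates(other, attrs) for _, other in joined)
    ]


if __name__ == "__main__":
    # Made-up sample rows, not data from the article.
    left = [("A", (1, 4)), ("A", (3, 5)), ("B", (2, 2))]
    right = [("A", (2, 1)), ("B", (5, 5))]
    for row in naive_skyline_join(left, right):
        print(row)
```

The sketch scans both inputs exhaustively and compares joined tuples one pair at a time, which are the two costs the abstract attributes to earlier proposals.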

Original language	English (US)
Article number	10
Journal	ACM Transactions on Database Systems
Volume	40
Issue number	2
DOIs	https://doi.org/10.1145/2699483
State	Published - Jun 1 2015

Keywords

Algorithms
Design
Performance

ASJC Scopus subject areas

Information Systems

Cite this

@article{7256d7c3686c40cb8565114ed102bed0,
title = "Efficient processing of skyline-join queries over multiple data sources",
abstract = "Efficient processing of skyline queries has been an area of growing interest. Many of the earlier skyline techniques assumed that the skyline query is applied to a single data table. Naturally, these algorithms were not suitable for many applications in which the skyline query may involve attributes belonging to multiple data sources. In other words, if the data used in the skyline query are stored in multiple tables, then join operations would be required before the skyline can be searched. The task of computing skylines on multiple data sources has been coined as the skyline-join problem and various skyline-join algorithms have been proposed. However, the current proposals suffer several drawbacks: they often need to scan the input tables exhaustively in order to obtain the set of skyline-join results; moreover, the pruning techniques employed to eliminate the tuples are largely based on expensive pairwise tuple-to-tuple comparisons. In this article, we aim to address these shortcomings by proposing two novel skyline-join algorithms, namely skyline-sensitive join (S2J) and symmetric skyline-sensitive join (S3J), to process skyline queries over two data sources. Our approaches compute the results using a novel layer/region pruning technique (LR-pruning) that prunes the join space in blocks as opposed to individual data points, thereby avoiding excessive pairwise point-to-point dominance checks. Furthermore, the S3J algorithm utilizes an early stopping condition in order to successfully compute the skyline results by accessing only a subset of the input tables. In addition to S2J and S3J, we also propose the S2J-M and S3J-M algorithms. These algorithms extend S2J's and S3J's two-way skyline-join ability to efficiently process skyline-join queries over more than two data sources. S2J-M and S3J-M leverage the extended concept of LR-pruning, called M-way LR-pruning, to compute multi-way skyline-joins in which more than two data sources are integrated during skyline processing. We report extensive experimental results that confirm the advantages of the proposed algorithms over state-of-the-art skyline-join techniques.",
keywords = "Algorithms, Design, Performance",
author = "Mithila Nagendra and Kasim Candan",
year = "2015",
month = jun,
day = "1",
doi = "10.1145/2699483",
language = "English (US)",
volume = "40",
journal = "ACM Transactions on Database Systems",
issn = "0362-5915",
publisher = "Association for Computing Machinery (ACM)",
number = "2",
}

TY  - JOUR
T1  - Efficient processing of skyline-join queries over multiple data sources
AU  - Nagendra, Mithila
AU  - Candan, Kasim
PY  - 2015/6/1
Y1  - 2015/6/1
AB  - Efficient processing of skyline queries has been an area of growing interest. Many of the earlier skyline techniques assumed that the skyline query is applied to a single data table. Naturally, these algorithms were not suitable for many applications in which the skyline query may involve attributes belonging to multiple data sources. In other words, if the data used in the skyline query are stored in multiple tables, then join operations would be required before the skyline can be searched. The task of computing skylines on multiple data sources has been coined as the skyline-join problem and various skyline-join algorithms have been proposed. However, the current proposals suffer several drawbacks: they often need to scan the input tables exhaustively in order to obtain the set of skyline-join results; moreover, the pruning techniques employed to eliminate the tuples are largely based on expensive pairwise tuple-to-tuple comparisons. In this article, we aim to address these shortcomings by proposing two novel skyline-join algorithms, namely skyline-sensitive join (S2J) and symmetric skyline-sensitive join (S3J), to process skyline queries over two data sources. Our approaches compute the results using a novel layer/region pruning technique (LR-pruning) that prunes the join space in blocks as opposed to individual data points, thereby avoiding excessive pairwise point-to-point dominance checks. Furthermore, the S3J algorithm utilizes an early stopping condition in order to successfully compute the skyline results by accessing only a subset of the input tables. In addition to S2J and S3J, we also propose the S2J-M and S3J-M algorithms. These algorithms extend S2J's and S3J's two-way skyline-join ability to efficiently process skyline-join queries over more than two data sources. S2J-M and S3J-M leverage the extended concept of LR-pruning, called M-way LR-pruning, to compute multi-way skyline-joins in which more than two data sources are integrated during skyline processing. We report extensive experimental results that confirm the advantages of the proposed algorithms over state-of-the-art skyline-join techniques.
KW  - Algorithms
KW  - Design
KW  - Performance
UR  - http://www.scopus.com/inward/record.url?scp=84934768251&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84934768251&partnerID=8YFLogxK
U2  - 10.1145/2699483
DO  - 10.1145/2699483
M3  - Article
AN  - SCOPUS:84934768251
SN  - 0362-5915
VL  - 40
JO  - ACM Transactions on Database Systems
JF  - ACM Transactions on Database Systems
IS  - 2
M1  - 10
ER  -
