Similarity group-by

Yasin N. Silva; Walid G. Aref; Mohamed H. Ali

doi:10.1109/ICDE.2009.113

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

23 Scopus citations

Abstract

Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision support systems. Many application scenarios require grouping values that are similar but not necessarily equal. In this paper we propose a new SQL construct that supports similarity-based group-by (SGB). SGB is not a new clustering algorithm; rather, it is a practical and fast similarity grouping query operator that is compatible with other SQL operators and can be combined with them to answer similarity-based queries efficiently. In contrast to expensive clustering algorithms, the proposed similarity group-by operator maintains low execution times while still generating meaningful groupings that address many application needs. The paper presents a general definition of the similarity group-by operation and gives three instances of this definition. It also discusses how optimization techniques for the regular group-by can be extended to SGB. The proposed operators are implemented inside PostgreSQL. The performance study shows that the proposed operators scale well, with at most a 25% increase in execution time over the regular group-by.
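
To make the idea of grouping similar rather than equal values concrete, here is a minimal Python sketch. It is an illustration only, not the SQL construct or any of the three operator instances defined in the paper. The function names `similarity_group_by` and `regular_group_by`, the parameter `max_gap`, the sample data, and the grouping rule itself (sort the values and start a new group whenever two consecutive values differ by more than `max_gap`) are all assumptions chosen for simplicity.

```python
from typing import Dict, List


def similarity_group_by(values: List[float], max_gap: float) -> List[List[float]]:
    """Group 1-D values so that sorted neighbours within `max_gap` share a group.

    Hypothetical rule for illustration: sort the values, then start a new
    group whenever the gap to the previous value exceeds `max_gap`.
    """
    if max_gap < 0:
        raise ValueError("max_gap must be non-negative")
    groups: List[List[float]] = []
    for v in sorted(values):
        # Extend the current group if v is close enough to its last member.
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


def regular_group_by(values: List[float]) -> Dict[float, int]:
    """Equality-based grouping with a per-group count, for comparison."""
    counts: Dict[float, int] = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts


# Made-up sample readings: no two are exactly equal.
readings = [20.1, 20.3, 20.2, 35.0, 35.4, 50.0]

similar = similarity_group_by(readings, max_gap=0.5)  # three groups
exact = regular_group_by(readings)                    # six singleton groups
```

On this sample, equality grouping puts every reading in its own group, while the similarity rule yields three groups. With `max_gap=0` the sketch falls back to equality grouping.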

Original language	English (US)
Title of host publication	Proceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009
Pages	904-915
Number of pages	12
DOIs	https://doi.org/10.1109/ICDE.2009.113
State	Published - 2009
Externally published	Yes
Event	25th IEEE International Conference on Data Engineering, ICDE 2009 - Shanghai, China
Duration	Mar 29 2009 → Apr 2 2009

Publication series

Name	Proceedings - International Conference on Data Engineering
ISSN (Print)	1084-4627

Other

Other	25th IEEE International Conference on Data Engineering, ICDE 2009
Country/Territory	China
City	Shanghai
Period	3/29/09 → 4/2/09

ASJC Scopus subject areas

Software
Signal Processing
Information Systems

Access to Document

10.1109/ICDE.2009.113

Cite this

@inproceedings{eb5fd22c57cb4a7990f53e90b70cce31,
  title = "Similarity group-by",
  abstract = "Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision support systems. In many application scenarios, it is required to group similar but not necessarily equal values. In this paper we propose a new SQL construct that supports similarity-based Group-by (SGB). SGB is not a new clustering algorithm, but rather is a practical and fast similarity grouping query operator that is compatible with other SQL operators and can be combined with them to answer similarity-based queries efficiently. In contrast to expensive clustering algorithms, the proposed similarity group-by operator maintains low execution times while still generating meaningful groupings that address many application needs. The paper presents a general definition of the similarity group-by operation and gives three instances of this definition. The paper also discusses how optimization techniques for the regular group-by can be extended to the case of SGB. The proposed operators are implemented inside PostgreSQL. The performance study shows that the proposed similarity-based group-by operators have good scalability properties with at most only 25% increase in execution time over the regular group-by.",
  author = "Silva, {Yasin N.} and Aref, {Walid G.} and Ali, {Mohamed H.}",
  year = "2009",
  doi = "10.1109/ICDE.2009.113",
  language = "English (US)",
  isbn = "9780769535456",
  series = "Proceedings - International Conference on Data Engineering",
  pages = "904--915",
  booktitle = "Proceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009",
  note = "25th IEEE International Conference on Data Engineering, ICDE 2009 ; Conference date: 29-03-2009 Through 02-04-2009",
}

TY - GEN
T1 - Similarity group-by
AU - Silva, Yasin N.
AU - Aref, Walid G.
AU - Ali, Mohamed H.
PY - 2009
Y1 - 2009
N2 - Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision support systems. In many application scenarios, it is required to group similar but not necessarily equal values. In this paper we propose a new SQL construct that supports similarity-based Group-by (SGB). SGB is not a new clustering algorithm, but rather is a practical and fast similarity grouping query operator that is compatible with other SQL operators and can be combined with them to answer similarity-based queries efficiently. In contrast to expensive clustering algorithms, the proposed similarity group-by operator maintains low execution times while still generating meaningful groupings that address many application needs. The paper presents a general definition of the similarity group-by operation and gives three instances of this definition. The paper also discusses how optimization techniques for the regular group-by can be extended to the case of SGB. The proposed operators are implemented inside PostgreSQL. The performance study shows that the proposed similarity-based group-by operators have good scalability properties with at most only 25% increase in execution time over the regular group-by.
UR - http://www.scopus.com/inward/record.url?scp=67649641445&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67649641445&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2009.113
DO - 10.1109/ICDE.2009.113
M3 - Conference contribution
AN - SCOPUS:67649641445
SN - 9780769535456
T3 - Proceedings - International Conference on Data Engineering
SP - 904
EP - 915
BT - Proceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009
T2 - 25th IEEE International Conference on Data Engineering, ICDE 2009
Y2 - 29 March 2009 through 2 April 2009
ER -
