Similarity group-by

Yasin Silva, Walid G. Aref, Mohamed H. Ali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Citations (Scopus)

Abstract

Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision support systems. In many application scenarios, it is necessary to group values that are similar but not necessarily equal. In this paper, we propose a new SQL construct that supports similarity-based Group-by (SGB). SGB is not a new clustering algorithm; rather, it is a practical and fast similarity grouping query operator that is compatible with other SQL operators and can be combined with them to answer similarity-based queries efficiently. In contrast to expensive clustering algorithms, the proposed similarity group-by operator maintains low execution times while still generating meaningful groupings that address many application needs. The paper presents a general definition of the similarity group-by operation and gives three instances of this definition. The paper also discusses how optimization techniques for the regular group-by can be extended to the case of SGB. The proposed operators are implemented inside PostgreSQL. The performance study shows that the proposed similarity-based group-by operators have good scalability properties, with at most a 25% increase in execution time over the regular group-by.
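To make the idea of grouping "similar but not necessarily equal" values concrete, here is a minimal Python sketch. It assumes a single numeric grouping attribute and one simple similarity criterion chosen only for illustration: after sorting, a value joins the current group if it lies within `max_sep` of the previous value; otherwise it starts a new group. The function name `similarity_group_by` and the parameter `max_sep` are hypothetical. This is not the paper's general definition of SGB, any of its three instances, or its PostgreSQL implementation.

```python
from typing import Iterable, List


def similarity_group_by(values: Iterable[float], max_sep: float) -> List[List[float]]:
    """Illustrative similarity grouping over one numeric attribute.

    Hypothetical criterion (an assumption, not the paper's SGB definition):
    after sorting, consecutive members of a group differ by at most max_sep.
    With max_sep == 0 only equal values share a group, i.e. a regular group-by.
    """
    if max_sep < 0:
        raise ValueError("max_sep must be non-negative")

    groups: List[List[float]] = []
    for v in sorted(values):
        # Extend the current group if v is close enough to its last member.
        if groups and v - groups[-1][-1] <= max_sep:
            groups[-1].append(v)
        else:
            # Otherwise v starts a new group.
            groups.append([v])
    return groups


if __name__ == "__main__":
    groups = similarity_group_by([1, 2, 10, 11, 3, 20], 1.5)
    print(groups)  # [[1, 2, 3], [10, 11], [20]]
    # Aggregates are then computed per group, as with a SQL GROUP BY.
    for g in groups:
        print(len(g), sum(g) / len(g))
```

In this sketch the cost is one sort plus one linear scan (O(n log n)), which hints at why a similarity grouping operator of this kind can stay much cheaper than a general-purpose clustering algorithm; the operators actually evaluated in the paper may use different criteria and strategies.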

Original language: English (US)
Title of host publication: Proceedings - International Conference on Data Engineering
Pages: 904-915
Number of pages: 12
DOI: 10.1109/ICDE.2009.113
State: Published - 2009
Externally published: Yes
Event: 25th IEEE International Conference on Data Engineering, ICDE 2009 - Shanghai, China
Duration: Mar 29 2009 to Apr 2 2009



ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Silva, Y., Aref, W. G., & Ali, M. H. (2009). Similarity group-by. In Proceedings - International Conference on Data Engineering (pp. 904-915). [4812464] https://doi.org/10.1109/ICDE.2009.113

@inproceedings{eb5fd22c57cb4a7990f53e90b70cce31,
title = "Similarity group-by",
author = "Yasin Silva and Aref, {Walid G.} and Ali, {Mohamed H.}",
year = "2009",
doi = "10.1109/ICDE.2009.113",
language = "English (US)",
isbn = "9780769535456",
pages = "904--915",
booktitle = "Proceedings - International Conference on Data Engineering",

}
