TY - GEN
T1 - Similarity group-by
AU - Silva, Yasin N.
AU - Aref, Walid G.
AU - Ali, Mohamed H.
PY - 2009
Y1 - 2009
N2 - Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision support systems. In many application scenarios, it is required to group similar but not necessarily equal values. In this paper, we propose a new SQL construct that supports similarity-based Group-by (SGB). SGB is not a new clustering algorithm, but rather a practical and fast similarity grouping query operator that is compatible with other SQL operators and can be combined with them to answer similarity-based queries efficiently. In contrast to expensive clustering algorithms, the proposed similarity group-by operator maintains low execution times while still generating meaningful groupings that address many application needs. The paper presents a general definition of the similarity group-by operation and gives three instances of this definition. The paper also discusses how optimization techniques for the regular group-by can be extended to the case of SGB. The proposed operators are implemented inside PostgreSQL. The performance study shows that the proposed similarity-based group-by operators have good scalability properties, with at most a 25% increase in execution time over the regular group-by.
UR - http://www.scopus.com/inward/record.url?scp=67649641445&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67649641445&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2009.113
DO - 10.1109/ICDE.2009.113
M3 - Conference contribution
AN - SCOPUS:67649641445
SN - 9780769535456
T3 - Proceedings - International Conference on Data Engineering
SP - 904
EP - 915
BT - Proceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009
T2 - 25th IEEE International Conference on Data Engineering, ICDE 2009
Y2 - 29 March 2009 through 2 April 2009
ER -