Efficient hierarchical clustering algorithms using partially overlapping partitions

Manoranjan Dash, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Scopus citations

Abstract

Clustering is an important data exploration task. A prominent clustering algorithm is agglomerative hierarchical clustering. Roughly, in each iteration, it merges the closest pair of clusters. It was first proposed in 1951, and since then there have been numerous modifications. Some of its good features are: a natural, simple, and non-parametric grouping of similar objects, which is capable of finding clusters of different shapes, such as spherical and arbitrary. But large CPU time and high memory requirements limit its use for large data. In this paper we show that geometric metric (centroid, median, and minimum variance) algorithms obey a 90-10 relationship, where roughly the first 90% of iterations are spent on merging clusters with distance less than 10% of the maximum merging distance. This characteristic is exploited by partially overlapping partitioning. It is shown with experiments and analyses that different types of existing algorithms benefit substantially, with drastic reductions in CPU time and memory. Other contributions of this paper include a comparison study of multi-dimensional vis-a-vis single-dimensional partitioning, and analytical and experimental discussions on the setting of parameters such as the number of partitions and the dimensions for partitioning.
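The 90-10 relationship described in the abstract can be checked empirically on a small data set. The following is a minimal sketch, not the authors' implementation: it runs a naive centroid-linkage agglomerative clustering and reports the fraction of merges whose distance falls below a given ratio of the maximum merging distance. The function names `centroid_merge_distances` and `fraction_below`, the Euclidean distance, and the synthetic data are assumptions made for illustration; the partially overlapping partitioning scheme itself is not reproduced here.

```python
import math
import random


def centroid_merge_distances(points):
    """Naive centroid-linkage agglomerative clustering (illustrative only).

    Returns the list of merge distances, one per iteration, in merge order.
    """
    # Each cluster is stored as (centroid, size).
    clusters = [(tuple(p), 1) for p in points]
    distances = []
    while len(clusters) > 1:
        # Find the closest pair of cluster centroids by exhaustive search.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        # The merged centroid is the size-weighted mean of the two centroids.
        merged = tuple((a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj))
        clusters.pop(j)  # j > i, so remove j first to keep index i valid
        clusters.pop(i)
        clusters.append((merged, ni + nj))
        distances.append(d)
    return distances


def fraction_below(distances, ratio=0.1):
    """Fraction of merges with distance below ratio * (maximum merge distance)."""
    threshold = ratio * max(distances)
    return sum(d < threshold for d in distances) / len(distances)


if __name__ == "__main__":
    # Hypothetical synthetic data: three Gaussian blobs in two dimensions.
    rng = random.Random(0)
    centers = [(0.0, 0.0), (50.0, 0.0), (0.0, 50.0)]
    data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centers for _ in range(40)]
    merges = centroid_merge_distances(data)
    print(f"{fraction_below(merges):.2%} of merges are below 10% of the maximum")
```

The maximum is taken over all merges rather than the last one because centroid linkage does not guarantee monotonically increasing merge distances. The exhaustive pair search makes this sketch cubic in the number of points, which is the kind of cost the paper's partitioning approach aims to reduce.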

Original language: English (US)
Title of host publication: Advances in Knowledge Discovery and Data Mining - 5th Pacific-Asia Conference, PAKDD 2001, Proceedings
Editors: David Cheung, Graham J. Williams, Qing Li
Publisher: Springer Verlag
Pages: 495-506
Number of pages: 12
ISBN (Print): 3540419101, 9783540419105
State: Published - 2001
Event: 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2001 - Kowloon, Hong Kong
Duration: Apr 16 2001 to Apr 18 2001

Publication series

Name: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume: 2035
ISSN (Print): 0302-9743

Other

Other: 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2001
Country/Territory: Hong Kong
City: Kowloon
Period: 4/16/01 to 4/18/01

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
