Efficient yet accurate clustering

Manoranjan Dash, Kian Lee Tan, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

4 Scopus citations

Abstract

In this paper we show that most hierarchical agglomerative clustering (HAC) algorithms follow a 90-10 rule: roughly the first 90% of the iterations merge cluster pairs whose dissimilarity is less than 10% of the maximum dissimilarity. We propose two algorithms, 2-phase and nested, based on partially overlapping partitioning (POP). To handle high-dimensional data efficiently, we propose a tree structure particularly suitable for POP. Extensive experiments show that the proposed algorithms significantly reduce the time and memory requirements of existing HAC algorithms without compromising accuracy.
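The 90-10 rule can be checked on your own data by recording the dissimilarity of every merge an HAC run performs. The sketch below is not the authors' code and does not implement POP, the 2-phase or nested algorithms, or the proposed tree structure. It runs a naive single-linkage HAC, chosen here only for simplicity, and reports the share of merges below 10% of the largest merge dissimilarity. Two assumptions apply. First, "maximum dissimilarity" is taken to mean the largest merge dissimilarity. Second, the helper names (`single_linkage_merges`, `fraction_below`, `make_blobs`) and the synthetic three-blob data are hypothetical. Single-linkage merge dissimilarities never decrease, so the merges below the threshold are exactly the leading iterations.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def single_linkage_merges(points: List[Point]) -> List[float]:
    """Naive O(n^3) single-linkage HAC (illustrative only); returns each merge's dissimilarity, in order."""
    n = len(points)
    # Full pairwise Euclidean dissimilarity matrix.
    dist = [[math.dist(p, q) for q in points] for p in points]
    active = set(range(n))  # indices of clusters that still exist
    merges: List[float] = []
    while len(active) > 1:
        # Closest pair of active clusters (ties broken by smallest indices).
        d, i, j = min((dist[a][b], a, b) for a in active for b in active if a < b)
        merges.append(d)
        # Merge cluster j into cluster i with the single-linkage
        # Lance-Williams update: d(k, i+j) = min(d(k, i), d(k, j)).
        for k in active:
            if k != i and k != j:
                dist[i][k] = dist[k][i] = min(dist[i][k], dist[j][k])
        active.remove(j)
    return merges


def fraction_below(merges: List[float], ratio: float = 0.1) -> float:
    """Share of merges whose dissimilarity is below ratio * (largest merge dissimilarity)."""
    threshold = ratio * max(merges)
    return sum(1 for d in merges if d < threshold) / len(merges)


def make_blobs(seed: int = 0, per_blob: int = 50, sigma: float = 0.5) -> List[Point]:
    """Hypothetical synthetic data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma))
            for cx, cy in centers for _ in range(per_blob)]


if __name__ == "__main__":
    merges = single_linkage_merges(make_blobs())
    print(f"{len(merges)} merges, {fraction_below(merges):.0%} below 10% of the largest")
```

The share you get depends on how strongly clustered the data is. Treat this as a quick empirical check of the rule on a given dataset rather than as a demonstration that it always holds. For larger inputs, a library routine such as SciPy's `linkage` (whose third output column holds the merge distances) would replace the naive loop.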

Original language: English (US)
Title of host publication: Proceedings - 2001 IEEE International Conference on Data Mining, ICDM'01
Pages: 99-106
Number of pages: 8
State: Published - Dec 1 2001
Event: 1st IEEE International Conference on Data Mining, ICDM'01 - San Jose, CA, United States
Duration: Nov 29 2001 to Dec 2 2001

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 1st IEEE International Conference on Data Mining, ICDM'01
Country: United States
City: San Jose, CA
Period: 11/29/01 to 12/2/01

ASJC Scopus subject areas

  • Engineering(all)


Cite this

Dash, M., Tan, K. L., & Liu, H. (2001). Efficient yet accurate clustering. In Proceedings - 2001 IEEE International Conference on Data Mining, ICDM'01 (pp. 99-106). (Proceedings - IEEE International Conference on Data Mining, ICDM).