Jointly clustering rows and columns of binary matrices: Algorithms and trade-offs

Jiaming Xu, Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, Lei Ying

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we study three algorithms with different running time and compare the number of observations needed by them for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity when increasingly more observations are available.

Original language: English (US)
Title of host publication: Performance Evaluation Review
Publisher: Association for Computing Machinery
Pages: 29-41
Number of pages: 13
Volume: 42
Edition: 1
DOIs: https://doi.org/10.1145/2591971.2592005
State: Published - Jun 20 2014
Event: ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2014 - Austin, United States
Duration: Jun 16 2014 → Jun 20 2014

Other

Other: ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2014
Country: United States
City: Austin
Period: 6/16/14 → 6/20/14

Fingerprint

Recovery
Computational complexity

Keywords

  • Clustering
  • Low-rank matrix recovery
  • Spectral method

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Cite this

Xu, J., Wu, R., Zhu, K., Hajek, B., Srikant, R., & Ying, L. (2014). Jointly clustering rows and columns of binary matrices: Algorithms and trade-offs. In Performance Evaluation Review (1 ed., Vol. 42, pp. 29-41). Association for Computing Machinery. https://doi.org/10.1145/2591971.2592005

Jointly clustering rows and columns of binary matrices : Algorithms and trade-offs. / Xu, Jiaming; Wu, Rui; Zhu, Kai; Hajek, Bruce; Srikant, R.; Ying, Lei.

Performance Evaluation Review. Vol. 42 1. ed. Association for Computing Machinery, 2014. p. 29-41.

Xu, J, Wu, R, Zhu, K, Hajek, B, Srikant, R & Ying, L 2014, Jointly clustering rows and columns of binary matrices: Algorithms and trade-offs. in Performance Evaluation Review. 1 edn, vol. 42, Association for Computing Machinery, pp. 29-41, ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2014, Austin, United States, 6/16/14. https://doi.org/10.1145/2591971.2592005

Xu J, Wu R, Zhu K, Hajek B, Srikant R, Ying L. Jointly clustering rows and columns of binary matrices: Algorithms and trade-offs. In Performance Evaluation Review. 1 ed. Vol. 42. Association for Computing Machinery. 2014. p. 29-41 https://doi.org/10.1145/2591971.2592005

Xu, Jiaming ; Wu, Rui ; Zhu, Kai ; Hajek, Bruce ; Srikant, R. ; Ying, Lei. / Jointly clustering rows and columns of binary matrices : Algorithms and trade-offs. Performance Evaluation Review. Vol. 42 1. ed. Association for Computing Machinery, 2014. pp. 29-41

@inproceedings{051039b4576246ef9dffe10883711930,
title = "Jointly clustering rows and columns of binary matrices: Algorithms and trade-offs",
abstract = "In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we study three algorithms with different running time and compare the number of observations needed by them for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity when increasingly more observations are available.",
keywords = "Clustering, Low-rank matrix recovery, Spectral method",
author = "Jiaming Xu and Rui Wu and Kai Zhu and Bruce Hajek and R. Srikant and Lei Ying",
year = "2014",
month = "6",
day = "20",
doi = "10.1145/2591971.2592005",
language = "English (US)",
volume = "42",
pages = "29--41",
booktitle = "Performance Evaluation Review",
publisher = "Association for Computing Machinery",
edition = "1",
}

TY  - GEN
T1  - Jointly clustering rows and columns of binary matrices
T2  - Algorithms and trade-offs
AU  - Xu, Jiaming
AU  - Wu, Rui
AU  - Zhu, Kai
AU  - Hajek, Bruce
AU  - Srikant, R.
AU  - Ying, Lei
PY  - 2014/6/20
Y1  - 2014/6/20
AB  - In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we study three algorithms with different running time and compare the number of observations needed by them for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity when increasingly more observations are available.
KW  - Clustering
KW  - Low-rank matrix recovery
KW  - Spectral method
UR  - http://www.scopus.com/inward/record.url?scp=84955607655&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84955607655&partnerID=8YFLogxK
U2  - 10.1145/2591971.2592005
DO  - 10.1145/2591971.2592005
M3  - Conference contribution
AN  - SCOPUS:84904346897
VL  - 42
SP  - 29
EP  - 41
BT  - Performance Evaluation Review
PB  - Association for Computing Machinery
ER  -