Nonlinear adaptive distance metric learning for clustering

Jianhui Chen; Zheng Zhao; Jieping Ye; Huan Liu

doi:10.1145/1281192.1281209

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

61 Scopus citations

Abstract

A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a low-dimensional manifold, where geometric relationships such as pairwise distances are preserved. It can be extended to the nonlinear case by applying the kernel trick, which embeds the data into a feature space by specifying the kernel function that computes the dot products between data points in the feature space. In this paper, we propose a novel unsupervised Nonlinear Adaptive Metric Learning algorithm, called NAML, which performs clustering and distance metric learning simultaneously. NAML first maps the data to a high-dimensional space through a kernel function; then applies a linear projection to find a low-dimensional manifold where the separability of the data is maximized; and finally performs clustering in the low-dimensional space. The performance of NAML depends on the selection of the kernel function and the projection. We show that the joint kernel learning, dimensionality reduction, and clustering can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results demonstrated the efficacy of the proposed algorithm.
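
The three stages named in the abstract (kernel mapping, linear projection to a low-dimensional space, clustering) can be illustrated with a rough sketch. This is not the authors' NAML algorithm: NAML learns the kernel and the projection jointly with the clustering, whereas the sketch below assumes a fixed RBF kernel, substitutes kernel PCA for the learned projection, and uses plain k-means with a farthest-first initialization. All function names and parameters (`rbf_kernel`, `kernel_pca_embed`, `kmeans`, `kernel_cluster`, `gamma`, `n_components`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gram matrix of an RBF kernel (assumed here; NAML learns its kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_pca_embed(K, n_components):
    """Project onto the top principal directions in feature space.

    Kernel PCA stands in for the learned linear projection of the paper.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # centered Gram matrix
    vals, vecs = np.linalg.eigh(Kc)          # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.maximum(vals, 0.0))


def kmeans(Z, k, n_iter=100):
    """Plain k-means with a deterministic farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def kernel_cluster(X, k, gamma, n_components):
    """Kernel map, then low-dimensional projection, then clustering."""
    K = rbf_kernel(X, gamma)
    Z = kernel_pca_embed(K, n_components)
    return kmeans(Z, k)


# Two well-separated synthetic blobs (made-up data, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(5.0, 0.3, size=(20, 2)),
])
labels = kernel_cluster(X, k=2, gamma=0.1, n_components=2)
print(labels)
```

Unlike this fixed pipeline, the method described in the abstract alternates between updating the cluster assignments, the projection, and the kernel, so the three stages inform one another rather than running once in sequence.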

Original language	English (US)
Title of host publication	KDD-2007
Subtitle of host publication	Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages	123-132
Number of pages	10
DOIs	https://doi.org/10.1145/1281192.1281209
State	Published - 2007
Event	KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - San Jose, CA, United States
Duration	Aug 12 2007 → Aug 15 2007

Publication series

Name	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other	KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/Territory	United States
City	San Jose, CA
Period	8/12/07 → 8/15/07

Keywords

Clustering
Convex programming
Distance metric
Kernel

ASJC Scopus subject areas

Software
Information Systems

Access to Document

10.1145/1281192.1281209

Cite this

Chen, J., Zhao, Z., Ye, J., & Liu, H. (2007). Nonlinear adaptive distance metric learning for clustering. In KDD-2007: Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 123-132). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/1281192.1281209

Nonlinear adaptive distance metric learning for clustering. / Chen, Jianhui; Zhao, Zheng; Ye, Jieping et al.
KDD-2007: Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2007. p. 123-132 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

Chen, J, Zhao, Z, Ye, J & Liu, H 2007, Nonlinear adaptive distance metric learning for clustering. in KDD-2007: Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 123-132, KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, United States, 8/12/07. https://doi.org/10.1145/1281192.1281209

@inproceedings{33500b2f1c4c4145a53450a42bffcb51,

title = "Nonlinear adaptive distance metric learning for clustering",

abstract = "A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a low-dimensional manifold, where geometric relationships such as pairwise distances are preserved. It can be extended to the nonlinear case by applying the kernel trick, which embeds the data into a feature space by specifying the kernel function that computes the dot products between data points in the feature space. In this paper, we propose a novel unsupervised Nonlinear Adaptive Metric Learning algorithm, called NAML, which performs clustering and distance metric learning simultaneously. NAML first maps the data to a high-dimensional space through a kernel function; then applies a linear projection to find a low-dimensional manifold where the separability of the data is maximized; and finally performs clustering in the low-dimensional space. The performance of NAML depends on the selection of the kernel function and the projection. We show that the joint kernel learning, dimensionality reduction, and clustering can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results demonstrated the efficacy of the proposed algorithm.",

keywords = "Clustering, Convex programming, Distance metric, Kernel",

author = "Jianhui Chen and Zheng Zhao and Jieping Ye and Huan Liu",

note = "Copyright: Copyright 2011 Elsevier B.V., All rights reserved.; KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ; Conference date: 12-08-2007 Through 15-08-2007",

year = "2007",

doi = "10.1145/1281192.1281209",

language = "English (US)",

isbn = "1595936092",

series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

pages = "123--132",

booktitle = "KDD-2007",

}

TY - GEN

T1 - Nonlinear adaptive distance metric learning for clustering

AU - Chen, Jianhui

AU - Zhao, Zheng

AU - Ye, Jieping

AU - Liu, Huan

PY - 2007

Y1 - 2007

N2 - A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a low-dimensional manifold, where geometric relationships such as pairwise distances are preserved. It can be extended to the nonlinear case by applying the kernel trick, which embeds the data into a feature space by specifying the kernel function that computes the dot products between data points in the feature space. In this paper, we propose a novel unsupervised Nonlinear Adaptive Metric Learning algorithm, called NAML, which performs clustering and distance metric learning simultaneously. NAML first maps the data to a high-dimensional space through a kernel function; then applies a linear projection to find a low-dimensional manifold where the separability of the data is maximized; and finally performs clustering in the low-dimensional space. The performance of NAML depends on the selection of the kernel function and the projection. We show that the joint kernel learning, dimensionality reduction, and clustering can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results demonstrated the efficacy of the proposed algorithm.

AB - A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a low-dimensional manifold, where geometric relationships such as pairwise distances are preserved. It can be extended to the nonlinear case by applying the kernel trick, which embeds the data into a feature space by specifying the kernel function that computes the dot products between data points in the feature space. In this paper, we propose a novel unsupervised Nonlinear Adaptive Metric Learning algorithm, called NAML, which performs clustering and distance metric learning simultaneously. NAML first maps the data to a high-dimensional space through a kernel function; then applies a linear projection to find a low-dimensional manifold where the separability of the data is maximized; and finally performs clustering in the low-dimensional space. The performance of NAML depends on the selection of the kernel function and the projection. We show that the joint kernel learning, dimensionality reduction, and clustering can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results demonstrated the efficacy of the proposed algorithm.

KW - Clustering

KW - Convex programming

KW - Distance metric

KW - Kernel

UR - http://www.scopus.com/inward/record.url?scp=36849021609&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36849021609&partnerID=8YFLogxK

U2 - 10.1145/1281192.1281209

DO - 10.1145/1281192.1281209

M3 - Conference contribution

AN - SCOPUS:36849021609

SN - 1595936092

SN - 9781595936097

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 123

EP - 132

BT - KDD-2007

T2 - KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Y2 - 12 August 2007 through 15 August 2007

ER -
