TY - JOUR
T1 - Optimizing kernel machines using deep learning
AU - Song, Huan
AU - Thiagarajan, Jayaraman J.
AU - Sattigeri, Prasanna
AU - Spanias, Andreas
N1 - Funding Information:
Manuscript received June 24, 2017; revised November 14, 2017; accepted January 30, 2018. Date of publication March 6, 2018; date of current version October 16, 2018. This work was supported in part by the SenSIP Center at Arizona State University and in part by the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. (Corresponding author: Huan Song.) H. Song and A. Spanias are with the SenSIP Center, School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: huan.song@asu.edu; spanias@asu.edu).
Publisher Copyright:
© 2018 IEEE.
PY - 2018/11
Y1 - 2018/11
N2 - Building highly nonlinear and nonparametric models is central to several state-of-the-art machine learning systems. Kernel methods form an important class of techniques that induce a reproducing kernel Hilbert space (RKHS) for inferring nonlinear models through the construction of similarity functions from data. These methods are particularly preferred in cases where the training data sizes are limited and when prior knowledge of the data similarities is available. Despite their usefulness, they are limited by their computational complexity and their inability to support end-to-end learning with a task-specific objective. On the other hand, deep neural networks have become the de facto solution for end-to-end inference in several learning paradigms. In this paper, we explore the idea of using deep architectures to perform kernel machine optimization, for both computational efficiency and end-to-end inference. To this end, we develop the deep kernel machine optimization framework, which creates an ensemble of dense embeddings using Nyström kernel approximations and utilizes deep learning to generate task-specific representations through the fusion of the embeddings. Intuitively, the filters of the network are trained to fuse information from an ensemble of linear subspaces in the RKHS. Furthermore, we introduce the kernel dropout regularization to enable improved training convergence. Finally, we extend this framework to the multiple kernel case, by coupling a global fusion layer with pretrained deep kernel machines for each of the constituent kernels. Using case studies with limited training data and no explicit feature sources, we demonstrate the effectiveness of our framework over conventional model inference techniques.
AB - Building highly nonlinear and nonparametric models is central to several state-of-the-art machine learning systems. Kernel methods form an important class of techniques that induce a reproducing kernel Hilbert space (RKHS) for inferring nonlinear models through the construction of similarity functions from data. These methods are particularly preferred in cases where the training data sizes are limited and when prior knowledge of the data similarities is available. Despite their usefulness, they are limited by their computational complexity and their inability to support end-to-end learning with a task-specific objective. On the other hand, deep neural networks have become the de facto solution for end-to-end inference in several learning paradigms. In this paper, we explore the idea of using deep architectures to perform kernel machine optimization, for both computational efficiency and end-to-end inference. To this end, we develop the deep kernel machine optimization framework, which creates an ensemble of dense embeddings using Nyström kernel approximations and utilizes deep learning to generate task-specific representations through the fusion of the embeddings. Intuitively, the filters of the network are trained to fuse information from an ensemble of linear subspaces in the RKHS. Furthermore, we introduce the kernel dropout regularization to enable improved training convergence. Finally, we extend this framework to the multiple kernel case, by coupling a global fusion layer with pretrained deep kernel machines for each of the constituent kernels. Using case studies with limited training data and no explicit feature sources, we demonstrate the effectiveness of our framework over conventional model inference techniques.
KW - Deep neural networks (DNNs)
KW - Nyström approximation
KW - kernel methods
KW - multiple kernel learning (MKL)
UR - http://www.scopus.com/inward/record.url?scp=85043391928&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85043391928&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2018.2804895
DO - 10.1109/TNNLS.2018.2804895
M3 - Article
C2 - 29993616
AN - SCOPUS:85043391928
VL - 29
SP - 5528
EP - 5540
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
SN - 2162-237X
IS - 11
M1 - 8307246
ER -