TY - GEN
T1 - MI2LS
T2 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
AU - Zhang, Dan
AU - He, Jingrui
AU - Lawrence, Richard D.
N1 - Publisher Copyright: Copyright © 2013 ACM.
PY - 2013/8/11
Y1 - 2013/8/11
N2 - In Multiple Instance Learning (MIL), each entity is normally expressed as a set of instances. Most of the current MIL methods only deal with the case when each instance is represented by one type of features. However, in many real-world applications, entities are often described from several different information sources/views. For example, when applying MIL to image categorization, the characteristics of each image can be derived from both its RGB features and SIFT features. Previous research work has shown that, in traditional learning methods, leveraging the consistencies between different information sources could improve the classification performance drastically. Out of a similar motivation, to incorporate the consistencies between different information sources into MIL, we propose a novel research framework - Multi-Instance Learning from Multiple Information Sources (MI2LS). Based on this framework, an algorithm - Fast MI2LS (FMI2LS) is designed, which combines Constraint Concave-Convex Programming (CCCP) method and an adapted Stochastic Gradient Descent (SGD) method. Some theoretical analysis on the optimality of the adapted SGD method and the generalized error bound of the formulation are given based on the proposed method. Experimental results on document classification and a novel application - Insider Threat Detection (ITD), clearly demonstrate the superior performance of the proposed method over state-of-the-art MIL methods.
AB - In Multiple Instance Learning (MIL), each entity is normally expressed as a set of instances. Most of the current MIL methods only deal with the case when each instance is represented by one type of features. However, in many real-world applications, entities are often described from several different information sources/views. For example, when applying MIL to image categorization, the characteristics of each image can be derived from both its RGB features and SIFT features. Previous research work has shown that, in traditional learning methods, leveraging the consistencies between different information sources could improve the classification performance drastically. Out of a similar motivation, to incorporate the consistencies between different information sources into MIL, we propose a novel research framework - Multi-Instance Learning from Multiple Information Sources (MI2LS). Based on this framework, an algorithm - Fast MI2LS (FMI2LS) is designed, which combines Constraint Concave-Convex Programming (CCCP) method and an adapted Stochastic Gradient Descent (SGD) method. Some theoretical analysis on the optimality of the adapted SGD method and the generalized error bound of the formulation are given based on the proposed method. Experimental results on document classification and a novel application - Insider Threat Detection (ITD), clearly demonstrate the superior performance of the proposed method over state-of-the-art MIL methods.
KW - Multi-instance learning
KW - Multi-view learning
KW - Stochastic gradient descent
UR - http://www.scopus.com/inward/record.url?scp=85014560717&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014560717&partnerID=8YFLogxK
U2 - 10.1145/2487575.2487651
DO - 10.1145/2487575.2487651
M3 - Conference contribution
AN - SCOPUS:85014560717
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 149
EP - 157
BT - KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
A2 - Parekh, Rajesh
A2 - He, Jingrui
A2 - Dhillon, Inderjit S.
A2 - Bradley, Paul
A2 - Koren, Yehuda
A2 - Ghani, Rayid
A2 - Senator, Ted E.
A2 - Grossman, Robert L.
A2 - Uthurusamy, Ramasamy
PB - Association for Computing Machinery
Y2 - 11 August 2013 through 14 August 2013
ER -