TY - GEN
T1 - Identification of outliers through clustering and semi-supervised learning for all sky surveys
AU - Bhattacharyya, Sharmodeep
AU - Richards, Joseph W.
AU - Rice, John
AU - Starr, Dan L.
AU - Butler, Nathaniel R.
AU - Bloom, Joshua S.
PY - 2013/12/1
Y1 - 2013/12/1
N2 - Recently there has been a huge surge of data in astronomy, making outlier or novelty detection a crucial step in analyzing these data. Here, we introduce a clustering-based semi-supervised approach for outlier detection. The training data, (X_1, Y_1), ..., (X_n, Y_n), where n = 1,542, come from the Hipparcos and Optical Gravitational Lensing Experiment (OGLE) surveys, with X_i ∈ R^p (p = 64) as the features and Y_i a categorical variable taking one of 25 class labels. The set of 64 periodic and non-periodic features is extracted from the light curves. The test data are Z_1, ..., Z_m, where m = 11,375 and Z_i ∈ R^p. We select these 11,375 low-noise variable light sources for our analysis from a set of unlabeled light curves of ∼50,000 variable light sources from the All Sky Automated Survey (ASAS). Our goal is to find outlier data points in the unlabeled data set whose labels cannot be properly predicted from the information in the labeled data set. We propose a new hierarchical algorithm for outlier detection in this partially labeled setup based on clustering and semi-supervised learning. We apply our method, with the training data, to identify interesting sources in the ASAS data set. We present the ASAS light curves of some of these interesting sources and elaborate on the possible physical mechanisms driving their variability.
AB - Recently there has been a huge surge of data in astronomy, making outlier or novelty detection a crucial step in analyzing these data. Here, we introduce a clustering-based semi-supervised approach for outlier detection. The training data, (X_1, Y_1), ..., (X_n, Y_n), where n = 1,542, come from the Hipparcos and Optical Gravitational Lensing Experiment (OGLE) surveys, with X_i ∈ R^p (p = 64) as the features and Y_i a categorical variable taking one of 25 class labels. The set of 64 periodic and non-periodic features is extracted from the light curves. The test data are Z_1, ..., Z_m, where m = 11,375 and Z_i ∈ R^p. We select these 11,375 low-noise variable light sources for our analysis from a set of unlabeled light curves of ∼50,000 variable light sources from the All Sky Automated Survey (ASAS). Our goal is to find outlier data points in the unlabeled data set whose labels cannot be properly predicted from the information in the labeled data set. We propose a new hierarchical algorithm for outlier detection in this partially labeled setup based on clustering and semi-supervised learning. We apply our method, with the training data, to identify interesting sources in the ASAS data set. We present the ASAS light curves of some of these interesting sources and elaborate on the possible physical mechanisms driving their variability.
UR - http://www.scopus.com/inward/record.url?scp=84894350202&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894350202&partnerID=8YFLogxK
U2 - 10.1007/978-1-4614-3520-4-46
DO - 10.1007/978-1-4614-3520-4-46
M3 - Conference contribution
AN - SCOPUS:84894350202
SN - 9781461449508
T3 - Information Systems Development: Reflections, Challenges and New Directions
SP - 483
EP - 485
BT - Information Systems Development
T2 - 20th International Conference on Information Systems Development: Reflections, Challenges and New Directions, ISD 2011
Y2 - 24 August 2011 through 26 August 2011
ER -