Identification of outliers through clustering and semi-supervised learning for all sky surveys

Sharmodeep Bhattacharyya; Joseph W. Richards; John Rice; Dan L. Starr; Nathaniel R. Butler; Joshua S. Bloom

doi:10.1007/978-1-4614-3520-4-46

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The recent surge of data in astronomy has made outlier, or novelty, detection a crucial step in analyzing these data. Here, we introduce a clustering-based semi-supervised approach for outlier detection. The training data, (X₁,Y₁), ..., (X_n,Y_n), where n = 1,542, come from the Hipparcos and Optical Gravitational Lensing Experiment (OGLE) surveys, with X_i ∈ R^p (p = 64) as the features and Y_i a categorical variable taking one of 25 class labels. The 64 periodic and non-periodic features are extracted from the light curves. The test data are Z₁, ..., Z_m, where m = 11,375 and Z_i ∈ R^p. We select these 11,375 low-noise variable light sources for our analysis from a set of unlabeled light curves of ∼50,000 variable light sources from the All Sky Automated Survey (ASAS). Our goal is to find outlier data points in the unlabeled data set whose labels cannot be properly predicted from the information in the labeled data set. We propose a new hierarchical algorithm for outlier detection in this partially labeled setup, based on clustering and semi-supervised learning. We apply our method, together with the training data, to identify interesting sources in the ASAS data set. We present the ASAS light curves of some of these interesting sources and elaborate on the possible physical mechanisms driving their variability.
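
The abstract does not spell out the hierarchical algorithm, so the following is only a minimal sketch of the general idea of clustering-based outlier detection with partial labels, not the authors' method. It swaps in a flat k-means (with a deterministic farthest-point initialization) for the hierarchical procedure: labeled features X and unlabeled features Z are clustered jointly, and unlabeled points are flagged when they fall in clusters that contain almost no labeled points. The function names, the number of clusters k, and the threshold min_labeled_frac are all assumptions made for illustration.

```python
import numpy as np


def kmeans_farthest_init(data, k, n_iter=25):
    """Flat k-means (Lloyd's algorithm) with farthest-point initialization.

    Illustrative stand-in only; the paper describes a hierarchical algorithm.
    Returns one cluster index per row of `data`.
    """
    # Start from the first row, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [data[0]]
    d2 = ((data - centers[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        centers.append(data[d2.argmax()])
        d2 = np.minimum(d2, ((data - centers[-1]) ** 2).sum(axis=1))
    centers = np.array(centers)

    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dist2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist2.argmin(axis=1)
        for j in range(k):
            members = data[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return assign


def flag_unsupported(X, Z, k=10, min_labeled_frac=0.05):
    """Flag rows of Z that sit in clusters with too few labeled points.

    X: labeled features, shape (n, p). Z: unlabeled features, shape (m, p).
    `k` and `min_labeled_frac` are hypothetical tuning parameters.
    Returns a boolean array of length m (True = candidate outlier).
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    both = np.vstack([X, Z])

    # Standardize each feature so that no single one dominates the distance.
    scale = both.std(axis=0)
    scale[scale == 0] = 1.0
    both = (both - both.mean(axis=0)) / scale

    assign = kmeans_farthest_init(both, k)
    is_labeled = np.arange(len(both)) < len(X)

    flags = np.zeros(len(Z), dtype=bool)
    for j in range(k):
        in_cluster = assign == j
        if not in_cluster.any():
            continue
        # Fraction of this cluster's members that come from the labeled set.
        if is_labeled[in_cluster].mean() < min_labeled_frac:
            flags |= in_cluster[len(X):]
    return flags
```

In the setting described above, X would hold the 1,542 labeled feature vectors and Z the 11,375 ASAS feature vectors, each with p = 64 columns. The class labels Y are not used in this simplified version; a fuller treatment would presumably also use them, for example to check whether a cluster is dominated by a single class.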

Original language	English (US)
Title of host publication	Information Systems Development
Subtitle of host publication	Reflections, Challenges and New Directions
Pages	483-485
Number of pages	3
DOIs	https://doi.org/10.1007/978-1-4614-3520-4-46
State	Published - Dec 1 2013
Externally published	Yes
Event	20th International Conference on Information Systems Development: Reflections, Challenges and New Directions, ISD 2011 - Edinburgh, United Kingdom Duration: Aug 24 2011 → Aug 26 2011

Publication series

Name	Information Systems Development: Reflections, Challenges and New Directions

Other

Other	20th International Conference on Information Systems Development: Reflections, Challenges and New Directions, ISD 2011
Country/Territory	United Kingdom
City	Edinburgh
Period	8/24/11 → 8/26/11

ASJC Scopus subject areas

Information Systems

Cite this

Bhattacharyya, S., Richards, J. W., Rice, J., Starr, D. L., Butler, N. R., & Bloom, J. S. (2013). Identification of outliers through clustering and semi-supervised learning for all sky surveys. In Information Systems Development: Reflections, Challenges and New Directions (pp. 483-485). (Information Systems Development: Reflections, Challenges and New Directions). https://doi.org/10.1007/978-1-4614-3520-4-46

Identification of outliers through clustering and semi-supervised learning for all sky surveys. / Bhattacharyya, Sharmodeep; Richards, Joseph W.; Rice, John et al.
Information Systems Development: Reflections, Challenges and New Directions. 2013. p. 483-485 (Information Systems Development: Reflections, Challenges and New Directions).

Bhattacharyya, S, Richards, JW, Rice, J, Starr, DL, Butler, NR & Bloom, JS 2013, Identification of outliers through clustering and semi-supervised learning for all sky surveys. in Information Systems Development: Reflections, Challenges and New Directions. Information Systems Development: Reflections, Challenges and New Directions, pp. 483-485, 20th International Conference on Information Systems Development: Reflections, Challenges and New Directions, ISD 2011, Edinburgh, United Kingdom, 8/24/11. https://doi.org/10.1007/978-1-4614-3520-4-46

Bhattacharyya S, Richards JW, Rice J, Starr DL, Butler NR, Bloom JS. Identification of outliers through clustering and semi-supervised learning for all sky surveys. In Information Systems Development: Reflections, Challenges and New Directions. 2013. p. 483-485. (Information Systems Development: Reflections, Challenges and New Directions). doi: 10.1007/978-1-4614-3520-4-46

@inproceedings{d5b1ba08437f4c5b80ad50499b76f99e,

title = "Identification of outliers through clustering and semi-supervised learning for all sky surveys",

abstract = "The recent surge of data in astronomy has made outlier, or novelty, detection a crucial step in analyzing these data. Here, we introduce a clustering-based semi-supervised approach for outlier detection. The training data, $(X_1,Y_1), \ldots, (X_n,Y_n)$, where $n = 1,542$, come from the Hipparcos and Optical Gravitational Lensing Experiment (OGLE) surveys, with $X_i \in \mathbb{R}^p$ ($p = 64$) as the features and $Y_i$ a categorical variable taking one of 25 class labels. The 64 periodic and non-periodic features are extracted from the light curves. The test data are $Z_1, \ldots, Z_m$, where $m = 11,375$ and $Z_i \in \mathbb{R}^p$. We select these 11,375 low-noise variable light sources for our analysis from a set of unlabeled light curves of ∼50,000 variable light sources from the All Sky Automated Survey (ASAS). Our goal is to find outlier data points in the unlabeled data set whose labels cannot be properly predicted from the information in the labeled data set. We propose a new hierarchical algorithm for outlier detection in this partially labeled setup, based on clustering and semi-supervised learning. We apply our method, together with the training data, to identify interesting sources in the ASAS data set. We present the ASAS light curves of some of these interesting sources and elaborate on the possible physical mechanisms driving their variability.",

author = "Sharmodeep Bhattacharyya and Richards, {Joseph W.} and John Rice and Starr, {Dan L.} and Butler, {Nathaniel R.} and Bloom, {Joshua S.}",

year = "2013",

month = dec,

day = "1",

doi = "10.1007/978-1-4614-3520-4-46",

language = "English (US)",

isbn = "9781461449508",

series = "Information Systems Development: Reflections, Challenges and New Directions",

pages = "483--485",

booktitle = "Information Systems Development",

note = "20th International Conference on Information Systems Development: Reflections, Challenges and New Directions, ISD 2011 ; Conference date: 24-08-2011 Through 26-08-2011",

}

TY - GEN

T1 - Identification of outliers through clustering and semi-supervised learning for all sky surveys

AU - Bhattacharyya, Sharmodeep

AU - Richards, Joseph W.

AU - Rice, John

AU - Starr, Dan L.

AU - Butler, Nathaniel R.

AU - Bloom, Joshua S.

PY - 2013/12/1

Y1 - 2013/12/1

N2 - The recent surge of data in astronomy has made outlier, or novelty, detection a crucial step in analyzing these data. Here, we introduce a clustering-based semi-supervised approach for outlier detection. The training data, (X_1,Y_1), ..., (X_n,Y_n), where n = 1,542, come from the Hipparcos and Optical Gravitational Lensing Experiment (OGLE) surveys, with X_i ∈ R^p (p = 64) as the features and Y_i a categorical variable taking one of 25 class labels. The 64 periodic and non-periodic features are extracted from the light curves. The test data are Z_1, ..., Z_m, where m = 11,375 and Z_i ∈ R^p. We select these 11,375 low-noise variable light sources for our analysis from a set of unlabeled light curves of ∼50,000 variable light sources from the All Sky Automated Survey (ASAS). Our goal is to find outlier data points in the unlabeled data set whose labels cannot be properly predicted from the information in the labeled data set. We propose a new hierarchical algorithm for outlier detection in this partially labeled setup, based on clustering and semi-supervised learning. We apply our method, together with the training data, to identify interesting sources in the ASAS data set. We present the ASAS light curves of some of these interesting sources and elaborate on the possible physical mechanisms driving their variability.

UR - http://www.scopus.com/inward/record.url?scp=84894350202&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84894350202&partnerID=8YFLogxK

U2 - 10.1007/978-1-4614-3520-4-46

DO - 10.1007/978-1-4614-3520-4-46

M3 - Conference contribution

AN - SCOPUS:84894350202

SN - 9781461449508

T3 - Information Systems Development: Reflections, Challenges and New Directions

SP - 483

EP - 485

BT - Information Systems Development

T2 - 20th International Conference on Information Systems Development: Reflections, Challenges and New Directions, ISD 2011

Y2 - 24 August 2011 through 26 August 2011

ER -
