Construction of a calibrated probabilistic classification catalog

Application to 50k variable sources in the All-Sky Automated Survey

Joseph W. Richards, Dan L. Starr, Adam A. Miller, Joshua S. Bloom, Nathaniel Butler, Henrik Brink, Arien Crellin-Quick

Research output: Contribution to journal › Article

57 Citations (Scopus)

Abstract

With growing data volumes from synoptic surveys, astronomers necessarily must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28 class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.

Original language: English (US)
Article number: 32
Journal: Astrophysical Journal, Supplement Series
Volume: 203
Issue number: 2
DOI: 10.1088/0067-0049/203/2/32
State: Published - Dec 2012

Keywords

  • catalogs
  • methods: data analysis
  • methods: statistical
  • stars: variables: general
  • techniques: photometric

ASJC Scopus subject areas

  • Space and Planetary Science
  • Astronomy and Astrophysics

Cite this

Richards, Joseph W.; Starr, Dan L.; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel; Brink, Henrik; Crellin-Quick, Arien. Construction of a calibrated probabilistic classification catalog: Application to 50k variable sources in the All-Sky Automated Survey. In: Astrophysical Journal, Supplement Series, Vol. 203, No. 2, 32, Dec 2012.

@article{d6d0ebc7659d41da984e92ea4a53567c,
title = "Construction of a calibrated probabilistic classification catalog: Application to 50k variable sources in the {All-Sky Automated Survey}",
abstract = "With growing data volumes from synoptic surveys, astronomers necessarily must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28 class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20{\%} classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24{\%} of those sources into one of 12 science classes.",
keywords = "catalogs, methods: data analysis, methods: statistical, stars: variables: general, techniques: photometric",
author = "Richards, {Joseph W.} and Starr, {Dan L.} and Miller, {Adam A.} and Bloom, {Joshua S.} and Butler, Nathaniel and Brink, Henrik and Crellin-Quick, Arien",
year = "2012",
month = "12",
doi = "10.1088/0067-0049/203/2/32",
language = "English (US)",
volume = "203",
journal = "Astrophysical Journal, Supplement Series",
issn = "0067-0049",
publisher = "IOP Publishing Ltd.",
number = "2",
pages = "32",
}

TY - JOUR

T1 - Construction of a calibrated probabilistic classification catalog

T2 - Application to 50k variable sources in the All-Sky Automated Survey

AU - Richards, Joseph W.

AU - Starr, Dan L.

AU - Miller, Adam A.

AU - Bloom, Joshua S.

AU - Butler, Nathaniel

AU - Brink, Henrik

AU - Crellin-Quick, Arien

PY - 2012/12

Y1 - 2012/12

N2 - With growing data volumes from synoptic surveys, astronomers necessarily must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28 class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.

AB - With growing data volumes from synoptic surveys, astronomers necessarily must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28 class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.

KW - catalogs

KW - methods: data analysis

KW - methods: statistical

KW - stars: variables: general

KW - techniques: photometric

UR - http://www.scopus.com/inward/record.url?scp=84871206353&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84871206353&partnerID=8YFLogxK

U2 - 10.1088/0067-0049/203/2/32

DO - 10.1088/0067-0049/203/2/32

M3 - Article

VL - 203

JO - Astrophysical Journal, Supplement Series

JF - Astrophysical Journal, Supplement Series

SN - 0067-0049

IS - 2

M1 - 32

ER -