Abstract
Existing statistical learning methods perform well when evaluated on training and test data drawn from the same distribution. In practice, however, these distributions are not always the same. In this paper we derive an estimable upper bound on the test error rate that depends on a new probability distance measure between the training and test distributions. Furthermore, we identify a non-parametric estimator for this distance measure that can be computed directly from data. We show how this new probability distance measure can be used to construct algorithmic tools that improve performance. In particular, motivated by our upper bound, we propose a new active learning algorithm for domain adaptation. Comparative results confirm the efficacy of the active learning algorithm on a set of 12 speech classification tasks.
Original language | English (US)
---|---
Pages (from-to) | 272-277
Number of pages | 6
Journal | Signal Processing
Volume | 108
State | Published - Mar 2015
Keywords
- Active learning
- Classification
- Divergence measures
- Domain adaptation
ASJC Scopus subject areas
- Control and Systems Engineering
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Electrical and Electronic Engineering