Abstract

Active learning has gained attention as a method to expedite the learning curve of classifiers. To this end, uncertainty sampling is a widely adopted strategy that selects instances closer to the decision boundary. However, uncertainty sampling alone may not be sufficient in batch active learning due to the redundancy of instances and its susceptibility to outliers. In this study, we utilize query-by-committee (QBC) for uncertainty and demonstrate that its performance can be improved by introducing diversity and density in instance utility. Test results show that uncertainty sampling by QBC can be significantly improved with diversity and density incorporated in instance selection. Furthermore, we investigate several distance measures for use in diversity and density and show that random forest dissimilarity can be an effective distance measure in batch active learning. The effects of the characteristics of the data on the results are also analyzed.
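
The abstract names three ingredients of instance utility: committee disagreement (uncertainty), density, and diversity. The sketch below shows one way these could be combined in a greedy batch selector. It is an illustration only: the function names, the multiplicative combination of the three terms, and the `1 / (1 + mean distance)` density are assumptions and are not taken from the paper. `leaf_dissimilarity` follows one common definition of random forest dissimilarity (the fraction of trees in which two instances fall into different leaves), which may differ from the definition the authors use.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Committee disagreement for one instance (entropy of the label votes)."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leaf_dissimilarity(leaves_a, leaves_b):
    """Fraction of trees in which two instances land in different leaves.

    Each argument is a sequence of leaf ids, one per tree (assumed input).
    """
    same = sum(1 for la, lb in zip(leaves_a, leaves_b) if la == lb)
    return 1.0 - same / len(leaves_a)


def select_batch(pool, committee_votes, batch_size, dist=euclidean):
    """Greedily pick a batch by uncertainty * density * diversity.

    The multiplicative utility is an illustrative assumption.
    """
    n = len(pool)
    uncertainty = [vote_entropy(v) for v in committee_votes]
    # Density: a small average distance to the rest of the pool scores higher,
    # which down-weights isolated instances.
    density = [
        1.0 / (1.0 + sum(dist(pool[i], pool[j]) for j in range(n) if j != i) / max(n - 1, 1))
        for i in range(n)
    ]
    selected = []
    while len(selected) < min(batch_size, n):
        best, best_score = None, -1.0
        for i in range(n):
            if i in selected:
                continue
            # Diversity: distance to the nearest instance already in the batch
            # (1.0 while the batch is still empty).
            diversity = min((dist(pool[i], pool[s]) for s in selected), default=1.0)
            score = uncertainty[i] * density[i] * diversity
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

To use a random forest dissimilarity instead of Euclidean distance in this sketch, `pool` would hold per-tree leaf ids for each instance and `dist=leaf_dissimilarity` would be passed to `select_batch`.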

Original language: English (US)
Pages (from-to): 401-418
Number of pages: 18
Journal: Information Sciences
Volume: 454-455
DOI: 10.1016/j.ins.2018.05.014
State: Published - Jul 1 2018

Keywords

  • Batch active learning
  • Density
  • Diversity
  • Query-by-committee
  • Random forest

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

Query-by-committee improvement with diversity and density in batch active learning. / Kee, Seho; del Castillo, Enrique; Runger, George.

In: Information Sciences, Vol. 454-455, 01.07.2018, p. 401-418.

Research output: Contribution to journal (Article)

@article{6a191fc54fed480a9a61140f6cea6ae8,
title = "Query-by-committee improvement with diversity and density in batch active learning",
abstract = "Active learning has gained attention as a method to expedite the learning curve of classifiers. To this end, uncertainty sampling is a widely adopted strategy that selects instances closer to the decision boundary. However, uncertainty sampling alone may not be sufficient in batch active learning due to the redundancy of instances and its susceptibility to outliers. In this study, we utilize query-by-committee (QBC) for uncertainty and demonstrate that its performance can be improved by introducing diversity and density in instance utility. Test results show that uncertainty sampling by QBC can be significantly improved with diversity and density incorporated in instance selection. Furthermore, we investigate several distance measures for use in diversity and density and show that random forest dissimilarity can be an effective distance measure in batch active learning. The effects of the characteristics of the data on the results are also analyzed.",
keywords = "Batch active learning, Density, Diversity, Query-by-committee, Random forest",
author = "Kee, Seho and {del Castillo}, Enrique and Runger, George",
year = "2018",
month = "7",
day = "1",
doi = "10.1016/j.ins.2018.05.014",
language = "English (US)",
volume = "454-455",
pages = "401--418",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
}

TY  - JOUR
T1  - Query-by-committee improvement with diversity and density in batch active learning
AU  - Kee, Seho
AU  - del Castillo, Enrique
AU  - Runger, George
PY  - 2018/7/1
Y1  - 2018/7/1
AB  - Active learning has gained attention as a method to expedite the learning curve of classifiers. To this end, uncertainty sampling is a widely adopted strategy that selects instances closer to the decision boundary. However, uncertainty sampling alone may not be sufficient in batch active learning due to the redundancy of instances and its susceptibility to outliers. In this study, we utilize query-by-committee (QBC) for uncertainty and demonstrate that its performance can be improved by introducing diversity and density in instance utility. Test results show that uncertainty sampling by QBC can be significantly improved with diversity and density incorporated in instance selection. Furthermore, we investigate several distance measures for use in diversity and density and show that random forest dissimilarity can be an effective distance measure in batch active learning. The effects of the characteristics of the data on the results are also analyzed.
KW  - Batch active learning
KW  - Density
KW  - Diversity
KW  - Query-by-committee
KW  - Random forest
UR  - http://www.scopus.com/inward/record.url?scp=85046825582&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85046825582&partnerID=8YFLogxK
U2  - 10.1016/j.ins.2018.05.014
DO  - 10.1016/j.ins.2018.05.014
M3  - Article
VL  - 454-455
SP  - 401
EP  - 418
JO  - Information Sciences
JF  - Information Sciences
SN  - 0020-0255
ER  -