Feature selection with ensembles, artificial variables, and redundancy elimination

Eugene Tuv, Alexander Borisov, George Runger, Kari Torkkola

Research output: Contribution to journal › Article › peer-review

236 Scopus citations

Abstract

Predictive models benefit from a compact, non-redundant subset of features that improves interpretability and generalization. Modern data sets are wide, dirty, mixed with both numerical and categorical predictors, and may contain interactive effects that require complex models. This is a challenge for filters, wrappers, and embedded feature selection methods. We describe details of an algorithm using tree-based ensembles to generate a compact subset of non-redundant features. Parallel and serial ensembles of trees are combined into a mixed method that can uncover masking and detect features of secondary effect. Simulated and actual examples illustrate the effectiveness of the approach.
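The idea of screening features against artificial variables can be illustrated with a small sketch. It is not the authors' algorithm. It assumes that an artificial variable is a shuffled copy of a real feature, uses a bootstrap ensemble of one-split regression stumps as a stand-in for full trees, and keeps a feature only if its split gain beats the best artificial gain in most replicates. It does not attempt redundancy elimination, serial ensembles, or masking detection. All names (`best_split_gain`, `select_features`, `hit_rate`, `make_data`) and the synthetic data are hypothetical.

```python
import random


def best_split_gain(xs, ys, min_leaf=10):
    """Largest per-sample variance reduction from one threshold split of ys on xs."""
    n = len(ys)
    order = sorted(range(n), key=lambda i: xs[i])
    total = sum(ys)
    best = 0.0
    left_sum = 0.0
    for k in range(1, n):
        left_sum += ys[order[k - 1]]
        if k < min_leaf or n - k < min_leaf:
            continue  # keep both leaves reasonably large
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # no threshold separates tied values
        right_sum = total - left_sum
        # Between-group sum of squares equals the reduction in squared error.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
        best = max(best, gain)
    return best / n


def select_features(X, y, n_replicates=50, hit_rate=0.9, seed=0):
    """Keep features that beat the best shuffled (artificial) feature
    in at least hit_rate of the bootstrap replicates (assumed rule)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    hits = [0] * p
    for _ in range(n_replicates):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        yb = [y[i] for i in idx]
        cols = [[X[i][j] for i in idx] for j in range(p)]
        # Artificial variables: freshly shuffled copies of each column,
        # which keep the marginal distribution but break any link to y.
        noise_bar = 0.0
        for col in cols:
            shadow = col[:]
            rng.shuffle(shadow)
            noise_bar = max(noise_bar, best_split_gain(shadow, yb))
        for j, col in enumerate(cols):
            if best_split_gain(col, yb) > noise_bar:
                hits[j] += 1
    return [j for j in range(p) if hits[j] >= hit_rate * n_replicates]


def make_data(n=200, seed=1):
    """Synthetic data: features 0 and 1 drive y, features 2 and 3 are noise."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(4)] for _ in range(n)]
    y = [4 * row[0] + 3 * row[1] + rng.gauss(0, 0.3) for row in X]
    return X, y


X, y = make_data()
selected = select_features(X, y)
print(selected)
```

Because the artificial columns are reshuffled in every replicate, the threshold reflects how large a split gain can get by chance alone, so a real feature has to beat that chance level consistently rather than once.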

Original language: English (US)
Pages (from-to): 1341-1366
Number of pages: 26
Journal: Journal of Machine Learning Research
Volume: 10
State: Published - Jul 2009

Keywords

  • Importance
  • Masking
  • Resampling
  • Residuals
  • Trees

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
