Abstract

Attribute importance measures for supervised learning are important for improving both learning accuracy and interpretability. However, it is well known that these measures can be biased when the predictor attributes have different numbers of values. We propose two methods to solve the bias problem. One, called OOBForest, uses an out-of-bag sampling method; the other, called pForest, is based on the new concept of a partial permutation test. Existing research has considered the bias problem only among irrelevant attributes and equally informative attributes, whereas we compare against existing methods in a setting where unequally informative attributes (with or without interactions) and irrelevant attributes coexist. We observe that the existing methods are not always reliable for multi-valued predictors, while the proposed methods compare favorably in our experiments.

Original language: English (US)
Title of host publication: Artificial Neural Networks and Machine Learning, ICANN 2011 - 21st International Conference on Artificial Neural Networks, Proceedings
Pages: 293-300
Number of pages: 8
Edition: PART 2
DOIs: https://doi.org/10.1007/978-3-642-21738-8_38
State: Published - Jun 24 2011
Event: 21st International Conference on Artificial Neural Networks, ICANN 2011 - Espoo, Finland
Duration: Jun 14 2011 to Jun 17 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 6792 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 21st International Conference on Artificial Neural Networks, ICANN 2011
Country: Finland
City: Espoo
Period: 6/14/11 to 6/17/11

Keywords

  • Attribute importance
  • cardinality
  • feature selection
  • random forest

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Deng, H., Runger, G., & Tuv, E. (2011). Bias of importance measures for multi-valued attributes and solutions. In Artificial Neural Networks and Machine Learning, ICANN 2011 - 21st International Conference on Artificial Neural Networks, Proceedings (PART 2 ed., pp. 293-300). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6792 LNCS, No. PART 2). https://doi.org/10.1007/978-3-642-21738-8_38