Abstract

Attribute importance measures for supervised learning help improve both learning accuracy and interpretability. However, it is well known that these measures can be biased when predictor attributes have different numbers of values. We propose two methods to solve this bias problem. The first, called OOBForest, uses out-of-bag sampling; the second, called pForest, is based on the new concept of a partial permutation test. Existing research has considered the bias problem only among irrelevant attributes and among equally informative attributes, whereas we compare against existing methods in settings where unequally informative attributes (with or without interactions) and irrelevant attributes coexist. We observe that the existing methods are not always reliable for multi-valued predictors, while the proposed methods compare favorably in our experiments.
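The cardinality bias mentioned in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' OOBForest or pForest: it assumes a plain information-gain split criterion and a standard (full) label permutation test, both chosen here only for illustration, and the helper names (`entropy`, `info_gain`, `permutation_p_value`) are hypothetical. On random data, an irrelevant attribute with 50 values typically receives a much larger raw information gain than an irrelevant binary attribute, whereas permutation p-values compare each attribute against its own null distribution and so put both on a comparable scale.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(x, y):
    """Information gain of a multiway split of labels y on every distinct value of x."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    n = len(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(y) - conditional


def permutation_p_value(x, y, n_perm=200, rng=None):
    """Standard permutation test: share of shuffled-label gains >= the observed gain."""
    if rng is None:
        rng = random.Random(0)
    observed = info_gain(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any x-y association, keeps x's cardinality
        if info_gain(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Hypothetical simulated data: both attributes are irrelevant to the binary class y.
rng = random.Random(42)
n = 500
y = [rng.randint(0, 1) for _ in range(n)]
x_binary = [rng.randint(0, 1) for _ in range(n)]   # 2 distinct values
x_multi = [rng.randint(0, 49) for _ in range(n)]   # 50 distinct values

for name, x in [("2 values", x_binary), ("50 values", x_multi)]:
    gain = info_gain(x, y)
    p = permutation_p_value(x, y, rng=rng)
    print(f"{name}: gain={gain:.4f} bits, permutation p={p:.3f}")
```

Neither attribute carries information about `y`, yet the 50-valued one should show a far larger raw gain; exact numbers depend on the seed, so none are listed here. The permutation test removes that advantage because each shuffled-label gain is computed on the same attribute, so its cardinality is reflected in the null distribution as well.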

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 293-300
Number of pages: 8
Volume: 6792 LNCS
Edition: PART 2
DOI: https://doi.org/10.1007/978-3-642-21738-8_38
State: Published - 2011
Event: 21st International Conference on Artificial Neural Networks, ICANN 2011 - Espoo, Finland
Duration: Jun 14 2011 to Jun 17 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 6792 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 21st International Conference on Artificial Neural Networks, ICANN 2011
Country: Finland
City: Espoo
Period: 6/14/11 to 6/17/11

Keywords

  • Attribute importance
  • cardinality
  • feature selection
  • random forest

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Deng, H., Runger, G., & Tuv, E. (2011). Bias of importance measures for multi-valued attributes and solutions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 2 ed., Vol. 6792 LNCS, pp. 293-300). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6792 LNCS, No. PART 2). https://doi.org/10.1007/978-3-642-21738-8_38

Bias of importance measures for multi-valued attributes and solutions. / Deng, Houtao; Runger, George; Tuv, Eugene.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6792 LNCS PART 2. ed. 2011. p. 293-300 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6792 LNCS, No. PART 2).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Deng, H, Runger, G & Tuv, E 2011, Bias of importance measures for multi-valued attributes and solutions. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). PART 2 edn, vol. 6792 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 2, vol. 6792 LNCS, pp. 293-300, 21st International Conference on Artificial Neural Networks, ICANN 2011, Espoo, Finland, 6/14/11. https://doi.org/10.1007/978-3-642-21738-8_38
Deng H, Runger G, Tuv E. Bias of importance measures for multi-valued attributes and solutions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). PART 2 ed. Vol. 6792 LNCS. 2011. p. 293-300. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 2). https://doi.org/10.1007/978-3-642-21738-8_38
Deng, Houtao ; Runger, George ; Tuv, Eugene. / Bias of importance measures for multi-valued attributes and solutions. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6792 LNCS PART 2. ed. 2011. pp. 293-300 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 2).
@inproceedings{7cf8ad82c38b45bcb5100daf6dda7418,
title = "Bias of importance measures for multi-valued attributes and solutions",
abstract = "Attribute importance measures for supervised learning are important for improving both learning accuracy and interpretability. However, it is well-known there could be bias when the predictor attributes have different numbers of values. We propose two methods to solve the bias problem. One uses an out-of-bag sampling method called OOBForest and one, based on the new concept of a partial permutation test, is called pForest. The existing research has considered the bias problem only among irrelevant attributes and equally informative attributes, while we compare to existing methods in a situation where unequally informative attributes (with or without interactions) and irrelevant attributes co-exist. We observe that the existing methods are not always reliable for multi-valued predictors, while the proposed methods compare favorably in our experiments.",
keywords = "Attribute importance, cardinality, feature selection, random forest",
author = "Houtao Deng and George Runger and Eugene Tuv",
year = "2011",
doi = "10.1007/978-3-642-21738-8_38",
language = "English (US)",
isbn = "9783642217371",
volume = "6792 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 2",
pages = "293--300",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 2",

}

TY - GEN

T1 - Bias of importance measures for multi-valued attributes and solutions

AU - Deng, Houtao

AU - Runger, George

AU - Tuv, Eugene

PY - 2011

Y1 - 2011

N2 - Attribute importance measures for supervised learning are important for improving both learning accuracy and interpretability. However, it is well-known there could be bias when the predictor attributes have different numbers of values. We propose two methods to solve the bias problem. One uses an out-of-bag sampling method called OOBForest and one, based on the new concept of a partial permutation test, is called pForest. The existing research has considered the bias problem only among irrelevant attributes and equally informative attributes, while we compare to existing methods in a situation where unequally informative attributes (with or without interactions) and irrelevant attributes co-exist. We observe that the existing methods are not always reliable for multi-valued predictors, while the proposed methods compare favorably in our experiments.

AB - Attribute importance measures for supervised learning are important for improving both learning accuracy and interpretability. However, it is well-known there could be bias when the predictor attributes have different numbers of values. We propose two methods to solve the bias problem. One uses an out-of-bag sampling method called OOBForest and one, based on the new concept of a partial permutation test, is called pForest. The existing research has considered the bias problem only among irrelevant attributes and equally informative attributes, while we compare to existing methods in a situation where unequally informative attributes (with or without interactions) and irrelevant attributes co-exist. We observe that the existing methods are not always reliable for multi-valued predictors, while the proposed methods compare favorably in our experiments.

KW - Attribute importance

KW - cardinality

KW - feature selection

KW - random forest

UR - http://www.scopus.com/inward/record.url?scp=79959348887&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959348887&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-21738-8_38

DO - 10.1007/978-3-642-21738-8_38

M3 - Conference contribution

SN - 9783642217371

VL - 6792 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 293

EP - 300

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -