12 Scopus citations

Abstract

Continuous attributes must be discretized before mining association rules or applying several inductive learning algorithms to a heterogeneous data space. This preprocessing step should incur minimal information loss; that is, the mutual information among attributes, and between attributes and the class labels, should be preserved. This paper introduces a novel supervised, global, and dynamic discretization algorithm called RFDisc (Random Forests Discretizer). Its ability to conserve data properties derives from the Random Forests learning algorithm. RFDisc is simple, relatively fast, and automatically learns the number of bins into which each continuous attribute is to be discretized. Empirical results on several data sets indicate that the accuracies of classification algorithms such as CART are comparable before and after discretization with RFDisc. Furthermore, compared with other well-known discretization algorithms, RFDisc yields the highest classification accuracy for C5.0.
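
The information-loss criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the RFDisc algorithm, whose details are not given here; it only shows how the mutual information between a discretized attribute and the class labels might be measured for two candidate cut points. The helper names (`discretize`, `mutual_information`) and the toy data are assumptions made up for illustration.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def discretize(values, cut_points):
    """Map each continuous value to the index of the bin it falls into."""
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Toy data (invented for illustration): one continuous attribute, binary class.
values = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.8, 2.2]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

good_bins = discretize(values, [1.0])   # cut point aligned with the class boundary
poor_bins = discretize(values, [0.45])  # cut point that ignores the class boundary

mi_good = mutual_information(good_bins, labels)
mi_poor = mutual_information(poor_bins, labels)
print(f"aligned cut: {mi_good:.3f} bits, misaligned cut: {mi_poor:.3f} bits")
```

In this toy case the cut aligned with the class boundary retains more of the attribute-class mutual information than the misaligned one, which is the kind of property a supervised discretizer aims to preserve.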

Original language: English (US)
Title of host publication: 2009 IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2009
Pages: 211-217
Number of pages: 7
State: Published - 2009
Event: 7th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA-2009 - Rabat, Morocco
Duration: May 10 2009 - May 13 2009

Publication series

Name: 2009 IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2009

Other

Other: 7th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA-2009
Country/Territory: Morocco
City: Rabat
Period: 5/10/09 - 5/13/09

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Electrical and Electronic Engineering

Title: Supervised multivariate discretization in mixed data with random forests
