Interpretable regularized class association rules algorithm for classification in a categorical data space

Mohamed Azmi, George Runger, Abdelaziz Berrado

Research output: Contribution to journal › Article

Abstract

Association rules have been used successfully to build high-accuracy classifiers, and the principal advantage of such associative classifiers is their interpretability. However, pruning the useless rules from the huge set of mined rules, and combining the remaining rules into a classifier, remain open to improvement and further research. In this paper, we introduce RCAR, a new algorithm that builds a classifier based on Regularized Class Association Rules in a categorical data space. The algorithm has three steps. First, it mines an exhaustive set of Class Association Rules (CARs) according to predefined support and confidence thresholds. Second, it applies a regularized logistic regression algorithm with a Lasso penalty on the rule space to build a model that predicts the conditional probability of the outcome; useless rules are pruned thanks to the selective nature of Lasso regularization. Third, it uses metarules to organize and visualize the CARs that survive this first pruning step. An optional further pruning step can be undertaken on the basis of the metarules and subject knowledge. The empirical results indicate that RCAR achieves accuracy comparable to that of Random Forest and GBM.
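The first two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: brute-force enumeration stands in for a real CAR miner, and a plain proximal-gradient (ISTA) loop stands in for whatever Lasso solver the paper uses. All function names, thresholds, and the toy data are hypothetical, and the metarule step is not covered.

```python
from itertools import combinations
from math import exp


def mine_cars(rows, labels, min_support=0.25, min_confidence=0.7, max_len=2):
    """Brute-force CAR mining; a rule is (antecedent items, predicted class)."""
    n = len(rows)
    items = sorted({(attr, val) for row in rows for attr, val in row.items()})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            if len({attr for attr, _ in antecedent}) < k:
                continue  # an attribute cannot take two values at once
            covered = [y for row, y in zip(rows, labels)
                       if all(row.get(attr) == val for attr, val in antecedent)]
            for cls in sorted(set(covered)):
                hits = covered.count(cls)
                support = hits / n                # P(antecedent and class)
                confidence = hits / len(covered)  # P(class | antecedent)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, cls))
    return rules


def rule_features(rows, rules):
    """Binary matrix: entry [i][j] is 1 when row i satisfies rule j's antecedent."""
    return [[int(all(row.get(a) == v for a, v in ante)) for ante, _ in rules]
            for row in rows]


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return max(w - t, 0.0) if w > 0 else min(w + t, 0.0)


def fit_lasso_logistic(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        err = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
               for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(err, X)) / n for j in range(p)]
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
        b -= lr * sum(err) / n  # the intercept is not penalized
    return w, b


# Hypothetical toy data: two categorical attributes, binary outcome.
rows = [{"outlook": o, "windy": wind} for o, wind in [
    ("sunny", "no"), ("sunny", "yes"), ("sunny", "no"), ("rain", "yes"),
    ("rain", "no"), ("rain", "yes"), ("overcast", "no"), ("overcast", "yes")]]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

rules = mine_cars(rows, labels)                 # step 1: mine CARs
X = rule_features(rows, rules)                  # map rows into the rule space
weights, bias = fit_lasso_logistic(X, labels)   # step 2: Lasso selects rules
probs = [sigmoid(bias + sum(w * x for w, x in zip(weights, xi))) for xi in X]

for (antecedent, cls), w in zip(rules, weights):
    status = "kept" if w != 0.0 else "pruned"
    print(f"{dict(antecedent)} -> {cls}: weight {w:+.3f} ({status})")
```

In this sketch each mined rule becomes one binary feature, so a rule whose coefficient is shrunk exactly to zero drops out of the model; that is the sense in which the Lasso penalty prunes rules. Increasing `lam` prunes more aggressively.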

Original language: English (US)
Pages (from-to): 313-331
Number of pages: 19
Journal: Information Sciences
Volume: 483
DOI: 10.1016/j.ins.2019.01.047
State: Published - May 1 2019

Keywords

  • Association rules
  • Class association rules
  • Classification
  • Ensemble learning
  • Pruning
  • Regularization

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

Interpretable regularized class association rules algorithm for classification in a categorical data space. / Azmi, Mohamed; Runger, George; Berrado, Abdelaziz.

In: Information Sciences, Vol. 483, 01.05.2019, p. 313-331.

@article{4e105df20af74bcbbd0cae603537870d,
title = "Interpretable regularized class association rules algorithm for classification in a categorical data space",
abstract = "Using association rules in classification is a great success which produces high accuracy classifiers. Even so, the principal advantage of the associative classifiers lies in interpretation. However, pruning the useless rules among the huge set of the mined rules as well as combining them to build a classifier remains a subject for improvement and further research. In this paper, we introduce a new algorithm to build a classifier based on Regularized Class Association Rules in a categorical data space called RCAR. The characteristic of this algorithm is, therefore, threefold: First, mining an exhaustive set of Class Association Rules (CARs) according to a predefined values of support and confidence thresholds. Second, applying a regularized logistic regression algorithm with Lasso penalty on the rules space to build a model that predicts the conditional probability of the existence of the outcome. Useless rules are pruned thanks to the selective nature of Lasso regularization. Third, organizing and visualizing the CARs which survive the first step of pruning by Lasso regularization using metarules. An optional step of pruning could be undertaken on the basis of the metarules and subject knowledge. Likewise, the empirical results indicate that RCAR gives comparable accuracy against Random Forest and GBM.",
keywords = "Association rules, Class association rules, Classification, Ensemble learning, Pruning, Regularization",
author = "Mohamed Azmi and George Runger and Abdelaziz Berrado",
year = "2019",
month = "5",
day = "1",
doi = "10.1016/j.ins.2019.01.047",
language = "English (US)",
volume = "483",
pages = "313--331",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - Interpretable regularized class association rules algorithm for classification in a categorical data space

AU - Azmi, Mohamed

AU - Runger, George

AU - Berrado, Abdelaziz

PY - 2019/5/1

Y1 - 2019/5/1

N2 - Using association rules in classification is a great success which produces high accuracy classifiers. Even so, the principal advantage of the associative classifiers lies in interpretation. However, pruning the useless rules among the huge set of the mined rules as well as combining them to build a classifier remains a subject for improvement and further research. In this paper, we introduce a new algorithm to build a classifier based on Regularized Class Association Rules in a categorical data space called RCAR. The characteristic of this algorithm is, therefore, threefold: First, mining an exhaustive set of Class Association Rules (CARs) according to a predefined values of support and confidence thresholds. Second, applying a regularized logistic regression algorithm with Lasso penalty on the rules space to build a model that predicts the conditional probability of the existence of the outcome. Useless rules are pruned thanks to the selective nature of Lasso regularization. Third, organizing and visualizing the CARs which survive the first step of pruning by Lasso regularization using metarules. An optional step of pruning could be undertaken on the basis of the metarules and subject knowledge. Likewise, the empirical results indicate that RCAR gives comparable accuracy against Random Forest and GBM.

AB - Using association rules in classification is a great success which produces high accuracy classifiers. Even so, the principal advantage of the associative classifiers lies in interpretation. However, pruning the useless rules among the huge set of the mined rules as well as combining them to build a classifier remains a subject for improvement and further research. In this paper, we introduce a new algorithm to build a classifier based on Regularized Class Association Rules in a categorical data space called RCAR. The characteristic of this algorithm is, therefore, threefold: First, mining an exhaustive set of Class Association Rules (CARs) according to a predefined values of support and confidence thresholds. Second, applying a regularized logistic regression algorithm with Lasso penalty on the rules space to build a model that predicts the conditional probability of the existence of the outcome. Useless rules are pruned thanks to the selective nature of Lasso regularization. Third, organizing and visualizing the CARs which survive the first step of pruning by Lasso regularization using metarules. An optional step of pruning could be undertaken on the basis of the metarules and subject knowledge. Likewise, the empirical results indicate that RCAR gives comparable accuracy against Random Forest and GBM.

KW - Association rules

KW - Class association rules

KW - Classification

KW - Ensemble learning

KW - Pruning

KW - Regularization

UR - http://www.scopus.com/inward/record.url?scp=85060333227&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060333227&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2019.01.047

DO - 10.1016/j.ins.2019.01.047

M3 - Article

VL - 483

SP - 313

EP - 331

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

ER -