Interpretable regularized class association rules algorithm for classification in a categorical data space

Mohamed Azmi, George Runger, Abdelaziz Berrado

    Research output: Contribution to journal (Article)

    2 Citations (Scopus)

    Abstract

    Association rules have been used successfully in classification, producing high-accuracy classifiers, and the principal advantage of associative classifiers is their interpretability. However, pruning the useless rules from the large set of mined rules, and combining the remaining rules into a classifier, remain open to improvement and further research. In this paper, we introduce RCAR, a new algorithm that builds a classifier from Regularized Class Association Rules in a categorical data space. The algorithm has three steps. First, it mines an exhaustive set of Class Association Rules (CARs) according to predefined support and confidence thresholds. Second, it applies a logistic regression algorithm with a Lasso penalty on the rule space to build a model that predicts the conditional probability of the outcome. Useless rules are pruned thanks to the selective nature of Lasso regularization. Third, it uses metarules to organize and visualize the CARs that survive this first pruning by Lasso regularization. An optional further pruning step can be undertaken on the basis of the metarules and subject knowledge. The empirical results indicate that RCAR achieves accuracy comparable to Random Forest and GBM.
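The first two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`mine_cars`, `rule_matrix`, `fit_lasso_logistic`), the brute-force rule enumeration, the proximal-gradient solver, the threshold values, and the toy weather-style data are all assumptions chosen for illustration. The metarule step is not covered.

```python
from itertools import combinations
import math


def matches(row, antecedent):
    """True if the row satisfies every (attribute, value) pair of the antecedent."""
    return all(row.get(attr) == val for attr, val in antecedent)


def mine_cars(rows, labels, target, min_support, min_confidence, max_len=2):
    """Step 1 (brute force, hypothetical helper): keep antecedents whose rule
    'antecedent -> target' meets both the support and confidence thresholds."""
    n = len(rows)
    items = sorted({(attr, val) for row in rows for attr, val in row.items()})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            if len({attr for attr, _ in antecedent}) < k:
                continue  # one attribute cannot take two values at once
            covered = [i for i, row in enumerate(rows) if matches(row, antecedent)]
            if not covered:
                continue
            hits = sum(1 for i in covered if labels[i] == target)
            support = hits / n                # rows matching antecedent and class
            confidence = hits / len(covered)  # share of matching rows in the class
            if support >= min_support and confidence >= min_confidence:
                rules.append(antecedent)
    return rules


def rule_matrix(rows, rules):
    """Encode each row as binary features: 1.0 where a rule's antecedent fires."""
    return [[1.0 if matches(row, r) else 0.0 for r in rules] for row in rows]


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty; shrinks small values to exactly zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=2000):
    """Step 2: L1-penalized logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Toy categorical data (made up for this sketch): (outlook, windy) -> label.
pairs = [("sunny", "no"), ("sunny", "no"), ("sunny", "yes"), ("sunny", "yes"),
         ("rainy", "no"), ("rainy", "yes"), ("rainy", "yes"), ("rainy", "no")]
rows = [{"outlook": o, "windy": w} for o, w in pairs]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

rules = mine_cars(rows, labels, target=1, min_support=0.25, min_confidence=0.7)
X = rule_matrix(rows, rules)
weights, intercept = fit_lasso_logistic(X, labels, lam=0.01)
pruned_weights, _ = fit_lasso_logistic(X, labels, lam=10.0)

# Rules whose coefficient was shrunk to zero are the ones Lasso prunes.
kept = [rule for rule, wj in zip(rules, weights) if wj != 0.0]
print("mined rules:", rules)
print("rules kept at lam=0.01:", kept)
```

The penalty `lam` controls how aggressively rules are pruned: a large value drives every rule coefficient to exactly zero, while a small value leaves at least some rules in the model. In practice an Apriori-style miner and a library Lasso solver would likely replace the brute-force loop and the hand-written optimizer used here.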

    Original language: English (US)
    Pages (from-to): 313-331
    Number of pages: 19
    Journal: Information Sciences
    Volume: 483
    DOI: 10.1016/j.ins.2019.01.047
    State: Published - May 1 2019

    Keywords

    • Association rules
    • Class association rules
    • Classification
    • Ensemble learning
    • Pruning
    • Regularization

    ASJC Scopus subject areas

    • Software
    • Control and Systems Engineering
    • Theoretical Computer Science
    • Computer Science Applications
    • Information Systems and Management
    • Artificial Intelligence

    Cite this

    Interpretable regularized class association rules algorithm for classification in a categorical data space. / Azmi, Mohamed; Runger, George; Berrado, Abdelaziz.

    In: Information Sciences, Vol. 483, 01.05.2019, p. 313-331.

    @article{4e105df20af74bcbbd0cae603537870d,
    title = "Interpretable regularized class association rules algorithm for classification in a categorical data space",
    keywords = "Association rules, Class association rules, Classification, Ensemble learning, Pruning, Regularization",
    author = "Mohamed Azmi and George Runger and Abdelaziz Berrado",
    year = "2019",
    month = "5",
    day = "1",
    doi = "10.1016/j.ins.2019.01.047",
    language = "English (US)",
    volume = "483",
    pages = "313--331",
    journal = "Information Sciences",
    issn = "0020-0255",
    publisher = "Elsevier Inc.",

    }

    TY - JOUR

    T1 - Interpretable regularized class association rules algorithm for classification in a categorical data space

    AU - Azmi, Mohamed

    AU - Runger, George

    AU - Berrado, Abdelaziz

    PY - 2019/5/1

    Y1 - 2019/5/1

    KW - Association rules

    KW - Class association rules

    KW - Classification

    KW - Ensemble learning

    KW - Pruning

    KW - Regularization

    UR - http://www.scopus.com/inward/record.url?scp=85060333227&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85060333227&partnerID=8YFLogxK

    U2 - 10.1016/j.ins.2019.01.047

    DO - 10.1016/j.ins.2019.01.047

    M3 - Article

    AN - SCOPUS:85060333227

    VL - 483

    SP - 313

    EP - 331

    JO - Information Sciences

    JF - Information Sciences

    SN - 0020-0255

    ER -