Abstract

A multivariate decision tree attempts to improve upon the single-variable split of a traditional tree. As datasets with many features and few labeled instances become more common across domains (bioinformatics, text mining, etc.), a traditional tree-based approach with greedy variable selection at each node may omit important information. Combining the recursive partitioning of a simple decision tree with the intrinsic feature selection of L1-regularized logistic regression (LR) at each node is therefore a natural choice for a multivariate tree model that is simple but broadly applicable. This leads to the sparse multivariate tree (SMT) considered here. SMT handles non-time-series data directly and is extended to time-series classification, where it can extract interpretable temporal patterns (e.g., means, slopes, and deviations). Binary L1-regularized LR models are used here for binary classification problems; SMT may be extended to multiclass problems with multinomial LR models. The accuracy and computational efficiency of SMT are compared to those of a large number of competitors on time-series and non-time-series data.
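The core idea, a sparse linear split fitted by L1-regularized LR at each node and applied recursively, can be illustrated with a minimal sketch. This is not the authors' implementation: the proximal-gradient solver, the stopping rules, and every name and default below (`fit_l1_logistic`, `build_tree`, `lam`, `max_depth`, `min_size`) are assumptions made for illustration, and the time-series extension is not covered.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def score(w, b, x):
    # Linear score w.x + b used both for fitting and for routing.
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fit_l1_logistic(X, y, lam=0.05, step=0.1, iters=500):
    """Proximal gradient descent for L1-penalized binary logistic regression.
    Labels are 0/1; the bias term is not penalized. Returns (weights, bias)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step followed by soft-thresholding zeroes out weak features.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def build_tree(X, y, depth=0, max_depth=2, min_size=4, lam=0.05):
    """Recursively partition the data on the sign of a sparse linear score."""
    majority = 1 if 2 * sum(y) >= len(y) else 0
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return {"leaf": majority}
    w, b = fit_l1_logistic(X, y, lam=lam)
    side = [score(w, b, xi) >= 0 for xi in X]
    if all(side) or not any(side):
        # The fitted split does not separate anything; stop here.
        return {"leaf": majority}
    left = [(xi, yi) for xi, yi, s in zip(X, y, side) if not s]
    right = [(xi, yi) for xi, yi, s in zip(X, y, side) if s]
    return {
        "w": w,
        "b": b,
        "left": build_tree([x for x, _ in left], [t for _, t in left],
                           depth + 1, max_depth, min_size, lam),
        "right": build_tree([x for x, _ in right], [t for _, t in right],
                            depth + 1, max_depth, min_size, lam),
    }

def predict(tree, x):
    # Route x down the tree by the sign of each node's linear score.
    while "leaf" not in tree:
        tree = tree["right"] if score(tree["w"], tree["b"], x) >= 0 else tree["left"]
    return tree["leaf"]
```

Calling `build_tree(X, y)` on a list of feature rows and 0/1 labels returns a nested dictionary, and `predict(tree, x)` routes a single row to a leaf label. The nonzero entries of each node's `w` indicate which features that split actually uses, which is the feature-selection effect the abstract attributes to the L1 penalty.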

Original language: English (US)
Pages (from-to): 53-69
Number of pages: 17
Journal: Statistical Analysis and Data Mining
Volume: 7
Issue number: 1
State: Published - Feb 2014

Keywords

  • Decision tree
  • Feature extraction
  • Fused Lasso
  • Lasso
  • Time series classification

ASJC Scopus subject areas

  • Analysis
  • Information Systems
  • Computer Science Applications
