Abstract

Time series data mining has received much greater interest along with the increase in temporal data sets from different domains such as medicine, finance, multimedia, etc. Representations are important to reduce dimensionality and generate useful similarity measures. High-level representations such as Fourier transforms, wavelets, piecewise polynomial models, etc., were considered previously. Recently, autoregressive kernels were introduced to reflect the similarity of the time series. We introduce a novel approach to model the dependency structure in time series that generalizes the concept of autoregression to local autopatterns. Our approach generates a pattern-based representation along with a similarity measure called learned pattern similarity (LPS). A tree-based ensemble-learning strategy that is fast and insensitive to parameter settings is the basis for the approach. Then, a robust similarity measure based on the learned patterns is presented. This unsupervised approach to represent and measure the similarity between time series generally applies to a number of data mining tasks (e.g., clustering, anomaly detection, classification). Furthermore, an embedded learning of the representation avoids pre-defined features and an extraction step which is common in some feature-based approaches. The method generalizes in a straightforward manner to multivariate time series. The effectiveness of LPS is evaluated on time series classification problems from various domains. We compare LPS to eleven well-known similarity measures. Our experimental results show that LPS provides fast and competitive results on benchmark datasets from several domains. Furthermore, LPS provides a research direction and template approach that breaks from the linear dependency models to potentially foster other promising nonlinear approaches.

Original languageEnglish (US)
Pages (from-to)476-509
Number of pages34
JournalData Mining and Knowledge Discovery
Volume30
Issue number2
DOIs
StatePublished - Mar 1 2016

Fingerprint

Time series
Data mining
Finance
Medicine
Fourier transforms

Keywords

  • Autoregression
  • Pattern discovery
  • Regression tree
  • Similarity
  • Time series

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

Time series representation and similarity based on local autopatterns. / Baydogan, Mustafa Gokce; Runger, George.

In: Data Mining and Knowledge Discovery, Vol. 30, No. 2, 01.03.2016, p. 476-509.

Research output: Contribution to journalArticle

@article{b85ab8f4da234ce8b200455f2e93d7c8,
title = "Time series representation and similarity based on local autopatterns",
abstract = "Time series data mining has received much greater interest along with the increase in temporal data sets from different domains such as medicine, finance, multimedia, etc. Representations are important to reduce dimensionality and generate useful similarity measures. High-level representations such as Fourier transforms, wavelets, piecewise polynomial models, etc., were considered previously. Recently, autoregressive kernels were introduced to reflect the similarity of the time series. We introduce a novel approach to model the dependency structure in time series that generalizes the concept of autoregression to local autopatterns. Our approach generates a pattern-based representation along with a similarity measure called learned pattern similarity (LPS). A tree-based ensemble-learning strategy that is fast and insensitive to parameter settings is the basis for the approach. Then, a robust similarity measure based on the learned patterns is presented. This unsupervised approach to represent and measure the similarity between time series generally applies to a number of data mining tasks (e.g., clustering, anomaly detection, classification). Furthermore, an embedded learning of the representation avoids pre-defined features and an extraction step which is common in some feature-based approaches. The method generalizes in a straightforward manner to multivariate time series. The effectiveness of LPS is evaluated on time series classification problems from various domains. We compare LPS to eleven well-known similarity measures. Our experimental results show that LPS provides fast and competitive results on benchmark datasets from several domains. Furthermore, LPS provides a research direction and template approach that breaks from the linear dependency models to potentially foster other promising nonlinear approaches.",
keywords = "Autoregression, Pattern discovery, Regression tree, Similarity, Time series",
author = "Baydogan, {Mustafa Gokce} and George Runger",
year = "2016",
month = "3",
day = "1",
doi = "10.1007/s10618-015-0425-y",
language = "English (US)",
volume = "30",
pages = "476--509",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer Netherlands",
number = "2",

}

TY - JOUR

T1 - Time series representation and similarity based on local autopatterns

AU - Baydogan, Mustafa Gokce

AU - Runger, George

PY - 2016/3/1

Y1 - 2016/3/1

N2 - Time series data mining has received much greater interest along with the increase in temporal data sets from different domains such as medicine, finance, multimedia, etc. Representations are important to reduce dimensionality and generate useful similarity measures. High-level representations such as Fourier transforms, wavelets, piecewise polynomial models, etc., were considered previously. Recently, autoregressive kernels were introduced to reflect the similarity of the time series. We introduce a novel approach to model the dependency structure in time series that generalizes the concept of autoregression to local autopatterns. Our approach generates a pattern-based representation along with a similarity measure called learned pattern similarity (LPS). A tree-based ensemble-learning strategy that is fast and insensitive to parameter settings is the basis for the approach. Then, a robust similarity measure based on the learned patterns is presented. This unsupervised approach to represent and measure the similarity between time series generally applies to a number of data mining tasks (e.g., clustering, anomaly detection, classification). Furthermore, an embedded learning of the representation avoids pre-defined features and an extraction step which is common in some feature-based approaches. The method generalizes in a straightforward manner to multivariate time series. The effectiveness of LPS is evaluated on time series classification problems from various domains. We compare LPS to eleven well-known similarity measures. Our experimental results show that LPS provides fast and competitive results on benchmark datasets from several domains. Furthermore, LPS provides a research direction and template approach that breaks from the linear dependency models to potentially foster other promising nonlinear approaches.

AB - Time series data mining has received much greater interest along with the increase in temporal data sets from different domains such as medicine, finance, multimedia, etc. Representations are important to reduce dimensionality and generate useful similarity measures. High-level representations such as Fourier transforms, wavelets, piecewise polynomial models, etc., were considered previously. Recently, autoregressive kernels were introduced to reflect the similarity of the time series. We introduce a novel approach to model the dependency structure in time series that generalizes the concept of autoregression to local autopatterns. Our approach generates a pattern-based representation along with a similarity measure called learned pattern similarity (LPS). A tree-based ensemble-learning strategy that is fast and insensitive to parameter settings is the basis for the approach. Then, a robust similarity measure based on the learned patterns is presented. This unsupervised approach to represent and measure the similarity between time series generally applies to a number of data mining tasks (e.g., clustering, anomaly detection, classification). Furthermore, an embedded learning of the representation avoids pre-defined features and an extraction step which is common in some feature-based approaches. The method generalizes in a straightforward manner to multivariate time series. The effectiveness of LPS is evaluated on time series classification problems from various domains. We compare LPS to eleven well-known similarity measures. Our experimental results show that LPS provides fast and competitive results on benchmark datasets from several domains. Furthermore, LPS provides a research direction and template approach that breaks from the linear dependency models to potentially foster other promising nonlinear approaches.

KW - Autoregression

KW - Pattern discovery

KW - Regression tree

KW - Similarity

KW - Time series

UR - http://www.scopus.com/inward/record.url?scp=84957940893&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84957940893&partnerID=8YFLogxK

U2 - 10.1007/s10618-015-0425-y

DO - 10.1007/s10618-015-0425-y

M3 - Article

VL - 30

SP - 476

EP - 509

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

SN - 1384-5810

IS - 2

ER -