Abstract

Statistical analysis of massive multivariate industrial data sets might not be effective unless the data is first partitioned into stable operating regions. Standard statistical methods that are based on global models, like principal component and regression analyses, can be ineffective if the process is not under statistical control. Here, constrained clustering is proposed as a practical, robust, general solution to detect change points and partition historical process data. The constraint is that only observations (or clusters) that are contiguous in time can be joined. This paper describes a method for partitioning data sets into stable regions by modifying agglomerative clustering algorithms to take into account the time order within the data set. A stopping criterion is proposed to evaluate the number of change points generated, and several empirical studies with simulated and actual data are presented. The result is a partitioned data set that is better suited for analysis by standard statistical methods.
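The time-contiguity constraint described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Ward-style merge cost, the function names `merge_cost` and `time_constrained_clusters`, and the use of a fixed target number of segments in place of the paper's stopping criterion are all assumptions made for illustration.

```python
import numpy as np


def merge_cost(a, b):
    """Ward-style increase in within-segment sum of squares if a and b merge."""
    na, nb = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return na * nb / (na + nb) * float(diff @ diff)


def time_constrained_clusters(X, n_segments):
    """Agglomerate time-adjacent segments until n_segments remain.

    Assumption: the number of segments is supplied by the caller; the
    paper's own stopping criterion is not reproduced here.
    Returns the start index of each segment (the first entry is 0).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D series as one variable
    # Every observation starts as its own segment, stored as [start, end).
    bounds = [(i, i + 1) for i in range(len(X))]
    while len(bounds) > n_segments:
        # Only neighbouring segments are candidates for merging.
        costs = [
            merge_cost(X[s0:e0], X[s1:e1])
            for (s0, e0), (s1, e1) in zip(bounds, bounds[1:])
        ]
        k = int(np.argmin(costs))
        bounds[k:k + 2] = [(bounds[k][0], bounds[k + 1][1])]
    return [start for start, _ in bounds]


if __name__ == "__main__":
    # Simulated series with a mean shift halfway through.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0.0, 0.1, 50), rng.normal(5.0, 0.1, 50)])
    print(time_constrained_clusters(x, n_segments=2))
```

Because each step considers only adjacent pairs, every resulting cluster is a contiguous block of time, and the boundaries between blocks serve as candidate change points. The sketch recomputes all adjacent costs at each step for clarity, so it is quadratic in the number of observations and would need a more efficient update for massive data sets.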

Original language: English (US)
Pages (from-to): 3-17
Number of pages: 15
Journal: Journal of Quality Technology
Volume: 41
Issue number: 1
State: Published - Jan 2009

Keywords

  • Change point
  • Cluster analysis
  • Data mining
  • Data segmentation
  • Partitioning

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Strategy and Management
  • Management Science and Operations Research
  • Industrial and Manufacturing Engineering

Title: Process partitions from time-ordered clusters