A clustering algorithm for identifying multiple outliers in linear regression

David M. Sebert, Douglas Montgomery, Dwayne A. Rollier

Research output: Contribution to journal (Article, peer-reviewed)

35 Scopus citations

Abstract

Identifying outliers is a fundamental step in the regression model building process. However, current outlier diagnostics are often inadequate when data sets contain multiple outlying observations. This paper proposes a new clustering-based approach for multiple outlier identification that utilizes the predicted and residual values obtained from a least squares fit of the data. The procedure is described and is shown to perform well on classic multiple-outlier data sets found in the literature. Also, the performance characteristics of the proposed methodology are demonstrated and explored by applying the procedure to simulated data sets that have various outlier scenarios.
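The abstract describes the procedure only at a high level: a least squares fit, followed by clustering of the predicted and residual values. The Python below is a minimal sketch of that general idea, not the authors' exact algorithm. It makes several assumptions. It fits a single predictor and standardizes the fitted values and residuals. It then groups them with single-linkage clustering cut at a fixed height `h` and treats the largest cluster as the clean set. The function name `flag_outliers`, the cut height, and the example data are all hypothetical. The paper's own rule for deciding where to cut the cluster tree is not reproduced here.

```python
import math
import statistics


def _standardize(values):
    """Center and scale a list to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def flag_outliers(x, y, h=1.0):
    """Flag potential outliers by clustering (fitted, residual) pairs.

    Hypothetical sketch: one-predictor least squares, single-linkage
    clustering cut at a fixed height h, largest cluster taken as clean.
    Returns the sorted indices of points outside the largest cluster.
    """
    n = len(x)
    # Ordinary least squares fit of y = b0 + b1 * x.
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]

    # Put fitted values and residuals on a common scale.
    points = list(zip(_standardize(fitted), _standardize(resid)))

    # Single-linkage clusters at height h are the connected components of
    # the graph that joins every pair of points within distance h.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= h:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    clean = set(max(clusters.values(), key=len))
    return [i for i in range(n) if i not in clean]


# Twenty points near y = 2x + 1, plus two gross outliers appended at the end.
xs = list(range(20)) + [8, 9]
ys = [2 * i + 1 + (0.1 if i % 2 == 0 else -0.1) for i in range(20)] + [50, 51]
print(flag_outliers(xs, ys))  # [20, 21]
```

Cutting a single-linkage tree at height `h` yields the same groups as linking every pair of points closer than `h`, so a union-find pass is enough here. With several predictors, the fitted values and residuals would come from a multiple regression fit instead. The result is also sensitive to the choice of `h`, which is why a data-driven cutting rule matters in practice.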

Original language: English (US)
Pages (from-to): 461-484
Number of pages: 24
Journal: Computational Statistics and Data Analysis
Volume: 27
Issue number: 4
State: Published - Jun 5 1998

ASJC Scopus subject areas

  • Statistics and Probability
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Applied Mathematics
