A clustering algorithm for identifying multiple outliers in linear regression

David M. Sebert; Douglas Montgomery; Dwayne A. Rollier

doi:10.1016/S0167-9473(98)00021-8

Industrial, Systems and Operations Engineering

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

Identifying outliers is a fundamental step in the regression model building process. However, current outlier diagnostics are often inadequate when data sets contain multiple outlying observations. This paper proposes a new clustering-based approach for multiple outlier identification that utilizes the predicted and residual values obtained from a least squares fit of the data. The procedure is described and is shown to perform well on classic multiple-outlier data sets found in the literature. Also, the performance characteristics of the proposed methodology are demonstrated and explored by applying the procedure to simulated data sets that have various outlier scenarios.
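The abstract names the ingredients (a least squares fit, then clustering on the predicted and residual values) but not the clustering details. The following is a minimal sketch under stated assumptions, not the authors' published procedure: it assumes a single predictor, standardized fitted values and residuals, single-linkage clustering, and a cut height of mean plus k standard deviations of the merge heights. The function names and the default k = 1.25 are hypothetical choices made for illustration.

```python
import math
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (one predictor)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def standardize(values):
    """Center and scale to unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def flag_outliers(x, y, k=1.25):
    """Return indices of points outside the largest cluster.

    Assumptions (not taken from the abstract): single linkage,
    Euclidean distance, cut height = mean + k * stdev of merge heights.
    """
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    pts = list(zip(standardize(fitted), standardize(resid)))
    n = len(pts)

    # All pairwise distances, shortest first.
    edges = sorted(
        (math.dist(pts[i], pts[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )

    # Kruskal's algorithm: the n - 1 accepted edges are exactly the
    # single-linkage merges, and their lengths are the merge heights.
    parent = list(range(n))
    merges = []
    for d, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            merges.append((d, i, j))

    heights = [d for d, _, _ in merges]
    cut = mean(heights) + k * stdev(heights)

    # Replay only the merges at or below the cut height.
    parent = list(range(n))
    for d, i, j in merges:
        if d <= cut:
            parent[find(parent, i)] = find(parent, j)

    labels = [find(parent, i) for i in range(n)]
    main = max(set(labels), key=labels.count)  # largest cluster = "clean" set
    return [i for i, lab in enumerate(labels) if lab != main]


# Toy data: y = x with small alternating noise, two points shifted up by 10.
x = list(range(20))
y = [xi + 0.1 * (-1) ** xi for xi in x]
y[9] += 10
y[10] += 10
print(flag_outliers(x, y))  # -> [9, 10]
```

In this toy example the two shifted points sit mid-range, so they barely change the fitted slope and separate cleanly in the residual direction. High-leverage outliers can distort the initial fit more strongly, which is the kind of scenario the paper's simulations are described as exploring.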

Original language	English (US)
Pages (from-to)	461-484
Number of pages	24
Journal	Computational Statistics and Data Analysis
Volume	27
Issue number	4
DOIs	https://doi.org/10.1016/S0167-9473(98)00021-8
State	Published - Jun 5 1998

ASJC Scopus subject areas

Statistics and Probability
Computational Mathematics
Computational Theory and Mathematics
Applied Mathematics

Cite this

@article{b43249ee8bb042f4844f09a795e4e364,
  title = "A clustering algorithm for identifying multiple outliers in linear regression",
  abstract = "Identifying outliers is a fundamental step in the regression model building process. However, current outlier diagnostics are often inadequate when data sets contain multiple outlying observations. This paper proposes a new clustering-based approach for multiple outlier identification that utilizes the predicted and residual values obtained from a least squares fit of the data. The procedure is described and is shown to perform well on classic multiple-outlier data sets found in the literature. Also, the performance characteristics of the proposed methodology are demonstrated and explored by applying the procedure to simulated data sets that have various outlier scenarios.",
  author = "Sebert, {David M.} and Montgomery, Douglas and Rollier, {Dwayne A.}",
  year = "1998",
  month = jun,
  day = "5",
  doi = "10.1016/S0167-9473(98)00021-8",
  language = "English (US)",
  volume = "27",
  pages = "461--484",
  journal = "Computational Statistics and Data Analysis",
  issn = "0167-9473",
  publisher = "Elsevier",
  number = "4",
}

TY  - JOUR
T1  - A clustering algorithm for identifying multiple outliers in linear regression
AU  - Sebert, David M.
AU  - Montgomery, Douglas
AU  - Rollier, Dwayne A.
PY  - 1998/6/5
Y1  - 1998/6/5
N2  - Identifying outliers is a fundamental step in the regression model building process. However, current outlier diagnostics are often inadequate when data sets contain multiple outlying observations. This paper proposes a new clustering-based approach for multiple outlier identification that utilizes the predicted and residual values obtained from a least squares fit of the data. The procedure is described and is shown to perform well on classic multiple-outlier data sets found in the literature. Also, the performance characteristics of the proposed methodology are demonstrated and explored by applying the procedure to simulated data sets that have various outlier scenarios.
AB  - Identifying outliers is a fundamental step in the regression model building process. However, current outlier diagnostics are often inadequate when data sets contain multiple outlying observations. This paper proposes a new clustering-based approach for multiple outlier identification that utilizes the predicted and residual values obtained from a least squares fit of the data. The procedure is described and is shown to perform well on classic multiple-outlier data sets found in the literature. Also, the performance characteristics of the proposed methodology are demonstrated and explored by applying the procedure to simulated data sets that have various outlier scenarios.
UR  - http://www.scopus.com/inward/record.url?scp=0032486019&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0032486019&partnerID=8YFLogxK
U2  - 10.1016/S0167-9473(98)00021-8
DO  - 10.1016/S0167-9473(98)00021-8
M3  - Article
AN  - SCOPUS:0032486019
SN  - 0167-9473
VL  - 27
SP  - 461
EP  - 484
JO  - Computational Statistics and Data Analysis
JF  - Computational Statistics and Data Analysis
IS  - 4
ER  -
