Identifying outliers is a fundamental step in building a regression model. However, current outlier diagnostics are often inadequate when a data set contains multiple outlying observations. This paper proposes a new clustering-based approach to multiple-outlier identification that uses the predicted and residual values obtained from a least squares fit of the data. The procedure is described and shown to perform well on classic multiple-outlier data sets from the literature. Its performance characteristics are then explored by applying it to simulated data sets covering a range of outlier scenarios.
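To make the idea concrete, the following is a minimal sketch of how clustering on predicted and residual values might be used to flag several outliers at once. It is not the paper's procedure: the restriction to a single predictor, the use of single-linkage clustering, the fixed cut height `cut=1.0`, and the rule that the largest cluster is the "clean" set are all assumptions made for illustration, and the function names (`fit_line`, `standardize`, `single_linkage_labels`, `flag_outliers`) are hypothetical.

```python
import math
from collections import Counter


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def single_linkage_labels(points, cut):
    """Single-linkage clusters at height `cut`.

    Cutting a single-linkage tree at `cut` gives the connected components
    of the graph that links every pair of points at distance <= cut,
    so a small union-find is enough here.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cut:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def flag_outliers(x, y, cut=1.0):
    """Return indices of observations outside the largest cluster.

    Assumption: the largest cluster in (fitted, residual) space is the
    clean set; `cut` is an arbitrary illustrative threshold.
    """
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    points = list(zip(standardize(fitted), standardize(resid)))
    labels = single_linkage_labels(points, cut)
    main = Counter(labels).most_common(1)[0][0]
    return [i for i, lab in enumerate(labels) if lab != main]


# Made-up data: 20 points near y = 1 + 2x, plus three planted outliers.
x = list(range(20)) + [30, 31, 32]
y = [1 + 2 * xi + (0.1 if xi % 2 == 0 else -0.1) for xi in range(20)]
y += [10, 10, 10]
print(flag_outliers(x, y))
```

In this toy data the three planted points pull the least squares line toward themselves, so their residuals alone are not the largest in magnitude; they separate from the bulk only when the fitted values and residuals are considered jointly, which is the situation the abstract describes as difficult for standard diagnostics. A fixed cut height would not transfer to other data sets, so a data-driven stopping rule would be needed in practice.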
ASJC Scopus subject areas
- Statistics and Probability
- Computational Mathematics
- Computational Theory and Mathematics
- Applied Mathematics