Abstract
In manufacturing, as well as in other application areas, there is a need to learn standard operating conditions in order to detect future changes or deviations. This is related to the even more general problem of detecting instances (cases, records) that are unusual compared to the bulk of the data (outliers). Examples of the problem are fault detection in chemical engineering and statistical process control. The outlier problem is ubiquitous. If specific deviations are not specified a priori, this is a type of unsupervised learning problem. The focus here is on the important, practical case for modern data environments: training data with multiple (usually many) variables of mixed types, without the expedient assumption of multivariate normality that is common in statistics but rarely holds in practice. An elegant technique is used to transform the unsupervised learning problem into a supervised one. This methodology uses an artificial reference distribution, which must have appropriate properties for the case considered here. An effective, universal, and nonparametric supervised learner (a gradient boosting machine) is then applied to the transformed problem. The results are then, in a sense, inverted back to the original problem. Extensions are mentioned, as well as the additional insight that becomes available. An illustrative example is presented to justify the validity of this generic and general methodology.
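The transformation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it makes several assumptions the abstract does not state: the reference distribution is taken to be uniform over each variable's observed range, the variables are all numeric (the paper targets mixed types), and the learner is a toy gradient-boosted ensemble of single-split stumps under logistic loss rather than a full gradient boosting machine. All function names are hypothetical. With equally sized real and reference samples, the fitted log-odds can be read as an estimate of the log ratio of the data density to the reference density, so low values flag unusual instances.

```python
import math
import random


def uniform_reference(rows, rng):
    """Assumed reference distribution: uniform over each column's observed range."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in rows]


def fit_stump(X, resid):
    """Least-squares regression stump (one split on one numeric feature)."""
    n, total = len(X), sum(resid)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left = 0.0
        for k in range(1, n):
            left += resid[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between equal values
            right = total - left
            gain = left * left / k + right * right / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left / k, right / (n - k))
    return best[1:]


def boost(X, y, rounds=100, rate=0.3):
    """Toy gradient boosting for logistic loss: fit stumps to y - p."""
    F = [0.0] * len(X)  # log-odds start at 0 because the classes are balanced
    stumps = []
    for _ in range(rounds):
        resid = [yi - 1.0 / (1.0 + math.exp(-fi)) for yi, fi in zip(y, F)]
        j, thr, lv, rv = fit_stump(X, resid)
        stumps.append((j, thr, rate * lv, rate * rv))
        F = [fi + rate * (lv if x[j] <= thr else rv) for fi, x in zip(F, X)]
    return stumps


def log_density_ratio(stumps, x):
    """Fitted log-odds of 'real' vs 'reference'; low values suggest outliers."""
    return sum(lv if x[j] <= thr else rv for j, thr, lv, rv in stumps)


rng = random.Random(0)
real = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]  # "normal operation"
reference = uniform_reference(real, rng)                          # artificial class
X = real + reference
y = [1] * len(real) + [0] * len(reference)

model = boost(X, y)
typical = log_density_ratio(model, [0.0, 0.0])  # near the bulk of the data
unusual = log_density_ratio(model, [3.0, 3.0])  # far from the bulk
print(typical, unusual)
```

A point near the bulk of the training data should receive a higher score than a point far from it. Stumps only produce an additive score, so this sketch cannot detect instances that are unusual solely through the joint behavior of several variables; deeper trees would be needed for that.
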
Original language | English (US) |
---|---|
Title of host publication | Fourth International Conference on Data Mining, Data Mining IV |
Editors | N.F.F.E. Ebecken, C.A. Brebbia, A. Zanasi |
Publisher | WIT Press |
Pages | 63-72 |
Number of pages | 10 |
Volume | 7 |
ISBN (Print) | 1853128309 |
State | Published - 2003 |
Event | Fourth International Conference on Data Mining, Data Mining IV - Rio De Janeiro, Brazil. Duration: Dec 1 2003 → Dec 3 2003 |
Other
Other | Fourth International Conference on Data Mining, Data Mining IV |
---|---|
Country/Territory | Brazil |
City | Rio De Janeiro |
Period | 12/1/03 → 12/3/03 |
ASJC Scopus subject areas
- Management Information Systems
- Information Systems
- Engineering(all)
- Computer Science Applications
- Information Systems and Management