Processing massive amounts of data, by learning from it and then classifying new data points to identify trends, is becoming increasingly important across many applications. Examples include expert systems for computer-based decision making, computer security and intrusion detection, and data mining for marketing. Researchers at Arizona State University have developed a clustering and classification algorithm, denoted CCA-S, that groups and classifies data points. Existing clustering algorithms operate in batch mode and require all data points to be available before clusters are generated. CCA-S instead supports an incremental clustering procedure, so it can be applied to mining dynamic data sets to which new data points are added over time. Because CCA-S uses the values of the target variable as well as those of the attribute variables, it performs a supervised clustering procedure. It is a scalable clustering and classification technique that can be applied to large-scale data mining problems with fixed or dynamically changing data sets. In one example, CCA-S was used as a signature recognition technique to learn signatures of intrusive activities from historical data on intrusive and normal activities in an information system. The learned intrusion signatures were then used to classify incoming activities in the information system as intrusive or normal. In this application, CCA-S performed much better than a decision tree algorithm in a commercial data mining software package.
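The general idea of incremental, supervised clustering can be illustrated with a minimal sketch. This is not the CCA-S algorithm itself, whose details are not given here. It assumes a simple rule chosen for illustration: each incoming labeled point either joins its nearest cluster (when that cluster carries the same target label) or starts a new cluster, and an unlabeled point is classified by the label of its nearest cluster. The class name `IncrementalSupervisedClusterer`, its methods, and the use of Euclidean distance are all hypothetical.

```python
import math


class IncrementalSupervisedClusterer:
    """Illustrative sketch only; not the published CCA-S procedure."""

    def __init__(self):
        # Each cluster stores a centroid, a target label, and a point count.
        self.clusters = []

    def _nearest(self, x):
        """Return the cluster whose centroid is closest to x (Euclidean)."""
        best, best_dist = None, math.inf
        for cluster in self.clusters:
            dist = math.dist(x, cluster["centroid"])
            if dist < best_dist:
                best, best_dist = cluster, dist
        return best

    def learn_one(self, x, label):
        """Absorb one labeled point without revisiting earlier points."""
        nearest = self._nearest(x)
        if nearest is not None and nearest["label"] == label:
            # Same target value: fold the point into the running mean.
            n = nearest["count"]
            nearest["centroid"] = [
                (m * n + xi) / (n + 1) for m, xi in zip(nearest["centroid"], x)
            ]
            nearest["count"] = n + 1
        else:
            # No cluster yet, or the nearest one has a different label:
            # start a new cluster seeded by this point.
            self.clusters.append(
                {"centroid": list(x), "label": label, "count": 1}
            )

    def classify(self, x):
        """Predict the label of the nearest cluster, or None if untrained."""
        nearest = self._nearest(x)
        return None if nearest is None else nearest["label"]
```

Because `learn_one` touches only the current point and the stored cluster summaries, new data can be added over time without re-clustering from scratch. The target label takes part in cluster formation, which is what makes the procedure supervised rather than purely distance-based.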
|Original language||English (US)|
|State||Published - Sep 7 2000|