TY - GEN

T1 - PICC counting

T2 - 9th SIAM International Conference on Data Mining 2009, SDM 2009

AU - Kim, Jong Wook

AU - Candan, Kasim

PY - 2009

Y1 - 2009

N2 - Counting is a common task in many data mining applications, including market basket data analysis, scientific inquiry, and other high-dimensional data management applications. Given a single table, obtaining the instance counts of the entries in the table is relatively cheap. When the attributes of interest are distributed across different tables, however, computing instance counts can be very expensive. The naive solution, joining all the relevant relations to obtain a single table suitable for counting, is rarely practical. In this paper, we propose PICC (Propagation-based Instance Counts on Concise Graphs), a novel counting technique for discovering instance counts in databases. We first propose a propagation-based instance counting scheme that avoids joining the relations into a single table. We then present a method for summarizing a database into a concise synopsis and describe how to use this synopsis together with the propagation scheme to estimate the required counts efficiently. Experimental results show that PICC provides significant execution time and accuracy gains over existing solutions to this problem.

AB - Counting is a common task in many data mining applications, including market basket data analysis, scientific inquiry, and other high-dimensional data management applications. Given a single table, obtaining the instance counts of the entries in the table is relatively cheap. When the attributes of interest are distributed across different tables, however, computing instance counts can be very expensive. The naive solution, joining all the relevant relations to obtain a single table suitable for counting, is rarely practical. In this paper, we propose PICC (Propagation-based Instance Counts on Concise Graphs), a novel counting technique for discovering instance counts in databases. We first propose a propagation-based instance counting scheme that avoids joining the relations into a single table. We then present a method for summarizing a database into a concise synopsis and describe how to use this synopsis together with the propagation scheme to estimate the required counts efficiently. Experimental results show that PICC provides significant execution time and accuracy gains over existing solutions to this problem.

UR - http://www.scopus.com/inward/record.url?scp=72749085579&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72749085579&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:72749085579

SN - 9781615671090

T3 - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics

SP - 752

EP - 763

BT - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133

Y2 - 30 April 2009 through 2 May 2009

ER -