PICC counting: Who needs joins when you can propagate efficiently?

Jong Wook Kim, Kasim Candan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Counting is a common task in many data mining applications, including market basket data analysis, scientific inquiry, and other high-dimensional data management applications. Given a single table, obtaining the instance counts of the entries in the table is relatively cheap. When the attributes of interest are distributed across different tables, however, computing instance counts can be very expensive. The naive solution, joining all the relevant relations to obtain a single table suitable for counting, is rarely practical. In this paper, we propose PICC (Propagation-based Instance Counts on Concise Graphs), a novel counting technique for discovering instance counts in databases. We first propose a propagation-based instance counting scheme that avoids joining the relations into a single table. We then present a method for summarizing a database into a concise synopsis and describe how to use this synopsis along with the propagation scheme to estimate the required counts efficiently. The experimental results show that the proposed technique, PICC, provides significant execution time and accuracy gains over existing solutions to this problem.
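
The contrast the abstract draws between joining and propagating can be illustrated with a small sketch. This is not the PICC algorithm, whose details are not given here; it only shows the general idea of pushing counts across a shared key instead of materializing a join. The two-table layout, the tuple shapes, and the names `naive_join_counts` and `propagated_counts` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy relations (not from the paper):
#   R holds (a, key) rows, S holds (key, b) rows.
R_ROWS = [("x", 1), ("x", 2), ("y", 2), ("z", 3)]
S_ROWS = [(1, "p"), (2, "q"), (2, "r")]


def naive_join_counts(r_rows, s_rows):
    """Materialize R joined with S on the key, then count each value of a."""
    joined = [
        (a, k, b)
        for (a, k) in r_rows
        for (k2, b) in s_rows
        if k == k2
    ]
    return Counter(a for (a, _, _) in joined)


def propagated_counts(r_rows, s_rows):
    """Get the same counts without building the joined table.

    First count how many S rows carry each key, then propagate those
    per-key counts to the R rows that reference the key.
    """
    key_counts = Counter(k for (k, _) in s_rows)
    counts = Counter()
    for a, k in r_rows:
        matches = key_counts.get(k, 0)
        if matches:  # skip R rows with no join partner
            counts[a] += matches
    return counts


if __name__ == "__main__":
    print(naive_join_counts(R_ROWS, S_ROWS))
    print(propagated_counts(R_ROWS, S_ROWS))
```

Both functions return the same counts on this toy input, but the second touches each row once and never stores the joined rows. The synopsis-based estimation mentioned in the abstract is a separate step that this sketch does not attempt.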

Original language: English (US)
Title of host publication: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133
Pages: 752-763
Number of pages: 12
State: Published - 2009
Event: 9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States
Duration: Apr 30 2009 → May 2 2009

Publication series

Name: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
Volume: 2

Other

Other: 9th SIAM International Conference on Data Mining 2009, SDM 2009
Country: United States
City: Sparks, NV
Period: 4/30/09 → 5/2/09

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Applied Mathematics
