PICC counting: Who needs joins when you can propagate efficiently?

Jong Wook Kim, Kasim Candan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Counting is a common task in many data mining applications, including market basket data analysis, scientific inquiry, and other high-dimensional data management applications. Given a single table, obtaining the instance counts of the entries in the table is relatively cheap. In situations where the attributes of interest are distributed across different tables, however, computing instance counts can be very expensive. The naive solution, joining all the relevant relations to obtain a single table suitable for counting, is rarely practical. In this paper, we propose PICC (Propagation-based Instance Counts on Concise Graphs), a novel counting technique for discovering instance counts in databases. We first propose a propagation-based instance counting scheme that avoids the joins otherwise needed to obtain a single table. We then present a method for summarizing a database into a concise synopsis and describe how to use it together with the propagation scheme to estimate the required counts efficiently. Experimental results show that PICC provides significant gains in execution time and accuracy over existing solutions to this problem.
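As a hedged illustration of the general idea of counting across tables without materializing their join (this is not the PICC algorithm; its propagation scheme and concise-graph synopsis are defined in the paper), the sketch below counts (a, b) value pairs over a chain of tables linked by join keys by passing per-key count "messages" from one table to the next. The table layout, the function names `propagate_counts` and `naive_join_counts`, and the example data are all hypothetical.

```python
from collections import Counter, defaultdict


def propagate_counts(first, middle, last):
    """Count (a, b) value pairs over the chain join
    first JOIN middle[0] JOIN ... JOIN last, without materializing join tuples.

    first:  list of (a, key) tuples
    middle: list of link tables, each a list of (key_in, key_out) tuples
    last:   list of (key, b) tuples
    """
    # message[key] is a Counter: a-value -> number of partial join paths
    # that start at that a-value and currently end at `key`.
    message = defaultdict(Counter)
    for a, key in first:
        message[key][a] += 1

    # Push the per-key counts across each link table in turn.
    for table in middle:
        next_message = defaultdict(Counter)
        for key_in, key_out in table:
            if key_in in message:
                next_message[key_out].update(message[key_in])  # adds counts
        message = next_message

    # Combine the final messages with the b-values of the last table.
    counts = Counter()
    for key, b in last:
        for a, c in message.get(key, Counter()).items():
            counts[(a, b)] += c
    return counts


def naive_join_counts(first, middle, last):
    """Baseline: enumerate every join result tuple explicitly, then count."""
    # Each element stands for one join result tuple, projected to (a, current_key).
    paths = list(first)
    for table in middle:
        paths = [(a, key_out) for a, key in paths
                 for key_in, key_out in table if key_in == key]
    return Counter((a, b) for a, key in paths
                   for key_last, b in last if key_last == key)


# Hypothetical example: customers -> orders -> products -> categories.
customer_orders = [("alice", "o1"), ("alice", "o2"), ("bob", "o3")]
order_items = [("o1", "p1"), ("o1", "p2"), ("o2", "p1"), ("o3", "p2")]
product_category = [("p1", "toys"), ("p2", "books"), ("p2", "gifts")]

counts = propagate_counts(customer_orders, [order_items], product_category)
assert counts == naive_join_counts(customer_orders, [order_items], product_category)
print(sorted(counts.items()))  # (alice, toys) -> 2; every other pair -> 1
```

Both functions return the same counts, but the propagation version never enumerates join result tuples: its work grows with the table sizes times the number of distinct `a` values carried in each message, rather than with the size of the join itself.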

Original language: English (US)
Title of host publication: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
Pages: 752-763
Number of pages: 12
Volume: 2
State: Published - 2009
Event: 9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States
Duration: Apr 30 2009 to May 2 2009

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Applied Mathematics

Cite this

Kim, J. W., & Candan, K. (2009). PICC counting: Who needs joins when you can propagate efficiently? In Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics (Vol. 2, pp. 752-763).

@inproceedings{03b3f3605d6d4d4ab1136976c66989e1,
title = "PICC counting: Who needs joins when you can propagate efficiently?",
author = "Kim, {Jong Wook} and Kasim Candan",
year = "2009",
language = "English (US)",
isbn = "9781615671090",
volume = "2",
pages = "752--763",
booktitle = "Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics",

}

TY - GEN

T1 - PICC counting

T2 - Who needs joins when you can propagate efficiently?

AU - Kim, Jong Wook

AU - Candan, Kasim

PY - 2009

Y1 - 2009

UR - http://www.scopus.com/inward/record.url?scp=72749085579&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72749085579&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:72749085579

SN - 9781615671090

VL - 2

SP - 752

EP - 763

BT - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics

ER -