### Abstract

Counting is a common task in many data mining applications, including market basket data analysis, scientific inquiry, and other high-dimensional data management applications. Given a single table, obtaining the instance counts of its entries is relatively cheap. When the attributes of interest are distributed across different tables, however, computing instance counts can be very expensive. The naive solution, joining all the relevant relations into a single table suitable for counting, is rarely practical. In this paper, we propose PICC (Propagation-based Instance Counts on Concise Graphs), a novel counting technique for discovering instance counts in databases. We first propose a propagation-based instance counting scheme that avoids the joins otherwise needed to obtain a single table. We then present a method for summarizing a database into a concise synopsis and describe how to use this synopsis along with the propagation scheme to estimate the required counts efficiently. Experimental results show that PICC provides significant gains in execution time and accuracy over existing solutions to this problem.
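The abstract contrasts materializing a join with propagating counts. As a rough illustration of that general idea only (this is not the PICC algorithm, whose propagation scheme and concise graph synopsis are not detailed in this record), the Python sketch below counts how often each value of attribute A occurs in a three-table chain join R1(A, B) JOIN R2(B, C) JOIN R3(C, D). The tables, schemas, sample data, and function names are all hypothetical. The naive baseline enumerates every joined tuple, while the propagated version passes per-key counts from R3 back to R1 without ever building the join.

```python
from collections import Counter, defaultdict
from itertools import product


def naive_join_counts(r1, r2, r3):
    """Enumerate every tuple of R1 JOIN R2 JOIN R3 (nested loops) and count each A value."""
    counts = Counter()
    for (a, b1), (b2, c1), (c2, _d) in product(r1, r2, r3):
        if b1 == b2 and c1 == c2:
            counts[a] += 1
    return counts


def propagated_counts(r1, r2, r3):
    """Count each A value in the chain join by propagating per-key counts, with no join."""
    # Step 1: number of R3 rows that join with each C value.
    c_weight = Counter(c for c, _d in r3)
    # Step 2: for each B value, sum the weights of the C values it reaches through R2.
    b_weight = defaultdict(int)
    for b, c in r2:
        b_weight[b] += c_weight[c]
    # Step 3: each R1 row (a, b) contributes b_weight[b] join instances to value a.
    counts = Counter()
    for a, b in r1:
        if b_weight[b]:
            counts[a] += b_weight[b]
    return counts


# Hypothetical sample relations, each a list of (left attribute, right attribute) rows.
r1 = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
r2 = [("b1", "c1"), ("b1", "c2"), ("b2", "c1")]
r3 = [("c1", "d1"), ("c1", "d2"), ("c2", "d1")]

print(propagated_counts(r1, r2, r3))  # Counter({'a1': 5, 'a2': 3})
print(naive_join_counts(r1, r2, r3))  # Counter({'a1': 5, 'a2': 3})
```

Each table is scanned once in the propagated version, so its cost grows with the table sizes rather than with the size of the join result. How PICC extends propagation beyond a simple chain and combines it with a summarized database is described in the paper and is not reproduced here.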

Field | Value |
---|---|
Original language | English (US) |
Title of host publication | Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics |
Pages | 752-763 |
Number of pages | 12 |
Volume | 2 |
State | Published - 2009 |
Event | 9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States. Duration: Apr 30 2009 → May 2 2009 |

### Other

Field | Value |
---|---|
Conference | 9th SIAM International Conference on Data Mining 2009, SDM 2009 |
Country | United States |
City | Sparks, NV |
Period | 4/30/09 → 5/2/09 |

### ASJC Scopus subject areas

- Computational Theory and Mathematics
- Software
- Applied Mathematics

### Cite this

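Kim, J. W., & Candan, K. (2009). PICC counting: Who needs joins when you can propagate efficiently? In *Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics* (Vol. 2, pp. 752-763).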

**PICC counting: Who needs joins when you can propagate efficiently?** / Kim, Jong Wook; Candan, Kasim.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics*, vol. 2, pp. 752-763, 9th SIAM International Conference on Data Mining 2009, SDM 2009, Sparks, NV, United States, 4/30/09.

TY - GEN

T1 - PICC counting

T2 - Who needs joins when you can propagate efficiently?

AU - Kim, Jong Wook

AU - Candan, Kasim

PY - 2009

Y1 - 2009

UR - http://www.scopus.com/inward/record.url?scp=72749085579&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72749085579&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:72749085579

SN - 9781615671090

VL - 2

SP - 752

EP - 763

BT - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics

ER -