This chapter presents StatMiner, a system for estimating coverage and overlap statistics while keeping the amount of statistics that must be learned tightly under control. StatMiner uses a hierarchical classification of queries and threshold-based variants of familiar data mining techniques to decide dynamically the level of resolution at which to learn the statistics. The chapter demonstrates the major functionalities of StatMiner and the effectiveness of the learned statistics in BibFinder, a publicly available computer science bibliography mediator. The sources that BibFinder integrates are autonomous and can have uncontrolled coverage and overlap. An important focus in BibFinder was thus to mine coverage and overlap statistics about these sources and to exploit them to improve query processing.
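The idea of learning statistics only at a resolution justified by a threshold can be illustrated with a small sketch. This is not StatMiner's actual algorithm; it is a simplified illustration under stated assumptions: query classes are modeled as tuples whose prefixes are their ancestors in the hierarchy, `min_support` is a hypothetical threshold on the fraction of logged queries, answer identifiers are assumed to be globally unique, and the function and source names are invented for the example. Coverage is taken here as the fraction of a class's answers that a source returns, and overlap as the fraction returned by both sources of a pair.

```python
from collections import Counter, defaultdict
from itertools import combinations


def roll_up(cls, frequent):
    """Walk toward the root until a frequent class (or the root) is reached."""
    while cls and cls not in frequent:
        cls = cls[:-1]
    return cls


def mine_statistics(log, min_support):
    """Estimate per-class coverage and overlap from a query log.

    log: list of (query_class, answers), where query_class is a tuple such as
         ("conference", "VLDB") and answers maps a source name to the set of
         answer ids that source returned for the query.
    min_support: hypothetical threshold; a class keeps its own statistics only
         if at least this fraction of logged queries falls under it.
    """
    # A query counts toward its own class and every ancestor (tuple prefix).
    counts = Counter()
    for cls, _ in log:
        for i in range(len(cls) + 1):
            counts[cls[:i]] += 1
    frequent = {c for c, n in counts.items() if n / len(log) >= min_support}

    # Pool answers at the most specific frequent class. Simplification: an
    # ancestor only receives answers from its infrequent descendants.
    pooled = defaultdict(lambda: defaultdict(set))
    for cls, answers in log:
        target = roll_up(cls, frequent)
        for source, ids in answers.items():
            pooled[target][source] |= ids

    stats = {}
    for cls, by_source in pooled.items():
        universe = set().union(*by_source.values())
        if not universe:
            continue  # no answers observed, nothing to estimate
        coverage = {s: len(ids) / len(universe) for s, ids in by_source.items()}
        overlap = {
            (a, b): len(by_source[a] & by_source[b]) / len(universe)
            for a, b in combinations(sorted(by_source), 2)
        }
        stats[cls] = {"coverage": coverage, "overlap": overlap}
    return stats
```

Raising `min_support` pushes more queries up to coarser classes, so fewer statistics are kept; lowering it retains finer-grained statistics at the cost of storing more of them.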
| Original language | English (US) |
| Title of host publication | Proceedings 2003 VLDB Conference |
| Subtitle of host publication | 29th International Conference on Very Large Databases (VLDB) |
| Number of pages | 4 |
| State | Published - Jan 1 2003 |
ASJC Scopus subject areas
- Computer Science(all)