Mining coverage statistics for websource selection in a mediator

Zaiqing Nie, Ullas Nambiar, Sreelakshmi Vaddi, Subbarao Kambhampati

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

6 Scopus citations

Abstract

Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition, there are no effective approaches for learning the needed statistics. The key challenge in learning such statistics is keeping their number low enough that the storage and learning costs remain manageable. Naive approaches can become infeasible very quickly. In this paper we present a set of connected techniques that estimate the coverage and overlap statistics while keeping the number of needed statistics tightly under control. Our approach uses a hierarchical classification of the queries, and threshold-based variants of familiar data mining techniques, to dynamically decide the level of resolution at which to learn the statistics. We describe the details of our method and present experimental results demonstrating the efficiency of the learning algorithms and the effectiveness of the learned statistics.
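To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that coverage means the fraction of a query class's known answers that a source returns, that overlap means the fraction returned by both of two sources, and that a query class seen too rarely (below a hypothetical `min_support` threshold) is rolled up into its nearest sufficiently frequent ancestor in the class hierarchy. The names `learn_pooled_answers`, `coverage`, `overlap`, the class labels, and the sample log are all invented for illustration.

```python
from collections import defaultdict

def ancestors(cls, parent):
    """Yield cls, then its parent, grandparent, ... up to the root."""
    while cls is not None:
        yield cls
        cls = parent.get(cls)

def learn_pooled_answers(log, parent, min_support):
    """Pool logged answers per query class, rolling rare classes up the hierarchy.

    log: list of (query_class, {source: set_of_answer_ids}) entries.
    parent: maps each class to its parent class (the root has no entry).
    min_support: assumed threshold on the share of logged queries a class
        (including its descendants) must reach to keep its own statistics.
    """
    total = len(log)
    freq = defaultdict(int)
    for cls, _ in log:
        for c in ancestors(cls, parent):
            freq[c] += 1  # a query counts toward its class and every ancestor

    pooled = defaultdict(lambda: defaultdict(set))
    for i, (cls, per_source) in enumerate(log):
        chain = list(ancestors(cls, parent))
        # First class on the path to the root that is frequent enough;
        # fall back to the root if none qualifies.
        target = next((c for c in chain if freq[c] / total >= min_support), chain[-1])
        for source, answers in per_source.items():
            # Tag answers with the query index so different queries never collide.
            pooled[target][source] |= {(i, a) for a in answers}
    return pooled

def coverage(per_source, source):
    """Share of all pooled answers for a class that `source` returned."""
    universe = set().union(*per_source.values())
    return len(per_source[source]) / len(universe) if universe else 0.0

def overlap(per_source, a, b):
    """Share of all pooled answers for a class returned by both a and b."""
    universe = set().union(*per_source.values())
    return len(per_source[a] & per_source[b]) / len(universe) if universe else 0.0

# Hypothetical two-level hierarchy and query log.
parent = {"db": "all", "ai": "all"}
log = [
    ("db", {"S1": {"p1", "p2"}, "S2": {"p2", "p3"}}),
    ("db", {"S1": {"p4"}, "S2": {"p4"}}),
    ("db", {"S1": {"p5"}, "S2": set()}),
    ("ai", {"S1": set(), "S2": {"p6"}}),
]
stats = learn_pooled_answers(log, parent, min_support=0.5)
```

With this sample log, the class `db` accounts for three of four queries and keeps its own statistics, while the single `ai` query falls below the threshold and is pooled under the root class `all`. Raising `min_support` stores fewer, coarser statistics; lowering it stores more, finer ones.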

Original language: English (US)
Title of host publication: International Conference on Information and Knowledge Management, Proceedings
Editors: K Kalpakis, N Goharian, D Grossman
Pages: 678-680
Number of pages: 3
State: Published - 2002
Event: Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002) - McLean, VA, United States
Duration: Nov 4 2002 to Nov 9 2002

Keywords

  • Coverage statistics
  • Web-based data integration
  • Webmining to support query optimization

ASJC Scopus subject areas

  • Business, Management and Accounting(all)
