Mining coverage statistics for websource selection in a mediator

Zaiqing Nie; Ullas Nambiar; Sreelakshmi Vaddi; Subbarao Kambhampati

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition, there are no effective approaches for learning the needed statistics. The key challenge in learning such statistics is keeping their number low enough that the storage and learning costs remain manageable. Naive approaches can become infeasible very quickly. In this paper, we present a set of connected techniques that estimate the coverage and overlap statistics while keeping the needed statistics tightly under control. Our approach uses a hierarchical classification of the queries and threshold-based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics. We describe the details of our method and present experimental results demonstrating the efficiency of the learning algorithms and the effectiveness of the learned statistics.
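
The idea of learning statistics only at the resolution the query log justifies can be illustrated with a minimal sketch. It is not the authors' algorithm: the class hierarchy, the log format, the `min_support` threshold, and the function names are all assumptions made for illustration, and it covers coverage only, not overlap. Each query is assigned to a class in a hierarchy, per-source coverage is accumulated for the class and all its ancestors, and statistics are retained only for classes that are frequent enough in the log. Queries from a dropped class fall back to the nearest retained ancestor.

```python
from collections import defaultdict

# Hypothetical query-class hierarchy: child -> parent (None marks the root).
PARENT = {
    "all": None,
    "databases": "all",
    "ai": "all",
    "query-optimization": "databases",
    "planning": "ai",
}

def ancestors(cls):
    """Yield cls, then each of its ancestors up to the root."""
    while cls is not None:
        yield cls
        cls = PARENT[cls]

def learn_coverage(log, min_support):
    """Learn per-class coverage statistics from a query log.

    log: list of (query_class, {source: answers_returned}, total_answers).
    Returns {class: {source: coverage}} for classes whose share of the
    log is at least min_support (assumed threshold); the root is always kept.
    """
    n_queries = defaultdict(int)
    totals = defaultdict(int)
    answers = defaultdict(lambda: defaultdict(int))
    for cls, per_source, total in log:
        for c in ancestors(cls):  # roll counts up the hierarchy
            n_queries[c] += 1
            totals[c] += total
            for src, k in per_source.items():
                answers[c][src] += k
    stats = {}
    for c, n in n_queries.items():
        if PARENT[c] is None or n / len(log) >= min_support:
            stats[c] = {
                s: (k / totals[c] if totals[c] else 0.0)
                for s, k in answers[c].items()
            }
    return stats

def lookup(stats, cls):
    """Return (class, statistics) for the most specific retained ancestor."""
    for c in ancestors(cls):
        if c in stats:
            return c, stats[c]
    raise KeyError(cls)

# Made-up log: three queries in one class, one query in another.
log = [
    ("query-optimization", {"S1": 8, "S2": 2}, 10),
    ("query-optimization", {"S1": 6, "S2": 4}, 10),
    ("query-optimization", {"S1": 9}, 10),
    ("planning", {"S2": 5}, 10),
]
stats = learn_coverage(log, min_support=0.5)
used_class, used_stats = lookup(stats, "planning")
```

With this log, the rarely queried `planning` class gets no statistics of its own, so a lookup for it falls back to the root class. Raising `min_support` stores fewer statistics at the cost of coarser estimates.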

Original language	English (US)
Title of host publication	International Conference on Information and Knowledge Management, Proceedings
Editors	K Kalpakis, N Goharian, D Grossman
Pages	678-680
Number of pages	3
State	Published - 2002
Event	Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002) - McLean, VA, United States
Duration	Nov 4 2002 → Nov 9 2002

Other

Other	Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002)
Country/Territory	United States
City	McLean, VA
Period	11/4/02 → 11/9/02

Keywords

Coverage statistics
Web-based data integration
Webmining to support query optimization

ASJC Scopus subject areas

General Business, Management and Accounting

Cite this

Nie, Z, Nambiar, U, Vaddi, S & Kambhampati, S 2002, Mining coverage statistics for websource selection in a mediator. in K Kalpakis, N Goharian & D Grossman (eds), International Conference on Information and Knowledge Management, Proceedings. pp. 678-680, Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002), McLean, VA, United States, 11/4/02.

@inproceedings{d1877d676ac4405a9e1f9a99150e912f,

title = "Mining coverage statistics for websource selection in a mediator",

abstract = "Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition there are no effective approaches for learning the needed statistics. The key challenge in learning such statistics is keeping the number of needed statistics low enough to have the storage and learning costs manageable. Naive approaches can become infeasible very quickly. In this paper we present a set of connected techniques that estimate the coverage and overlap statistics while keeping the needed statistics tightly under control. Our approach uses a hierarchical classification of the queries, and threshold based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics. We describe the details of our method, and present experimental results demonstrating the efficiency of the learning algorithms and the effectiveness of the learned statistics.",

keywords = "Coverage statistics, Web-based data integration, Webmining to support query optimization",

author = "Zaiqing Nie and Ullas Nambiar and Sreelakshmi Vaddi and Subbarao Kambhampati",

year = "2002",

language = "English (US)",

pages = "678--680",

editor = "K Kalpakis and N Goharian and D Grossman",

booktitle = "International Conference on Information and Knowledge Management, Proceedings",

note = "Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002) ; Conference date: 04-11-2002 Through 09-11-2002",

}

TY - GEN

T1 - Mining coverage statistics for websource selection in a mediator

AU - Nie, Zaiqing

AU - Nambiar, Ullas

AU - Vaddi, Sreelakshmi

AU - Kambhampati, Subbarao

PY - 2002

Y1 - 2002

AB - Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition there are no effective approaches for learning the needed statistics. The key challenge in learning such statistics is keeping the number of needed statistics low enough to have the storage and learning costs manageable. Naive approaches can become infeasible very quickly. In this paper we present a set of connected techniques that estimate the coverage and overlap statistics while keeping the needed statistics tightly under control. Our approach uses a hierarchical classification of the queries, and threshold based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics. We describe the details of our method, and present experimental results demonstrating the efficiency of the learning algorithms and the effectiveness of the learned statistics.

KW - Coverage statistics

KW - Web-based data integration

KW - Webmining to support query optimization

UR - http://www.scopus.com/inward/record.url?scp=0038156127&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0038156127&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0038156127

SP - 678

EP - 680

BT - International Conference on Information and Knowledge Management, Proceedings

A2 - Kalpakis, K

A2 - Goharian, N

A2 - Grossman, D

T2 - Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002)

Y2 - 4 November 2002 through 9 November 2002

ER -
