A frequency-based approach for mining coverage statistics in data integration

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)

Abstract

Query optimization in data integration requires source coverage and overlap statistics. Gathering and storing the required statistics presents many challenges, not the least of which is controlling the amount of statistics learned. In this paper we introduce StatMiner, a novel statistics mining approach which automatically generates attribute value hierarchies, efficiently discovers frequently accessed query classes based on the learned attribute value hierarchies, and learns statistics only with respect to these classes. We describe the details of our method, and present experimental results demonstrating the efficiency and effectiveness of our approach. Our experiments are done in the context of BibFinder, a publicly fielded bibliography mediator.
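As a rough illustration of the idea in the abstract (not the StatMiner algorithm itself), the sketch below groups logged queries into classes through a small attribute value hierarchy, keeps only the classes that are asked frequently, and estimates per-class source coverage and overlap. The hierarchy, the query log, the source names S1 and S2, and the function names are all hypothetical, and coverage is assumed here to mean the fraction of a class's answers that a source returns.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical attribute value hierarchy: leaf value -> parent class.
HIERARCHY = {"ICDE": "databases", "SIGMOD": "databases", "AAAI": "ai"}

# Hypothetical query log: (attribute value queried, {source: answer ids}).
LOG = [
    ("ICDE", {"S1": {1, 2, 3}, "S2": {3, 4}}),
    ("SIGMOD", {"S1": {5, 6}, "S2": {6}}),
    ("ICDE", {"S1": {1, 2, 3}, "S2": {3, 4}}),
    ("AAAI", {"S1": {7}, "S2": {7, 8, 9}}),
]


def frequent_classes(log, hierarchy, min_support):
    """Return the classes whose share of logged queries is >= min_support."""
    counts = Counter(hierarchy[value] for value, _ in log)
    return {cls for cls, n in counts.items() if n / len(log) >= min_support}


def class_statistics(log, hierarchy, classes):
    """Estimate coverage and pairwise overlap, only for the given classes."""
    totals = defaultdict(int)                       # answers seen per class
    covered = defaultdict(lambda: defaultdict(int))  # answers per source
    shared = defaultdict(lambda: defaultdict(int))   # answers per source pair
    for value, answers in log:
        cls = hierarchy[value]
        if cls not in classes:
            continue  # infrequent classes get no statistics at all
        totals[cls] += len(set().union(*answers.values()))
        for src, ids in answers.items():
            covered[cls][src] += len(ids)
        for a, b in combinations(sorted(answers), 2):
            shared[cls][(a, b)] += len(answers[a] & answers[b])
    return {
        cls: {
            "coverage": {s: n / totals[cls] for s, n in covered[cls].items()},
            "overlap": {p: n / totals[cls] for p, n in shared[cls].items()},
        }
        for cls in totals
    }


classes = frequent_classes(LOG, HIERARCHY, min_support=0.5)
stats = class_statistics(LOG, HIERARCHY, classes)
print(classes)
print(stats)
```

Raising `min_support` shrinks the set of classes for which anything is stored, which is the lever the abstract describes for controlling the amount of statistics learned.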

Original language: English (US)
Title of host publication: Proceedings - International Conference on Data Engineering
Pages: 387-398
Number of pages: 12
Volume: 20
State: Published - 2004
Event: Proceedings - 20th International Conference on Data Engineering - ICDE 2004 - Boston, MA., United States
Duration: Mar 30 2004 to Apr 2 2004

Other

Other: Proceedings - 20th International Conference on Data Engineering - ICDE 2004
Country: United States
City: Boston, MA.
Period: 3/30/04 to 4/2/04

Fingerprint

Data integration
Statistics
Bibliographies
Experiments

ASJC Scopus subject areas

  • Software
  • Engineering(all)
  • Engineering (miscellaneous)

Cite this

Nie, Z., & Kambhampati, S. (2004). A frequency-based approach for mining coverage statistics in data integration. In Proceedings - International Conference on Data Engineering (Vol. 20, pp. 387-398).

@inproceedings{0b629e19fc4844a48fa292e4bb912504,
title = "A frequency-based approach for mining coverage statistics in data integration",
abstract = "Query optimization in data integration requires source coverage and overlap statistics. Gathering and storing the required statistics presents many challenges, not the least of which is controlling the amount of statistics learned. In this paper we introduce StatMiner, a novel statistics mining approach which automatically generates attribute value hierarchies, efficiently discovers frequently accessed query classes based on the learned attribute value hierarchies, and learns statistics only with respect to these classes. We describe the details of our method, and present experimental results demonstrating the efficiency and effectiveness of our approach. Our experiments are done in the context of BibFinder, a publicly fielded bibliography mediator.",
author = "Zaiqing Nie and Subbarao Kambhampati",
year = "2004",
language = "English (US)",
volume = "20",
pages = "387--398",
booktitle = "Proceedings - International Conference on Data Engineering",

}

TY - GEN
T1 - A frequency-based approach for mining coverage statistics in data integration
AU - Nie, Zaiqing
AU - Kambhampati, Subbarao
PY - 2004
Y1 - 2004
N2 - Query optimization in data integration requires source coverage and overlap statistics. Gathering and storing the required statistics presents many challenges, not the least of which is controlling the amount of statistics learned. In this paper we introduce StatMiner, a novel statistics mining approach which automatically generates attribute value hierarchies, efficiently discovers frequently accessed query classes based on the learned attribute value hierarchies, and learns statistics only with respect to these classes. We describe the details of our method, and present experimental results demonstrating the efficiency and effectiveness of our approach. Our experiments are done in the context of BibFinder, a publicly fielded bibliography mediator.
UR - http://www.scopus.com/inward/record.url?scp=2442473065&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=2442473065&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:2442473065
VL - 20
SP - 387
EP - 398
BT - Proceedings - International Conference on Data Engineering
ER -