Abstract
Existing approaches for optimizing queries in data integration use decoupled strategies, attempting to optimize coverage and cost in two separate phases. Since sources tend to have a variety of access limitations, such phased optimization of cost and coverage can unfortunately lead to expensive planning as well as highly inefficient plans. In this paper, we present techniques for the joint optimization of cost and coverage of query plans. Our algorithms search in the space of parallel query plans, which allow multiple sources to be called for each subgoal conjunct. The refinement of partial plans takes into account the potential parallelism between source calls and the binding compatibilities between the sources included in the plan. We start by introducing and motivating our query plan representation. We then briefly review how to compute the cost and coverage of a parallel plan. Next, we provide both a System-R-style query optimization algorithm and a greedy local search algorithm for searching the space of such query plans. Finally, we present a simulation study demonstrating that the plans generated by our approach are significantly better than those of existing approaches, both in planning cost and in plan execution cost.
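To make the idea of a parallel plan concrete, here is a minimal sketch of how cost and coverage might be combined and searched greedily. It is not the paper's method: every modeling choice below is an assumption made for illustration. Source coverages are treated as independent both within and across subgoals, the sources for one subgoal are called in parallel (so the slowest call dominates), subgoals run sequentially, the trade-off is a hypothetical linear utility with weight `w`, and binding compatibilities are ignored. The names `Source`, `plan_cost`, `plan_coverage`, `utility` and `greedy_plan` are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Source:
    name: str
    coverage: float  # assumed fraction of the subgoal's answers this source returns
    cost: float      # assumed response time of a single call


def subgoal_coverage(sources):
    # Assumption: sources overlap independently, so union coverage = 1 - prod(1 - c_i).
    return 1.0 - prod(1.0 - s.coverage for s in sources)


def subgoal_cost(sources):
    # Sources for one subgoal are called in parallel: the slowest call dominates.
    return max(s.cost for s in sources)


def plan_coverage(plan):
    # Assumption: coverage is independent across subgoals.
    return prod(subgoal_coverage(srcs) for srcs in plan)


def plan_cost(plan):
    # Assumption: subgoals are executed one after another.
    return sum(subgoal_cost(srcs) for srcs in plan)


def utility(plan, w=0.5):
    # Hypothetical linear trade-off: reward coverage, penalize cost.
    return w * plan_coverage(plan) - (1 - w) * plan_cost(plan)


def greedy_plan(candidates, w=0.5):
    """Hill-climb over parallel plans.

    candidates: one list of Source per subgoal. Start from the best single
    source per subgoal, then repeatedly add the one extra source that most
    improves the utility, stopping when no addition helps.
    """
    plan = [[max(c, key=lambda s: w * s.coverage - (1 - w) * s.cost)]
            for c in candidates]
    while True:
        best, best_u = None, utility(plan, w)
        for i, cands in enumerate(candidates):
            for s in cands:
                if s in plan[i]:
                    continue
                trial = [list(srcs) for srcs in plan]
                trial[i].append(s)
                u = utility(trial, w)
                if u > best_u:
                    best, best_u = trial, u
        if best is None:
            return plan
        plan = best


if __name__ == "__main__":
    a, b, c = Source("A", 0.6, 0.2), Source("B", 0.5, 0.15), Source("C", 0.3, 0.5)
    d, e = Source("D", 0.8, 0.3), Source("E", 0.4, 0.1)
    plan = greedy_plan([[a, b, c], [d, e]])
    print([[s.name for s in srcs] for srcs in plan])  # [['A', 'B'], ['D', 'E']]
```

In this toy example the search adds B and E because each raises coverage without increasing the parallel cost of its subgoal, while C is rejected because its slow call would dominate the first subgoal's cost. A System-R-style dynamic program would instead build plans bottom-up over subsets of subgoals rather than hill-climbing from a single starting plan.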
Original language | English (US) |
---|---|
Title of host publication | International Conference on Information and Knowledge Management, Proceedings |
Editors | H. Paques, L. Liu |
Pages | 223-230 |
Number of pages | 8 |
State | Published - 2001 |
Event | Proceedings of the 2001 ACM CIKM: 10th International Conference on Information and Knowledge Management, Atlanta, GA, United States, Nov 5 2001 → Nov 10 2001 |
ASJC Scopus subject areas
- Business, Management and Accounting (all)