Optimizing Recursive Information Gathering Plans in EMERAC

Subbarao Kambhampati, Eric Lambrecht, Ullas Nambiar, Zaiqing Nie, Gnanaprakasam Senthil

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

In this paper we describe two optimization techniques that are specially tailored for information gathering. The first is a greedy minimization algorithm that minimizes an information gathering plan by removing redundant and overlapping information sources without loss of completeness. We then discuss a set of heuristics that guide the greedy minimization algorithm so as to remove costlier information sources first. In contrast to previous work, our approach can handle recursive query plans that arise commonly in the presence of constrained sources. Second, we present a method for ordering the access to sources to reduce the execution cost. This problem differs significantly from the traditional database query optimization problem, as sources on the Internet have a variety of access limitations and the execution cost in information gathering is affected both by network traffic and by the connection setup costs. Furthermore, because of the autonomous and decentralized nature of the Web, very few cost statistics about the sources may be available. In this paper, we propose a heuristic algorithm for ordering source calls that takes these constraints into account. Specifically, our algorithm takes both access costs and traffic costs into account, and is able to operate with very coarse statistics about sources (i.e., without depending on full source statistics). Finally, we discuss the implementation and empirical evaluation of these methods in Emerac, our prototype information gathering system.
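
The idea of removing costlier redundant sources first can be illustrated with a small toy model. The sketch below is not the algorithm from the paper: it assumes that each source is described by an explicit set of items it can return and by a single numeric cost, whereas the paper works with query plans and coarse source statistics. The function name `greedy_minimize` and the `coverage` and `cost` parameters are hypothetical names introduced only for this illustration.

```python
from typing import Dict, FrozenSet, List


def greedy_minimize(coverage: Dict[str, FrozenSet[int]],
                    cost: Dict[str, float]) -> List[str]:
    """Toy model (assumption): drop sources, costliest first, while the
    remaining sources still return everything the full set returned."""
    # Everything the unminimized set of sources can return.
    target = frozenset().union(*coverage.values())
    kept = set(coverage)
    # Visit the most expensive sources first so they are removed first.
    for name in sorted(coverage, key=lambda s: cost[s], reverse=True):
        others = kept - {name}
        covered = frozenset().union(*(coverage[s] for s in others))
        if covered == target:  # 'name' is redundant given the others
            kept = others
    return sorted(kept)


# Hypothetical sources and costs, for illustration only.
coverage = {
    "A": frozenset({1, 2, 3}),
    "B": frozenset({2, 3}),
    "C": frozenset({3, 4}),
    "D": frozenset({1, 2, 3, 4}),
}
cost = {"A": 5.0, "B": 1.0, "C": 2.0, "D": 10.0}
print(greedy_minimize(coverage, cost))  # ['A', 'C']
```

In this toy run the expensive source D is dropped first because A, B and C together return the same items, and B is dropped last because A and C already cover it. Completeness is preserved in the sense that the kept sources return the same union of items as the original set.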

Original language: English (US)
Pages (from-to): 119-153
Number of pages: 35
Journal: Journal of Intelligent Information Systems
Volume: 22
Issue number: 2
State: Published - Mar 2004
Externally published: Yes

Keywords

  • Data integration
  • Information gathering
  • Query optimization
  • Web and databases

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence
