Improving text collection selection with coverage and overlap statistics

Thomas Hernandez, Subbarao Kambhampati

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Scopus citations

Abstract

In an environment of distributed text collections, the first step in the information retrieval process is to identify which of the available collections are most relevant to a given query and should therefore be accessed to answer it. We address the challenge of collection selection when there is full or partial overlap between the available text collections, a scenario that has not been examined previously despite its real-world applications. To that end, we present COSCO, a collection selection approach that uses collection-specific coverage and overlap statistics. Our experimental results show that the approach displays the desired behavior of retrieving more new results early in the collection order, and that it performs consistently and significantly better than CORI, previously considered one of the best collection selection systems.
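
The idea of ranking collections by the new results they are expected to contribute can be illustrated with a small sketch. This is not the COSCO algorithm as published; the abstract does not give its details. It is a minimal greedy ordering under stated assumptions: `coverage` and `overlap` are hypothetical per-query estimates (the fraction of relevant results a collection holds, and the fraction two collections share), and duplicates are approximated by summing pairwise overlaps with the collections already chosen.

```python
from typing import Dict, FrozenSet, List


def order_collections(
    coverage: Dict[str, float],
    overlap: Dict[FrozenSet[str], float],
) -> List[str]:
    """Greedily order collections by estimated residual (new) coverage.

    Assumption: pairwise overlaps are summed as a crude estimate of the
    results already seen; this is an illustration, not the paper's method.
    """
    remaining = set(coverage)
    order: List[str] = []

    def residual(name: str) -> float:
        # Estimated share of results in `name` already returned by
        # previously selected collections.
        seen = sum(overlap.get(frozenset((name, s)), 0.0) for s in order)
        return max(coverage[name] - seen, 0.0)

    while remaining:
        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=residual)
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical statistics for three collections and one query.
coverage = {"A": 0.6, "B": 0.5, "C": 0.3}
overlap = {
    frozenset(("A", "B")): 0.45,  # B mostly duplicates A
    frozenset(("A", "C")): 0.05,
    frozenset(("B", "C")): 0.0,
}
ranking = order_collections(coverage, overlap)
print(ranking)
```

With these made-up numbers, a coverage-only ranking would visit A, B, C, whereas the overlap-aware ordering visits C before B because B adds little beyond what A already returned.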

Original language: English (US)
Title of host publication: 14th International World Wide Web Conference, WWW2005
Pages: 1128-1129
Number of pages: 2
State: Published - Dec 1 2005
Event: 14th International World Wide Web Conference, WWW2005 - Chiba, Japan
Duration: May 10 2005 to May 14 2005

Publication series

Name: 14th International World Wide Web Conference, WWW2005

Keywords

  • Collection overlap
  • Collection selection
  • Statistics gathering

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
