Improving text collection selection with coverage and overlap statistics

Thomas Hernandez, Subbarao Kambhampati

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

In an environment of distributed text collections, the first step in the information retrieval process is to identify which of all available collections are more relevant to a given query and which should thus be accessed to answer the query. We address the challenge of collection selection when there is full or partial overlap between the available text collections, a scenario which has not been examined previously despite its real-world applications. To that end, we present COSCO, a collection selection approach which uses collection-specific coverage and overlap statistics. We describe our experimental results which show that the presented approach displays the desired behavior of retrieving more new results early on in the collection order, and performs consistently and significantly better than CORI, previously considered to be one of the best collection selection systems.
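The abstract does not spell out how coverage and overlap statistics are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that, for a given query, an estimated coverage per collection and an estimated pairwise overlap between collections are already available, and it greedily picks the collection expected to contribute the most new results. The function name `greedy_collection_order`, the pairwise-overlap approximation, and all numbers are hypothetical.

```python
from typing import Dict, FrozenSet, List


def greedy_collection_order(
    coverage: Dict[str, float],
    overlap: Dict[FrozenSet[str], float],
    k: int,
) -> List[str]:
    """Order up to k collections by estimated *new* results (hypothetical helper).

    coverage[c]              -- assumed estimate of the fraction of results c holds
    overlap[frozenset(a, b)] -- assumed estimate of the fraction shared by a and b
    """
    selected: List[str] = []
    remaining = set(coverage)

    while remaining and len(selected) < k:

        def residual(c: str) -> float:
            # Crude approximation (an assumption of this sketch): subtract the
            # pairwise overlap with every collection that was already chosen.
            dup = sum(overlap.get(frozenset((c, s)), 0.0) for s in selected)
            return max(coverage[c] - dup, 0.0)

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=residual)
        selected.append(best)
        remaining.remove(best)

    return selected


# Made-up statistics for three collections and a single query.
coverage = {"A": 0.6, "B": 0.5, "C": 0.3}
overlap = {
    frozenset(("A", "B")): 0.45,  # B mostly duplicates A
    frozenset(("A", "C")): 0.05,
    frozenset(("B", "C")): 0.10,
}

order = greedy_collection_order(coverage, overlap, k=3)
print(order)
```

With these made-up numbers, ranking by coverage alone would visit A, B, C, whereas the overlap-aware ordering visits C before B because B mostly duplicates what A already returned. This mirrors the behavior the abstract describes as retrieving more new results early in the collection order.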

Original language: English (US)
Title of host publication: 14th International World Wide Web Conference, WWW2005
Pages: 1128-1129
Number of pages: 2
DOIs: https://doi.org/10.1145/1062745.1062902
State: Published - 2005
Event: 14th International World Wide Web Conference, WWW2005 - Chiba, Japan
Duration: May 10 2005 - May 14 2005

Other

Other: 14th International World Wide Web Conference, WWW2005
Country: Japan
City: Chiba
Period: 5/10/05 - 5/14/05

Fingerprint

Information retrieval
Statistics

Keywords

  • Collection overlap
  • Collection selection
  • Statistics gathering

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Cite this

Hernandez, T., & Kambhampati, S. (2005). Improving text collection selection with coverage and overlap statistics. In 14th International World Wide Web Conference, WWW2005 (pp. 1128-1129) https://doi.org/10.1145/1062745.1062902

Improving text collection selection with coverage and overlap statistics. / Hernandez, Thomas; Kambhampati, Subbarao.

14th International World Wide Web Conference, WWW2005. 2005. p. 1128-1129.

Hernandez, T & Kambhampati, S 2005, Improving text collection selection with coverage and overlap statistics. in 14th International World Wide Web Conference, WWW2005. pp. 1128-1129, 14th International World Wide Web Conference, WWW2005, Chiba, Japan, 5/10/05. https://doi.org/10.1145/1062745.1062902
Hernandez T, Kambhampati S. Improving text collection selection with coverage and overlap statistics. In 14th International World Wide Web Conference, WWW2005. 2005. p. 1128-1129 https://doi.org/10.1145/1062745.1062902
Hernandez, Thomas ; Kambhampati, Subbarao. / Improving text collection selection with coverage and overlap statistics. 14th International World Wide Web Conference, WWW2005. 2005. pp. 1128-1129
@inproceedings{b2d2882f1d3c40b4aa6fa91c329e17e4,
title = "Improving text collection selection with coverage and overlap statistics",
abstract = "In an environment of distributed text collections, the first step in the information retrieval process is to identify which of all available collections are more relevant to a given query and which should thus be accessed to answer the query. We address the challenge of collection selection when there is full or partial overlap between the available text collections, a scenario which has not been examined previously despite its real-world applications. To that end, we present COSCO, a collection selection approach which uses collection-specific coverage and overlap statistics. We describe our experimental results which show that the presented approach displays the desired behavior of retrieving more new results early on in the collection order, and performs consistently and significantly better than CORI, previously considered to be one of the best collection selection systems.",
keywords = "Collection overlap, Collection selection, Statistics gathering",
author = "Thomas Hernandez and Subbarao Kambhampati",
year = "2005",
doi = "10.1145/1062745.1062902",
language = "English (US)",
isbn = "1595930515",
pages = "1128--1129",
booktitle = "14th International World Wide Web Conference, WWW2005",
}

TY - GEN
T1 - Improving text collection selection with coverage and overlap statistics
AU - Hernandez, Thomas
AU - Kambhampati, Subbarao
PY - 2005
Y1 - 2005
AB - In an environment of distributed text collections, the first step in the information retrieval process is to identify which of all available collections are more relevant to a given query and which should thus be accessed to answer the query. We address the challenge of collection selection when there is full or partial overlap between the available text collections, a scenario which has not been examined previously despite its real-world applications. To that end, we present COSCO, a collection selection approach which uses collection-specific coverage and overlap statistics. We describe our experimental results which show that the presented approach displays the desired behavior of retrieving more new results early on in the collection order, and performs consistently and significantly better than CORI, previously considered to be one of the best collection selection systems.

KW - Collection overlap
KW - Collection selection
KW - Statistics gathering
UR - http://www.scopus.com/inward/record.url?scp=77953053895&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953053895&partnerID=8YFLogxK
U2 - 10.1145/1062745.1062902
DO - 10.1145/1062745.1062902
M3 - Conference contribution
AN - SCOPUS:77953053895
SN - 1595930515
SN - 9781595930514
SP - 1128
EP - 1129
BT - 14th International World Wide Web Conference, WWW2005
ER -