TY - GEN
T1 - Improving text collection selection with coverage and overlap statistics
AU - Hernandez, Thomas
AU - Kambhampati, Subbarao
PY - 2005/12/1
Y1 - 2005/12/1
N2 - In an environment of distributed text collections, the first step in the information retrieval process is to identify which of all available collections are more relevant to a given query and which should thus be accessed to answer the query. We address the challenge of collection selection when there is full or partial overlap between the available text collections, a scenario which has not been examined previously despite its real-world applications. To that end, we present COSCO, a collection selection approach which uses collection-specific coverage and overlap statistics. We describe our experimental results which show that the presented approach displays the desired behavior of retrieving more new results early on in the collection order, and performs consistently and significantly better than CORI, previously considered to be one of the best collection selection systems.
AB - In an environment of distributed text collections, the first step in the information retrieval process is to identify which of all available collections are more relevant to a given query and which should thus be accessed to answer the query. We address the challenge of collection selection when there is full or partial overlap between the available text collections, a scenario which has not been examined previously despite its real-world applications. To that end, we present COSCO, a collection selection approach which uses collection-specific coverage and overlap statistics. We describe our experimental results which show that the presented approach displays the desired behavior of retrieving more new results early on in the collection order, and performs consistently and significantly better than CORI, previously considered to be one of the best collection selection systems.
KW - Collection overlap
KW - Collection selection
KW - Statistics gathering
UR - http://www.scopus.com/inward/record.url?scp=77953053895&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953053895&partnerID=8YFLogxK
U2 - 10.1145/1062745.1062902
DO - 10.1145/1062745.1062902
M3 - Conference contribution
AN - SCOPUS:77953053895
SN - 1595930515
SN - 9781595930514
T3 - 14th International World Wide Web Conference, WWW2005
SP - 1128
EP - 1129
BT - 14th International World Wide Web Conference, WWW2005
T2 - 14th International World Wide Web Conference, WWW2005
Y2 - 10 May 2005 through 14 May 2005
ER -