Abstract
Existing algorithms for collection selection assume that all collections are disjoint, which is unrealistic in most scenarios. Consequently, in practice, existing approaches are liable to access sources that return no new (not yet seen) answers. Our approach combines relevance with inter-source overlap information to provide significantly better collection selection capability. The crux of our invention is an efficient way of estimating the overlap statistics between collections and using this information effectively in collection selection.
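To illustrate the general idea of weighing relevance against overlap, the following is a minimal sketch and not the method of the invention, which the abstract does not specify. It assumes a per-collection relevance score and a symmetric pairwise overlap estimate in [0, 1] are already available; the function name `select_collections`, the greedy strategy, and the discount formula are all hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, List


def select_collections(
    relevance: Dict[str, float],
    overlap: Dict[FrozenSet[str], float],
    k: int,
) -> List[str]:
    """Greedily pick up to k collections, discounting overlap with earlier picks.

    relevance: hypothetical relevance score per collection for one query.
    overlap:   hypothetical symmetric overlap estimate in [0, 1] per pair,
               keyed by frozenset({a, b}); missing pairs count as 0.
    """
    selected: List[str] = []
    candidates = set(relevance)

    while candidates and len(selected) < k:

        def residual(c: str) -> float:
            # Assumption: the largest pairwise overlap with any selected
            # collection approximates the share of c's answers already seen.
            seen = max(
                (overlap.get(frozenset((c, s)), 0.0) for s in selected),
                default=0.0,
            )
            return relevance[c] * (1.0 - seen)

        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=residual)
        selected.append(best)
        candidates.remove(best)

    return selected


if __name__ == "__main__":
    rel = {"A": 0.9, "B": 0.8, "C": 0.5}
    ov = {
        frozenset(("A", "B")): 0.9,  # B mostly duplicates A
        frozenset(("A", "C")): 0.1,
        frozenset(("B", "C")): 0.2,
    }
    print(select_collections(rel, ov, 2))  # ['A', 'C']
```

In this toy input, ranking by relevance alone would pick A and then B, even though B is assumed to duplicate most of A. With the overlap discount, the less relevant but largely novel C is picked second instead.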
Original language | English (US) |
---|---|
State | Published - Mar 11 2005 |