TY - JOUR
T1 - DataRover
T2 - An automated system for extracting product information from online catalogs
AU - Ahmed, Syed Toufeeq
AU - Vadrevu, Srinivas
AU - Davulcu, Hasan
PY - 2006
Y1 - 2006
AB - The growing number of e-commerce Web sites introduces numerous challenges in organizing and searching product information across multiple sites. The problem is exacerbated by the varied presentation templates that different Web sites use to present their product information, and by the different ways in which they store product information in their catalogs. This paper describes the DataRover system, which automatically crawls online catalogs and extracts all of their products. DataRover is based on pattern mining algorithms and domain-specific heuristics that exploit navigational and presentation regularities to identify taxonomy, list-of-product, and single-product segments within an online catalog. It then uses the inferred patterns to extract data from all such segments and to automatically transform an online catalog into a database of categorized products. We also provide experimental results to demonstrate the efficacy of DataRover.
UR - http://www.scopus.com/inward/record.url?scp=33748899694&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33748899694&partnerID=8YFLogxK
U2 - 10.1007/3-540-33880-2_1
DO - 10.1007/3-540-33880-2_1
M3 - Article
AN - SCOPUS:33748899694
SN - 1860-949X
VL - 23
SP - 1
EP - 10
JO - Studies in Computational Intelligence
JF - Studies in Computational Intelligence
ER -