DataRover: An automated system for extracting product information from online catalogs

Syed Toufeeq Ahmed, Srinivas Vadrevu, Hasan Davulcu

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

The increasing number of e-commerce Web sites introduces numerous challenges in organizing and searching product information across multiple sites. This problem is further exacerbated by the various presentation templates that different Web sites use to present their product information, and by the different ways in which they store product information in their catalogs. This paper describes the DataRover system, which can automatically crawl online catalogs and extract all of their products. DataRover is based on pattern mining algorithms and domain-specific heuristics that utilize navigational and presentation regularities to identify taxonomy, list-of-product, and single-product segments within an online catalog. Next, it uses the inferred patterns to extract data from all such data segments and to automatically transform an online catalog into a database of categorized products. We also provide experimental results to demonstrate the efficacy of DataRover.
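The abstract's idea of using presentation regularities to locate list-of-product segments can be illustrated with a toy sketch. This is not the DataRover algorithm, whose details are in the paper; it assumes a deliberately simplified heuristic in which a parent element whose children repeat the same tag structure at least `min_repeats` times is flagged as a candidate product-list region. All names here (`TreeBuilder`, `signature`, `find_repeated_regions`, `min_repeats`) are hypothetical and use only the Python standard library.

```python
from collections import Counter
from html.parser import HTMLParser


class Node:
    """A minimal DOM node: tag name, parent link, children, and text fragments."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; stray end tags are ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def signature(node):
    """Presentation signature: the tag structure of a subtree, ignoring text."""
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def find_repeated_regions(root, min_repeats=3):
    """Flag parents whose children repeat one structure at least min_repeats times.

    Returns (parent_tag, child_signature, count) tuples, which a hypothetical
    extractor could treat as list-of-product candidates.
    """
    regions = []
    stack = [root]
    while stack:
        node = stack.pop()
        counts = Counter(signature(child) for child in node.children)
        for sig, count in counts.items():
            if count >= min_repeats:
                regions.append((node.tag, sig, count))
        stack.extend(node.children)
    return regions


if __name__ == "__main__":
    page = """
    <html><body>
      <div id="nav"><a>Home</a><a>Cameras</a></div>
      <ul>
        <li><b>Camera A</b><span>$199</span></li>
        <li><b>Camera B</b><span>$249</span></li>
        <li><b>Camera C</b><span>$299</span></li>
      </ul>
    </body></html>
    """
    builder = TreeBuilder()
    builder.feed(page)
    print(find_repeated_regions(builder.root))  # [('ul', 'li(b(),span())', 3)]
```

In the sample page, only the `<ul>` qualifies: its three `<li>` children share the structure `li(b(),span())`, while the two navigation links fall below the threshold. A real system would also need to separate taxonomy (navigation) segments from product lists and single-product pages, which this sketch does not attempt.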

Original language: English (US)
Pages (from-to): 1-10
Number of pages: 10
Journal: Studies in Computational Intelligence
Volume: 23
DOI: 10.1007/3-540-33880-2_1
State: Published - 2006

Fingerprint

  • Websites
  • Taxonomies
  • World Wide Web

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

DataRover: An automated system for extracting product information from online catalogs. / Ahmed, Syed Toufeeq; Vadrevu, Srinivas; Davulcu, Hasan.

In: Studies in Computational Intelligence, Vol. 23, 2006, p. 1-10.


@article{7a3c3549af8d49f1b3981c43708adddb,
title = "{DataRover}: An automated system for extracting product information from online catalogs",
abstract = "The increasing number of e-commerce Web sites on the Web introduces numerous challenges in organizing and searching the product information across multiple Web sites. This problem is further exacerbated by various presentation templates that different Web sites use in presenting their product information, and different ways of product information they store in their catalogs. This paper describes the DataRover system, which can automatically crawl and extract all products from online catalogs. DataRover is based on pattern mining algorithms and domain specific heuristics which utilize the navigational and presentation regularities to identify taxonomy, list-of-product and single-product segments within an online catalog. Next, it uses the inferred patterns to extract data from all such data segments and to automatically transform an online catalog into a database of categorized products. We also provide experimental results to demonstrate the efficacy of the DataRover.",
author = "Ahmed, Syed Toufeeq and Vadrevu, Srinivas and Davulcu, Hasan",
year = "2006",
doi = "10.1007/3-540-33880-2_1",
language = "English (US)",
volume = "23",
pages = "1--10",
journal = "Studies in Computational Intelligence",
issn = "1860-949X",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - DataRover: An automated system for extracting product information from online catalogs

AU - Ahmed, Syed Toufeeq

AU - Vadrevu, Srinivas

AU - Davulcu, Hasan

PY - 2006

Y1 - 2006

AB - The increasing number of e-commerce Web sites on the Web introduces numerous challenges in organizing and searching the product information across multiple Web sites. This problem is further exacerbated by various presentation templates that different Web sites use in presenting their product information, and different ways of product information they store in their catalogs. This paper describes the DataRover system, which can automatically crawl and extract all products from online catalogs. DataRover is based on pattern mining algorithms and domain specific heuristics which utilize the navigational and presentation regularities to identify taxonomy, list-of-product and single-product segments within an online catalog. Next, it uses the inferred patterns to extract data from all such data segments and to automatically transform an online catalog into a database of categorized products. We also provide experimental results to demonstrate the efficacy of the DataRover.

UR - http://www.scopus.com/inward/record.url?scp=33748899694&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33748899694&partnerID=8YFLogxK

U2 - 10.1007/3-540-33880-2_1

DO - 10.1007/3-540-33880-2_1

M3 - Article

AN - SCOPUS:33748899694

VL - 23

SP - 1

EP - 10

JO - Studies in Computational Intelligence

JF - Studies in Computational Intelligence

SN - 1860-949X

ER -