SMARTINT: Using mined attribute dependencies to integrate fragmented web databases

Ravi Gummadi, Anupam Khulbe, Aravind Kalavagattu, Sanil Salvi, Subbarao Kambhampati

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about individual entities that is fragmented over multiple sources. At first blush, this is just the inverse of the traditional database normalization problem: rather than going from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key-Foreign Key relations. While tables may share attributes, naive joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. Our system, SmartInt, is aimed at addressing the problem of data integration in such scenarios. Given a query, our system uses Approximate Functional Dependencies (AFDs) to piece together a tree of relevant tables to answer it. The result tuples produced by our system strike a favorable balance between precision and recall.
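The effect described in the abstract can be illustrated with a minimal sketch. It is not the SmartInt algorithm: the tables `cars` and `reviews`, their attributes, and the helper names `naive_join` and `afd_confidence` are all hypothetical, and the confidence measure (the fraction of rows that survive when each left-hand value is mapped to its most common right-hand value) is one common way to score an AFD, assumed here rather than taken from the paper.

```python
from collections import Counter, defaultdict


def naive_join(left, right, shared):
    """Join two lists of dict rows on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[shared] == r[shared]]


def afd_confidence(rows, lhs, rhs):
    """Score the approximate dependency lhs -> rhs on a list of dict rows.

    Assumed measure: keep, for each lhs value, only the rows carrying its
    most common rhs value, and return the fraction of rows kept.
    A value of 1.0 means the dependency holds exactly.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    kept = sum(max(counts.values()) for counts in groups.values())
    return kept / len(rows)


# Hypothetical sources: they share "model" but have no PK-FK link.
cars = [
    {"vin": "V1", "model": "Civic", "price": 15000},
    {"vin": "V2", "model": "Civic", "price": 9000},
    {"vin": "V3", "model": "Accord", "price": 18000},
]
reviews = [
    {"model": "Civic", "make": "Honda", "rating": 4},
    {"model": "Civic", "make": "Honda", "rating": 2},
    {"model": "Accord", "make": "Honda", "rating": 5},
]

# Every Civic car is paired with every Civic review, so the join
# produces more tuples than there are cars.
joined = naive_join(cars, reviews, "model")

# "model -> make" holds exactly; "model -> rating" only approximately.
make_conf = afd_confidence(reviews, "model", "make")
rating_conf = afd_confidence(reviews, "model", "rating")
```

In this toy data the join yields more tuples than there are cars, because nothing says which review belongs to which car. The dependency scores suggest which attributes can be carried across the shared attribute with less risk: `make` is fully determined by `model`, while `rating` is not.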

Original language: English (US)
Pages (from-to): 575-599
Number of pages: 25
Journal: Journal of Intelligent Information Systems
Volume: 38
Issue number: 3
DOI: 10.1007/s10844-011-0169-0
State: Published - Jun 2012

Fingerprint

Data integration
Joining

Keywords

  • Information integration
  • Loss of PK/FK
  • Web databases

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications
  • Software

Cite this

SMARTINT: Using mined attribute dependencies to integrate fragmented web databases. / Gummadi, Ravi; Khulbe, Anupam; Kalavagattu, Aravind; Salvi, Sanil; Kambhampati, Subbarao.

In: Journal of Intelligent Information Systems, Vol. 38, No. 3, 06.2012, p. 575-599.

Gummadi, Ravi; Khulbe, Anupam; Kalavagattu, Aravind; Salvi, Sanil; Kambhampati, Subbarao. / SMARTINT: Using mined attribute dependencies to integrate fragmented web databases. In: Journal of Intelligent Information Systems. 2012; Vol. 38, No. 3. pp. 575-599.
@article{f694d00bce214d67bc2b0f2bc9031cfa,
title = "SMARTINT: Using mined attribute dependencies to integrate fragmented web databases",
abstract = "Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about individual entities that is fragmented over multiple sources. At first blush, this is just the inverse of the traditional database normalization problem: rather than going from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key-Foreign Key relations. While tables may share attributes, naive joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. Our system, SmartInt, is aimed at addressing the problem of data integration in such scenarios. Given a query, our system uses Approximate Functional Dependencies (AFDs) to piece together a tree of relevant tables to answer it. The result tuples produced by our system strike a favorable balance between precision and recall.",
keywords = "Information integration, Loss of PK/FK, Web databases",
author = "Ravi Gummadi and Anupam Khulbe and Aravind Kalavagattu and Sanil Salvi and Subbarao Kambhampati",
year = "2012",
month = "6",
doi = "10.1007/s10844-011-0169-0",
language = "English (US)",
volume = "38",
pages = "575--599",
journal = "Journal of Intelligent Information Systems",
issn = "0925-9902",
publisher = "Springer Netherlands",
number = "3",

}

TY - JOUR

T1 - SMARTINT

T2 - Using mined attribute dependencies to integrate fragmented web databases

AU - Gummadi, Ravi

AU - Khulbe, Anupam

AU - Kalavagattu, Aravind

AU - Salvi, Sanil

AU - Kambhampati, Subbarao

PY - 2012/6

Y1 - 2012/6

AB - Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about individual entities that is fragmented over multiple sources. At first blush, this is just the inverse of the traditional database normalization problem: rather than going from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key-Foreign Key relations. While tables may share attributes, naive joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. Our system, SmartInt, is aimed at addressing the problem of data integration in such scenarios. Given a query, our system uses Approximate Functional Dependencies (AFDs) to piece together a tree of relevant tables to answer it. The result tuples produced by our system strike a favorable balance between precision and recall.

KW - Information integration

KW - Loss of PK/FK

KW - Web databases

UR - http://www.scopus.com/inward/record.url?scp=84862283528&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862283528&partnerID=8YFLogxK

U2 - 10.1007/s10844-011-0169-0

DO - 10.1007/s10844-011-0169-0

M3 - Article

AN - SCOPUS:84862283528

VL - 38

SP - 575

EP - 599

JO - Journal of Intelligent Information Systems

JF - Journal of Intelligent Information Systems

SN - 0925-9902

IS - 3

ER -