SMARTINT: A system for answering queries over web databases using attribute dependencies

Ravi Gummadi, Anupam Khulbe, Aravind Kalavagattu, Sanil Salvi, Subbarao Kambhampati

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about the individual entities that are fragmented over multiple sources. At first blush, this is just the inverse of the traditional database normalization problem: rather than going from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because the sources are populated in an autonomous and decentralized way, they often lack primary key-foreign key relationships. While the tables do share attributes, naive joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. Our system, SMARTINT, is designed to address the problem of data integration in such scenarios. Given a query, it uses Approximate Functional Dependencies (AFDs) to piece together a tree of relevant tables and schemas for joining them. The result tuples produced by our system strike a favorable balance between precision and recall.
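The precision problem described above can be made concrete with a minimal Python sketch on hypothetical toy data. Everything named here (the tables LISTINGS and SPECS, and the functions afd_confidence, naive_join, pick_join_attribute and reconstruct) is an illustrative assumption, not SMARTINT's implementation. AFD strength is scored as the fraction of rows that agree with the majority right-hand value for their left-hand value, which is one common measure and is assumed here. The sketch shows a naive join on the shared attribute make multiplying candidate tuples, then picks the shared attribute whose AFD to the requested attribute is strongest.

```python
from collections import Counter, defaultdict

# Hypothetical toy sources: two autonomous web tables that share the
# non-key attributes "make" and "model" but have no primary-key/foreign-key link.
LISTINGS = [
    {"vin": "V1", "make": "Honda", "model": "Civic"},
    {"vin": "V2", "make": "Honda", "model": "Accord"},
    {"vin": "V3", "make": "Toyota", "model": "Camry"},
]
SPECS = [
    {"make": "Honda", "model": "Civic", "body": "Sedan"},
    {"make": "Honda", "model": "Accord", "body": "Sedan"},
    {"make": "Honda", "model": "Odyssey", "body": "Minivan"},
    {"make": "Toyota", "model": "Camry", "body": "Sedan"},
    {"make": "Toyota", "model": "Camry", "body": "Coupe"},  # noisy row
]


def afd_confidence(rows, lhs, rhs):
    """Assumed AFD score for lhs -> rhs: the fraction of rows that carry the
    most common rhs value within their lhs group (1.0 means an exact FD)."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    kept = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return kept / len(rows)


def naive_join(left, right, attr):
    """Equi-join on one shared attribute; every matching pair becomes a tuple."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def pick_join_attribute(left, right, target):
    """Among attributes shared by both tables (read from their first rows, so
    both tables are assumed non-empty with uniform schemas), pick the one that
    best determines the target attribute in the table that supplies it."""
    shared = (set(left[0]) & set(right[0])) - {target}
    return max(sorted(shared), key=lambda a: afd_confidence(right, a, target))


def reconstruct(left, right, target):
    """Hypothetical AFD-guided step: join on the best determining attribute and
    keep only the most frequent target value for each left-hand entity."""
    attr = pick_join_attribute(left, right, target)
    index = defaultdict(Counter)
    for row in right:
        index[row[attr]][row[target]] += 1
    return [
        {**row, target: index[row[attr]].most_common(1)[0][0]}
        for row in left
        if row[attr] in index
    ]


print(afd_confidence(SPECS, "model", "body"))          # 0.8
print(afd_confidence(SPECS, "make", "body"))           # 0.6
print(len(naive_join(LISTINGS, SPECS, "make")))        # 8
print(len(naive_join(LISTINGS, SPECS, "model")))       # 4
print(pick_join_attribute(LISTINGS, SPECS, "body"))    # model
print(len(reconstruct(LISTINGS, SPECS, "body")))       # 3
```

Keeping only the most frequent target value per entity trades some recall for precision; in this toy data the Camry tie between Sedan and Coupe is broken by first occurrence, so that particular value should not be read as evidence.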

Original language: English (US)
Title of host publication: Proceedings - International Conference on Data Engineering
Pages: 1149-1152
Number of pages: 4
DOIs: https://doi.org/10.1109/ICDE.2010.5447729
State: Published - 2010
Event: 26th IEEE International Conference on Data Engineering, ICDE 2010 - Long Beach, CA, United States
Duration: Mar 1 2010 → Mar 6 2010

Other

Other: 26th IEEE International Conference on Data Engineering, ICDE 2010
Country: United States
City: Long Beach, CA
Period: 3/1/10 → 3/6/10

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Gummadi, R., Khulbe, A., Kalavagattu, A., Salvi, S., & Kambhampati, S. (2010). SMARTINT: A system for answering queries over web databases using attribute dependencies. In Proceedings - International Conference on Data Engineering (pp. 1149-1152). [5447729] https://doi.org/10.1109/ICDE.2010.5447729

SMARTINT: A system for answering queries over web databases using attribute dependencies. / Gummadi, Ravi; Khulbe, Anupam; Kalavagattu, Aravind; Salvi, Sanil; Kambhampati, Subbarao.

Proceedings - International Conference on Data Engineering. 2010. pp. 1149-1152, 5447729.

Gummadi, R, Khulbe, A, Kalavagattu, A, Salvi, S & Kambhampati, S 2010, SMARTINT: A system for answering queries over web databases using attribute dependencies. in Proceedings - International Conference on Data Engineering, 5447729, pp. 1149-1152, 26th IEEE International Conference on Data Engineering, ICDE 2010, Long Beach, CA, United States, 3/1/10. https://doi.org/10.1109/ICDE.2010.5447729

Gummadi R, Khulbe A, Kalavagattu A, Salvi S, Kambhampati S. SMARTINT: A system for answering queries over web databases using attribute dependencies. In Proceedings - International Conference on Data Engineering. 2010. p. 1149-1152. 5447729. https://doi.org/10.1109/ICDE.2010.5447729

Gummadi, Ravi ; Khulbe, Anupam ; Kalavagattu, Aravind ; Salvi, Sanil ; Kambhampati, Subbarao. / SMARTINT: A system for answering queries over web databases using attribute dependencies. Proceedings - International Conference on Data Engineering. 2010. pp. 1149-1152

@inproceedings{ec30e3f765414062812528a2c35540f5,
title = "{SMARTINT}: A system for answering queries over web databases using attribute dependencies",
author = "Ravi Gummadi and Anupam Khulbe and Aravind Kalavagattu and Sanil Salvi and Subbarao Kambhampati",
year = "2010",
doi = "10.1109/ICDE.2010.5447729",
language = "English (US)",
isbn = "9781424454440",
pages = "1149--1152",
booktitle = "Proceedings - International Conference on Data Engineering",
}

TY - GEN

T1 - SMARTINT: A system for answering queries over web databases using attribute dependencies

AU - Gummadi, Ravi

AU - Khulbe, Anupam

AU - Kalavagattu, Aravind

AU - Salvi, Sanil

AU - Kambhampati, Subbarao

PY - 2010

Y1 - 2010

UR - http://www.scopus.com/inward/record.url?scp=77952748781&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952748781&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2010.5447729

DO - 10.1109/ICDE.2010.5447729

M3 - Conference contribution

AN - SCOPUS:77952748781

SN - 9781424454440

SP - 1149

EP - 1152

BT - Proceedings - International Conference on Data Engineering

ER -