TY - GEN
T1 - SMARTINT
T2 - 26th IEEE International Conference on Data Engineering, ICDE 2010
AU - Gummadi, Ravi
AU - Khulbe, Anupam
AU - Kalavagattu, Aravind
AU - Salvi, Sanil
AU - Kambhampati, Subbarao
PY - 2010
Y1 - 2010
AB - Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about the individual entities that is fragmented over multiple sources. At first blush this is just the inverse of the traditional database normalization problem: rather than go from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key - Foreign Key relations. While tables do share attributes, naive joins over these shared attributes can result in the reconstruction of many spurious entities, thus seriously compromising precision. Our system, SMARTINT, is aimed at addressing the problem of data integration in such scenarios. Given a query, our system uses Approximate Functional Dependencies (AFDs) to piece together a tree of relevant tables and schemas for joining them. The result tuples produced by our system are able to strike a favorable balance between precision and recall.
UR - http://www.scopus.com/inward/record.url?scp=77952748781&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952748781&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2010.5447729
DO - 10.1109/ICDE.2010.5447729
M3 - Conference contribution
AN - SCOPUS:77952748781
SN - 9781424454440
T3 - Proceedings - International Conference on Data Engineering
SP - 1149
EP - 1152
BT - 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings
Y2 - 1 March 2010 through 6 March 2010
ER -