TY - GEN
T1 - SmartInt
T2 - 20th International Conference Companion on World Wide Web, WWW 2011
AU - Gummadi, Ravi
AU - Khulbe, Anupam
AU - Kalavagattu, Aravind
AU - Salvi, Sanil
AU - Kambhampati, Subbarao
PY - 2011
Y1 - 2011
N2 - Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about individual entities that is fragmented over multiple sources. At first blush, this is just the inverse of the traditional database normalization problem: rather than going from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key - Foreign Key relations. While tables do share attributes, direct joins over these shared attributes can reconstruct many spurious entities, thus seriously compromising precision. We present a unified approach that supports intelligent retrieval over fragmented web databases by mining and using inter-table dependencies. Experiments with the prototype implementation, SmartInt, show that its retrieval strikes a good balance between precision and recall.
AB - Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about individual entities that is fragmented over multiple sources. At first blush, this is just the inverse of the traditional database normalization problem: rather than going from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key - Foreign Key relations. While tables do share attributes, direct joins over these shared attributes can reconstruct many spurious entities, thus seriously compromising precision. We present a unified approach that supports intelligent retrieval over fragmented web databases by mining and using inter-table dependencies. Experiments with the prototype implementation, SmartInt, show that its retrieval strikes a good balance between precision and recall.
KW - entity completion
KW - loss of pk-fk
KW - web databases
UR - http://www.scopus.com/inward/record.url?scp=79955144122&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955144122&partnerID=8YFLogxK
U2 - 10.1145/1963192.1963219
DO - 10.1145/1963192.1963219
M3 - Conference contribution
AN - SCOPUS:79955144122
SN - 9781450305181
T3 - Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
SP - 51
EP - 52
BT - Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
Y2 - 28 March 2011 through 1 April 2011
ER -