TY - GEN
T1 - Mapping and cleaning
AU - Geerts, Floris
AU - Mecca, Giansalvatore
AU - Papotti, Paolo
AU - Santoro, Donatello
PY - 2014
Y1 - 2014
N2 - We address the challenging and open problem of bringing together two crucial activities in data integration and data quality, i.e., transforming data using schema mappings, and fixing conflicts and inconsistencies using data repairing. This problem is made complex by several factors. First, schema mappings and data repairing have traditionally been considered as separate activities, and research has progressed in a largely independent way in the two fields. Second, the elegant formalizations and the algorithms that have been proposed for both tasks have had mixed fortune in scaling to large databases. In the paper, we introduce a very general notion of a mapping and cleaning scenario that incorporates a wide variety of features, like, for example, user interventions. We develop a new semantics for these scenarios that represents a conservative extension of previous semantics for schema mappings and data repairing. Based on the semantics, we introduce a chase-based algorithm to compute solutions. Appropriate care is devoted to developing a scalable implementation of the chase algorithm. To the best of our knowledge, this is the first general and scalable proposal in this direction.
AB - We address the challenging and open problem of bringing together two crucial activities in data integration and data quality, i.e., transforming data using schema mappings, and fixing conflicts and inconsistencies using data repairing. This problem is made complex by several factors. First, schema mappings and data repairing have traditionally been considered as separate activities, and research has progressed in a largely independent way in the two fields. Second, the elegant formalizations and the algorithms that have been proposed for both tasks have had mixed fortune in scaling to large databases. In the paper, we introduce a very general notion of a mapping and cleaning scenario that incorporates a wide variety of features, like, for example, user interventions. We develop a new semantics for these scenarios that represents a conservative extension of previous semantics for schema mappings and data repairing. Based on the semantics, we introduce a chase-based algorithm to compute solutions. Appropriate care is devoted to developing a scalable implementation of the chase algorithm. To the best of our knowledge, this is the first general and scalable proposal in this direction.
UR - http://www.scopus.com/inward/record.url?scp=84901745035&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901745035&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2014.6816654
DO - 10.1109/ICDE.2014.6816654
M3 - Conference contribution
AN - SCOPUS:84901745035
SN - 9781479925544
T3 - Proceedings - International Conference on Data Engineering
SP - 232
EP - 243
BT - 2014 IEEE 30th International Conference on Data Engineering, ICDE 2014
PB - IEEE Computer Society
T2 - 30th IEEE International Conference on Data Engineering, ICDE 2014
Y2 - 31 March 2014 through 4 April 2014
ER -