DataXFormer

A robust transformation discovery system

Ziawasch Abedjan, John Morcos, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

In data integration, data curation, and other data analysis tasks, users spend a considerable amount of time converting data from one representation to another, such as US dates to European dates or airport codes to city names. In a previous vision paper, we presented the initial design of DataXFormer, a system that uses web resources to assist in transformation discovery. Specifically, DataXFormer discovers possible transformations from web tables and web forms and involves human feedback where appropriate. In this paper, we present the full-fledged system along with several extensions. In particular, we present algorithms to find (i) transformations that entail multiple columns of input data, (ii) indirect transformations that are compositions of other transformations, (iii) transformations that are not functions but rather relationships, and (iv) transformations from a knowledge base of public data. We report on experiments with a collection of 120 transformation tasks and show that our enhanced system automatically covers 101 of them using openly available resources.
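The abstract describes discovering a transformation from web tables given a few user-supplied example pairs, and building indirect transformations by composing other transformations. As a rough illustration only, here is a minimal Python sketch of that general idea, assuming a tiny in-memory corpus in place of real web tables. `TOY_CORPUS`, `discover`, `compose`, the `min_support` threshold, and the support-weighted vote are all hypothetical and are not DataXFormer's published algorithm or interface; web forms, human feedback, multi-column inputs, and knowledge-base lookup are not modeled.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a web-table corpus: each table is a list of rows.
# A real system would query an index over many tables instead.
Table = List[Tuple[str, ...]]

TOY_CORPUS: List[Table] = [
    [("JFK", "New York", "USA"), ("LHR", "London", "UK"),
     ("CDG", "Paris", "France"), ("HEL", "Helsinki", "Finland")],
    [("Paris", "CDG"), ("London", "LHR")],  # reversed column order
    [("SFO", "San Francisco"), ("JFK", "New York")],
]


def normalize(value: str) -> str:
    """Case- and whitespace-insensitive matching key."""
    return " ".join(value.lower().split())


def discover(
    examples: Sequence[Tuple[str, str]],
    queries: Sequence[str],
    corpus: Sequence[Table],
    min_support: int = 2,
) -> Dict[str, Optional[str]]:
    """Map each query value to an output value by a weighted majority vote over
    column pairs that reproduce at least `min_support` of the example pairs."""
    example_keys = {(normalize(x), normalize(y)) for x, y in examples}
    votes: Dict[str, Counter] = {normalize(q): Counter() for q in queries}

    for table in corpus:
        if not table:
            continue
        width = min(len(row) for row in table)
        for src in range(width):
            for dst in range(width):
                if src == dst:
                    continue
                pairs = [(normalize(row[src]), row[dst]) for row in table]
                support = sum((s, normalize(d)) in example_keys for s, d in pairs)
                if support < min_support:
                    continue  # this column pair does not look like the target mapping
                for s, d in pairs:
                    if s in votes:
                        # Column pairs that agree with more examples get a heavier vote.
                        votes[s][d] += support
    return {
        q: (votes[normalize(q)].most_common(1)[0][0] if votes[normalize(q)] else None)
        for q in queries
    }


def compose(
    first: Dict[str, Optional[str]], second: Dict[str, Optional[str]]
) -> Dict[str, Optional[str]]:
    """Chain two discovered mappings (exact-match lookup) to get an indirect
    transformation; values with no intermediate result map to None."""
    return {k: (second.get(v) if v is not None else None) for k, v in first.items()}


if __name__ == "__main__":
    to_city = discover([("JFK", "New York"), ("LHR", "London")],
                       ["CDG", "HEL", "SFO", "XYZ"], TOY_CORPUS)
    print(to_city)  # {'CDG': 'Paris', 'HEL': 'Helsinki', 'SFO': None, 'XYZ': None}
    to_country = discover([("New York", "USA"), ("London", "UK")],
                          ["Paris", "Helsinki"], TOY_CORPUS)
    print(compose(to_city, to_country))  # {'CDG': 'France', 'HEL': 'Finland', 'SFO': None, 'XYZ': None}
```

With the default `min_support=2`, SFO stays unresolved because the only table that mentions it matches just one of the two examples; lowering the threshold to 1 resolves it, trading precision for coverage.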

Original language: English (US)
Title of host publication: 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1134-1145
Number of pages: 12
ISBN (Electronic): 9781509020195
DOI: 10.1109/ICDE.2016.7498319
State: Published - Jun 22 2016
Externally published: Yes
Event: 32nd IEEE International Conference on Data Engineering, ICDE 2016 - Helsinki, Finland
Duration: May 16 2016 to May 20 2016

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Computer Graphics and Computer-Aided Design
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management

Cite this

Abedjan, Z., Morcos, J., Ilyas, I. F., Ouzzani, M., Papotti, P., & Stonebraker, M. (2016). DataXFormer: A robust transformation discovery system. In 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016 (pp. 1134-1145). [7498319] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDE.2016.7498319

DataXFormer: A robust transformation discovery system. / Abedjan, Ziawasch; Morcos, John; Ilyas, Ihab F.; Ouzzani, Mourad; Papotti, Paolo; Stonebraker, Michael.

2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016. Institute of Electrical and Electronics Engineers Inc., 2016. p. 1134-1145, 7498319.

Abedjan, Z, Morcos, J, Ilyas, IF, Ouzzani, M, Papotti, P & Stonebraker, M 2016, DataXFormer: A robust transformation discovery system. in 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, 7498319, Institute of Electrical and Electronics Engineers Inc., pp. 1134-1145, 32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, 5/16/16. https://doi.org/10.1109/ICDE.2016.7498319

Abedjan Z, Morcos J, Ilyas IF, Ouzzani M, Papotti P, Stonebraker M. DataXFormer: A robust transformation discovery system. In 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016. Institute of Electrical and Electronics Engineers Inc. 2016. p. 1134-1145. 7498319 https://doi.org/10.1109/ICDE.2016.7498319

Abedjan, Ziawasch; Morcos, John; Ilyas, Ihab F.; Ouzzani, Mourad; Papotti, Paolo; Stonebraker, Michael. / DataXFormer: A robust transformation discovery system. 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 1134-1145

@inproceedings{e676820b725442ed93114adc124c73b9,
title = "{DataXFormer}: A robust transformation discovery system",
abstract = "In data integration, data curation, and other data analysis tasks, users spend a considerable amount of time converting data from one representation to another. For example US dates to European dates or airport codes to city names. In a previous vision paper, we presented the initial design of DataXFormer, a system that uses web resources to assist in transformation discovery. Specifically, DataXFormer discovers possible transformations from web tables and web forms and involves human feedback where appropriate. In this paper, we present the full fledged system along with several extensions. In particular, we present algorithms to find (i) transformations that entail multiple columns of input data, (ii) indirect transformations that are compositions of other transformations, (iii) transformations that are not functions but rather relationships, and (iv) transformations from a knowledge base of public data. We report on experiments with a collection of 120 transformation tasks, and show our enhanced system automatically covers 101 of them by using openly available resources.",
author = "Ziawasch Abedjan and John Morcos and Ilyas, {Ihab F.} and Mourad Ouzzani and Paolo Papotti and Michael Stonebraker",
year = "2016",
month = "6",
day = "22",
doi = "10.1109/ICDE.2016.7498319",
language = "English (US)",
pages = "1134--1145",
booktitle = "2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY  - GEN
T1  - DataXFormer: A robust transformation discovery system
AU  - Abedjan, Ziawasch
AU  - Morcos, John
AU  - Ilyas, Ihab F.
AU  - Ouzzani, Mourad
AU  - Papotti, Paolo
AU  - Stonebraker, Michael
PY  - 2016/6/22
Y1  - 2016/6/22
N2  - In data integration, data curation, and other data analysis tasks, users spend a considerable amount of time converting data from one representation to another. For example US dates to European dates or airport codes to city names. In a previous vision paper, we presented the initial design of DataXFormer, a system that uses web resources to assist in transformation discovery. Specifically, DataXFormer discovers possible transformations from web tables and web forms and involves human feedback where appropriate. In this paper, we present the full fledged system along with several extensions. In particular, we present algorithms to find (i) transformations that entail multiple columns of input data, (ii) indirect transformations that are compositions of other transformations, (iii) transformations that are not functions but rather relationships, and (iv) transformations from a knowledge base of public data. We report on experiments with a collection of 120 transformation tasks, and show our enhanced system automatically covers 101 of them by using openly available resources.
AB  - In data integration, data curation, and other data analysis tasks, users spend a considerable amount of time converting data from one representation to another. For example US dates to European dates or airport codes to city names. In a previous vision paper, we presented the initial design of DataXFormer, a system that uses web resources to assist in transformation discovery. Specifically, DataXFormer discovers possible transformations from web tables and web forms and involves human feedback where appropriate. In this paper, we present the full fledged system along with several extensions. In particular, we present algorithms to find (i) transformations that entail multiple columns of input data, (ii) indirect transformations that are compositions of other transformations, (iii) transformations that are not functions but rather relationships, and (iv) transformations from a knowledge base of public data. We report on experiments with a collection of 120 transformation tasks, and show our enhanced system automatically covers 101 of them by using openly available resources.
UR  - http://www.scopus.com/inward/record.url?scp=84980367317&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84980367317&partnerID=8YFLogxK
U2  - 10.1109/ICDE.2016.7498319
DO  - 10.1109/ICDE.2016.7498319
M3  - Conference contribution
SP  - 1134
EP  - 1145
BT  - 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  - 