Integration of fast-evolving data sources using a deep learning approach

Zijie Wang, Lixi Zhou, Jia Zou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Data scientists spend 80–90% of their effort on data integration, and there is still no end-to-end automatic integration and wrangling pipeline that works for a large number of data sources. This work proposes a data integration system that transforms fast-evolving raw data sources into user-desired tables. The system builds on a set of pre-trained models: a user only needs to specify the schema of the outcome feature vector and a few example rows, and the system then automatically generates the outcome table from the raw data sources. Provisioned schema evolution is automatically injected into the training process so that the model is resistant to data source changes. Our experiments show that the proposed approach is particularly effective for integrating data with fast-evolving schemas.

Original language: English (US)
Title of host publication: Software Foundations for Data Interoperability and Large Scale Graph Data Analytics - 4th International Workshop, SFDI 2020, and 2nd International Workshop, LSGDA 2020, held in Conjunction with VLDB 2020, Proceedings
Editors: Lu Qin, Wenjie Zhang, Ying Zhang, You Peng, Hiroyuki Kato, Wei Wang, Chuan Xiao
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 172-186
Number of pages: 15
ISBN (Print): 9783030611323
State: Published - 2020
Externally published: Yes
Event: 4th International Workshop on Software Foundations for Data Interoperability, SFDI 2020 and 2nd International Workshop on Large Scale Graph Data Analytics, LSGDA 2020, held in Conjunction with VLDB 2020 - Tokyo, Japan
Duration: Sep 4 2020 → Sep 4 2020

Publication series

Name: Communications in Computer and Information Science
Volume: 1281
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 4th International Workshop on Software Foundations for Data Interoperability, SFDI 2020 and 2nd International Workshop on Large Scale Graph Data Analytics, LSGDA 2020, held in Conjunction with VLDB 2020
Country: Japan
City: Tokyo
Period: 9/4/20 → 9/4/20

Keywords

  • Data integration
  • Deep learning
  • Schema evolution

ASJC Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)
