GenerIE: Information extraction using database queries

Luis Tari; Phan Huy Tu; Jörg Hakenberg; Yi Chen; Tran Cao Son; Graciela Gonzalez; Chitta Baral

doi:10.1109/ICDE.2010.5447773

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be re-applied from scratch to the entire text corpus even though only a small part of the corpus might be affected. In this demonstration proposal, we describe a novel paradigm for information extraction: we store the parse trees output by text processing in a database, and then express extraction needs using queries, which can be evaluated and optimized by databases. Compared with the existing approaches, database queries for information extraction enable generic extraction and minimize reprocessing. However, such an approach also poses a lot of technical challenges, such as language design, optimization and automatic query generation. We will present the opportunities and challenges that we met when building GenerIE, a system that implements this paradigm.
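As a minimal sketch of the paradigm the abstract describes, the following Python example stores hand-written parses of two sentences in an SQLite table and expresses an extraction need as a SQL query. It is not GenerIE's actual schema or query language, which this record does not show. The table layout, column names, relation labels (`subj`, `obj`, `root`), function names, and sample sentences are all illustrative assumptions.

```python
import sqlite3


def build_parse_store():
    """Create an in-memory database holding parse-tree nodes.

    Hypothetical schema (an assumption, not GenerIE's): one row per node,
    with a pointer to its parent node and the relation to that parent.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node ("
        " sent_id INTEGER, node_id INTEGER, parent_id INTEGER,"
        " word TEXT, relation TEXT)"
    )
    # Hand-written sample parses, not real parser output.
    rows = [
        (1, 1, 2, "RAD53", "subj"),
        (1, 2, None, "regulates", "root"),
        (1, 3, 2, "DBF4", "obj"),
        (2, 1, 2, "MEC1", "subj"),
        (2, 2, None, "activates", "root"),
        (2, 3, 2, "RAD53", "obj"),
    ]
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def extract_pairs(conn, verb):
    """Return (subject, object) pairs for sentences whose verb node is `verb`."""
    query = """
        SELECT s.word, o.word
        FROM node AS v
        JOIN node AS s
          ON s.sent_id = v.sent_id AND s.parent_id = v.node_id
         AND s.relation = 'subj'
        JOIN node AS o
          ON o.sent_id = v.sent_id AND o.parent_id = v.node_id
         AND o.relation = 'obj'
        WHERE v.word = ?
        ORDER BY v.sent_id
    """
    return conn.execute(query, (verb,)).fetchall()


if __name__ == "__main__":
    conn = build_parse_store()
    print(extract_pairs(conn, "regulates"))
    print(extract_pairs(conn, "activates"))
```

In this sketch, a new extraction goal becomes a new query over the stored parses rather than another pass over the text, which is the reprocessing saving the abstract refers to.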

Original language	English (US)
Title of host publication	26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings
Pages	1121-1124
Number of pages	4
DOIs	https://doi.org/10.1109/ICDE.2010.5447773
State	Published - 2010
Event	26th IEEE International Conference on Data Engineering, ICDE 2010 - Long Beach, CA, United States
Duration	Mar 1 2010 → Mar 6 2010

Publication series

Name	Proceedings - International Conference on Data Engineering
ISSN (Print)	1084-4627

Other

Other	26th IEEE International Conference on Data Engineering, ICDE 2010
Country/Territory	United States
City	Long Beach, CA
Period	3/1/10 → 3/6/10

ASJC Scopus subject areas

Software
Signal Processing
Information Systems

Cite this

Tari, L., Tu, P. H., Hakenberg, J., Chen, Y., Son, T. C., Gonzalez, G., & Baral, C. (2010). GenerIE: Information extraction using database queries. In 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings (pp. 1121-1124). Article 5447773 (Proceedings - International Conference on Data Engineering). https://doi.org/10.1109/ICDE.2010.5447773

GenerIE: Information extraction using database queries. / Tari, Luis; Tu, Phan Huy; Hakenberg, Jörg et al.
26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings. 2010. p. 1121-1124 5447773 (Proceedings - International Conference on Data Engineering).

Tari, L, Tu, PH, Hakenberg, J, Chen, Y, Son, TC, Gonzalez, G & Baral, C 2010, GenerIE: Information extraction using database queries. in 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings., 5447773, Proceedings - International Conference on Data Engineering, pp. 1121-1124, 26th IEEE International Conference on Data Engineering, ICDE 2010, Long Beach, CA, United States, 3/1/10. https://doi.org/10.1109/ICDE.2010.5447773

@inproceedings{448976ac8ab64901b6ddbc68f99ff416,
title = "GenerIE: Information extraction using database queries",
abstract = "Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be re-applied from scratch to the entire text corpus even though only a small part of the corpus might be affected. In this demonstration proposal, we describe a novel paradigm for information extraction: we store the parse trees output by text processing in a database, and then express extraction needs using queries, which can be evaluated and optimized by databases. Compared with the existing approaches, database queries for information extraction enable generic extraction and minimize reprocessing. However, such an approach also poses a lot of technical challenges, such as language design, optimization and automatic query generation. We will present the opportunities and challenges that we met when building GenerIE, a system that implements this paradigm.",
author = "Luis Tari and Tu, {Phan Huy} and J{\"o}rg Hakenberg and Yi Chen and Son, {Tran Cao} and Graciela Gonzalez and Chitta Baral",
year = "2010",
doi = "10.1109/ICDE.2010.5447773",
language = "English (US)",
isbn = "9781424454440",
series = "Proceedings - International Conference on Data Engineering",
pages = "1121--1124",
booktitle = "26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings",
note = "26th IEEE International Conference on Data Engineering, ICDE 2010 ; Conference date: 01-03-2010 Through 06-03-2010",
}

TY - GEN
T1 - GenerIE
T2 - 26th IEEE International Conference on Data Engineering, ICDE 2010
AU - Tari, Luis
AU - Tu, Phan Huy
AU - Hakenberg, Jörg
AU - Chen, Yi
AU - Son, Tran Cao
AU - Gonzalez, Graciela
AU - Baral, Chitta
PY - 2010
Y1 - 2010
AB - Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be re-applied from scratch to the entire text corpus even though only a small part of the corpus might be affected. In this demonstration proposal, we describe a novel paradigm for information extraction: we store the parse trees output by text processing in a database, and then express extraction needs using queries, which can be evaluated and optimized by databases. Compared with the existing approaches, database queries for information extraction enable generic extraction and minimize reprocessing. However, such an approach also poses a lot of technical challenges, such as language design, optimization and automatic query generation. We will present the opportunities and challenges that we met when building GenerIE, a system that implements this paradigm.
UR - http://www.scopus.com/inward/record.url?scp=77952779235&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952779235&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2010.5447773
DO - 10.1109/ICDE.2010.5447773
M3 - Conference contribution
AN - SCOPUS:77952779235
SN - 9781424454440
T3 - Proceedings - International Conference on Data Engineering
SP - 1121
EP - 1124
BT - 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings
Y2 - 1 March 2010 through 6 March 2010
ER -
