Molecular event extraction from link grammar parse trees in the BIONLP'09 shared task

Jörg Hakenberg, Illés Solt, Domonkos Tikk, Vãu Há Nguyên, Luis Tari, Quang Long Nguyen, Chitta Baral, Ulf Leser

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

The BioNLP'09 Shared Task deals with extracting information on molecular events, such as gene expression and protein localization, from natural language text. Information in this benchmark is given as tuples comprising protein names, trigger terms for each event, and possibly other participants such as binding sites. We address all three tasks of BioNLP'09: event detection, event enrichment, and recognition of negation and speculation. Our method for the first two tasks is based on a deep parser; we store the parse tree of each sentence in a relational database schema. From the training data, we collect the dependencies connecting any two relevant terms of a known tuple, that is, the shortest paths linking these two constituents. We encode all such linkages in a query language to retrieve similar linkages from unseen text. For the third task, we rely on a hierarchy of hand-crafted regular expressions to recognize speculated and negated events. In this paper, we add a post-processing step that handles ambiguous event trigger terms, and we extend the query language to relax linkage constraints. On the BioNLP'09 Shared Task test data, we achieve overall F1-measures of 32%, 29%, and 30% for Tasks 1, 2, and 3, respectively.
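The shortest-path idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper stores parse trees in a relational database and retrieves linkages through a query language, whereas the sketch below assumes an in-memory list of links and a plain breadth-first search. The function name `shortest_link_path`, the toy sentence, and the simplified link labels (`S`, `O`, `M`, `J`) are all hypothetical and chosen only for illustration.

```python
from collections import deque


def shortest_link_path(links, start, goal):
    """Return the shortest chain of links between two tokens.

    links: iterable of (left_token, label, right_token) triples,
           treated as undirected edges of the parse graph.
    Returns a list of (from_token, label, to_token) steps,
    an empty list if start == goal, or None if no path exists.
    """
    # Build an undirected adjacency list from the link triples.
    graph = {}
    for left, label, right in links:
        graph.setdefault(left, []).append((right, label))
        graph.setdefault(right, []).append((left, label))

    # Breadth-first search; parent maps token -> (previous token, label).
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor, label in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = (node, label)
                queue.append(neighbor)

    if goal not in parent:
        return None

    # Walk back from the goal to the start to recover the path.
    path = []
    node = goal
    while parent[node] is not None:
        prev, label = parent[node]
        path.append((prev, label, node))
        node = prev
    path.reverse()
    return path


# Hypothetical toy parse of "TNF induces expression of IL2".
links = [
    ("TNF", "S", "induces"),
    ("induces", "O", "expression"),
    ("expression", "M", "of"),
    ("of", "J", "IL2"),
]

# Path between a candidate trigger term and a protein name.
path = shortest_link_path(links, "expression", "IL2")
pattern = [label for _, label, _ in path]
print(path)
print(pattern)
```

The label sequence in `pattern` plays the role of a linkage collected from training data; in the approach described in the abstract, such linkages are encoded as queries and matched against parse trees of unseen text.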

Original language: English (US)
Pages (from-to): 665-680
Number of pages: 16
Journal: Computational Intelligence
Volume: 27
Issue number: 4
DOI: https://doi.org/10.1111/j.1467-8640.2011.00404.x
State: Published - Nov 2011

Fingerprint

Query languages
Grammar
Proteins
Linkage
Binding sites
Gene expression
Speculation
Query Language
Trigger
Term
Protein
Processing
Event Detection
Regular Expressions
Ambiguous
Relational Database
Post-processing
Shortest path
Natural Language
Linking

Keywords

  • event extraction
  • parse tree database
  • sentence parsing
  • text mining

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Mathematics

Cite this

Hakenberg, J., Solt, I., Tikk, D., Nguyên, V. H., Tari, L., Nguyen, Q. L., ... Leser, U. (2011). Molecular event extraction from link grammar parse trees in the BIONLP'09 shared task. Computational Intelligence, 27(4), 665-680. https://doi.org/10.1111/j.1467-8640.2011.00404.x

@article{2cf243cc111b49e8a6332bba5c3160df,
title = "Molecular event extraction from link grammar parse trees in the BIONLP'09 shared task",
keywords = "event extraction, parse tree database, sentence parsing, text mining",
author = "J{\"o}rg Hakenberg and Ill{\'e}s Solt and Domonkos Tikk and Nguy{\^e}n, {V{\~a}u H{\'a}} and Luis Tari and Nguyen, {Quang Long} and Chitta Baral and Ulf Leser",
year = "2011",
month = "11",
doi = "10.1111/j.1467-8640.2011.00404.x",
language = "English (US)",
volume = "27",
pages = "665--680",
journal = "Computational Intelligence",
issn = "0824-7935",
publisher = "Wiley-Blackwell",
number = "4",

}
