Towards effective sentence simplification for automatic processing of biomedical text

Siddhartha Jonnalagadda, Luis Tari, Jörg Hakenberg, Chitta Baral, Graciela Gonzalez

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

36 Scopus citations

Abstract

The complexity of sentences characteristic of biomedical articles poses a challenge to natural language parsers, which are typically trained on large-scale corpora of non-technical text. We propose a text simplification process, bioSimplify, that seeks to reduce the complexity of sentences in biomedical abstracts in order to improve the performance of syntactic parsers on the processed sentences. Syntactic parsing is typically one of the first steps in a text mining pipeline, so any improvement in parsing performance can propagate to all subsequent processing steps. We evaluated our method using a corpus of biomedical sentences annotated with syntactic links. Our empirical results show an improvement of 2.90% for the Charniak-McClosky parser and of 4.23% for the Link Grammar parser when processing simplified sentences rather than the original sentences in the corpus.

Original language: English (US)
Title of host publication: NAACL-HLT 2009 - Human Language Technologies
Subtitle of host publication: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers
Editors: Mari Ostendorf, Michael Collins, Shri Narayanan, Douglas W. Oard, Lucy Vanderwende
Publisher: Association for Computational Linguistics (ACL)
Pages: 177-180
Number of pages: 4
ISBN (Electronic): 9781932432428
State: Published, 2009
Event: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2009, Boulder, United States
Duration: May 31, 2009 to June 5, 2009

Publication series

Name: NAACL-HLT 2009 - Human Language Technologies: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers

Conference

Conference: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2009
Country/Territory: United States
City: Boulder
Period: 5/31/09 to 6/5/09

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
