TY - GEN
T1 - Towards effective sentence simplification for automatic processing of biomedical text
AU - Jonnalagadda, Siddhartha
AU - Tari, Luis
AU - Hakenberg, Jörg
AU - Baral, Chitta
AU - Gonzalez, Graciela
N1 - Publisher Copyright: © 2009 Association for Computational Linguistics
PY - 2009
Y1 - 2009
N2 - The complexity of sentences characteristic to biomedical articles poses a challenge to natural language parsers, which are typically trained on large-scale corpora of non-technical text. We propose a text simplification process, bioSimplify, that seeks to reduce the complexity of sentences in biomedical abstracts in order to improve the performance of syntactic parsers on the processed sentences. Syntactic parsing is typically one of the first steps in a text mining pipeline. Thus, any improvement in performance would have a ripple effect over all processing steps. We evaluated our method using a corpus of biomedical sentences annotated with syntactic links. Our empirical results show an improvement of 2.90% for the Charniak-McClosky parser and of 4.23% for the Link Grammar parser when processing simplified sentences rather than the original sentences in the corpus.
UR - http://www.scopus.com/inward/record.url?scp=79957457736&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79957457736&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:79957457736
T3 - NAACL-HLT 2009 - Human Language Technologies: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers
SP - 177
EP - 180
BT - NAACL-HLT 2009 - Human Language Technologies
A2 - Ostendorf, Mari
A2 - Collins, Michael
A2 - Narayanan, Shri
A2 - Oard, Douglas W.
A2 - Vanderwende, Lucy
PB - Association for Computational Linguistics (ACL)
T2 - 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2009
Y2 - 31 May 2009 through 5 June 2009
ER -