Sequence-to-Sequence Models for Automated Text Simplification

Robert Mihai Botarleanu, Mihai Dascalu, Scott Andrew Crossley, Danielle S. McNamara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

A key writing skill is the capability to clearly convey desired meaning using available linguistic knowledge. Consequently, writers must select from a large array of idioms, vocabulary terms that are semantically equivalent, and discourse features that simultaneously reflect content and allow readers to grasp meaning. In many cases, a simplified version of a text is needed to ensure comprehension on the part of a targeted audience (e.g., second language learners). To address this need, we propose an automated method to simplify texts based on paraphrasing. Specifically, we explore the potential for a deep learning model, previously used for machine translation, to learn a simplified version of the English language within the context of short phrases. The best model, based on a Universal Transformer architecture, achieved a BLEU score of 66.01. We also evaluated this model’s capability to perform similar transformations on texts that were simplified by human experts at different levels.

Original language: English (US)
Title of host publication: Artificial Intelligence in Education - 21st International Conference, AIED 2020, Proceedings
Editors: Ig Ibert Bittencourt, Mutlu Cukurova, Rose Luckin, Kasia Muldner, Eva Millán
Publisher: Springer
Pages: 31-36
Number of pages: 6
ISBN (Print): 9783030522391
DOIs
State: Published - 2020
Event: 21st International Conference on Artificial Intelligence in Education, AIED 2020 - Ifrane, Morocco
Duration: Jul 6 2020 to Jul 10 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12164 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Artificial Intelligence in Education, AIED 2020
Country: Morocco
City: Ifrane
Period: 7/6/20 to 7/10/20

Keywords

  • Natural language processing
  • Paraphrasing
  • Sequence-to-sequence model
  • Text simplification

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
