Sequence-to-Sequence Models for Automated Text Simplification

Robert Mihai Botarleanu; Mihai Dascalu; Scott Andrew Crossley; Danielle S. McNamara

doi:10.1007/978-3-030-52240-7_6

Psychology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

A key writing skill is the capability to clearly convey desired meaning using available linguistic knowledge. Consequently, writers must select from a large array of idioms, vocabulary terms that are semantically equivalent, and discourse features that simultaneously reflect content and allow readers to grasp meaning. In many cases, a simplified version of a text is needed to ensure comprehension on the part of a targeted audience (e.g., second language learners). To address this need, we propose an automated method to simplify texts based on paraphrasing. Specifically, we explore the potential for a deep learning model, previously used for machine translation, to learn a simplified version of the English language within the context of short phrases. The best model, based on a Universal Transformer architecture, achieved a BLEU score of 66.01. We also evaluated this model’s capability to perform similar transformations on texts that were simplified by human experts at different levels.

Original language	English (US)
Title of host publication	Artificial Intelligence in Education - 21st International Conference, AIED 2020, Proceedings
Editors	Ig Ibert Bittencourt, Mutlu Cukurova, Rose Luckin, Kasia Muldner, Eva Millán
Publisher	Springer
Pages	31-36
Number of pages	6
ISBN (Print)	9783030522391
DOIs	https://doi.org/10.1007/978-3-030-52240-7_6
State	Published - 2020
Event	21st International Conference on Artificial Intelligence in Education, AIED 2020 - Ifrane, Morocco
Duration	Jul 6 2020 → Jul 10 2020

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	12164 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	21st International Conference on Artificial Intelligence in Education, AIED 2020
Country/Territory	Morocco
City	Ifrane
Period	7/6/20 → 7/10/20

Keywords

Natural language processing
Paraphrasing
Sequence-to-sequence model
Text simplification

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Access to Document

10.1007/978-3-030-52240-7_6

Cite this

Botarleanu, R. M., Dascalu, M., Crossley, S. A., & McNamara, D. S. (2020). Sequence-to-Sequence Models for Automated Text Simplification. In I. I. Bittencourt, M. Cukurova, R. Luckin, K. Muldner, & E. Millán (Eds.), Artificial Intelligence in Education - 21st International Conference, AIED 2020, Proceedings (pp. 31-36). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12164 LNAI). Springer. https://doi.org/10.1007/978-3-030-52240-7_6

Sequence-to-Sequence Models for Automated Text Simplification. / Botarleanu, Robert Mihai; Dascalu, Mihai; Crossley, Scott Andrew et al.
Artificial Intelligence in Education - 21st International Conference, AIED 2020, Proceedings. ed. / Ig Ibert Bittencourt; Mutlu Cukurova; Rose Luckin; Kasia Muldner; Eva Millán. Springer, 2020. p. 31-36 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12164 LNAI).

Botarleanu, RM, Dascalu, M, Crossley, SA & McNamara, DS 2020, Sequence-to-Sequence Models for Automated Text Simplification. in II Bittencourt, M Cukurova, R Luckin, K Muldner & E Millán (eds), Artificial Intelligence in Education - 21st International Conference, AIED 2020, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12164 LNAI, Springer, pp. 31-36, 21st International Conference on Artificial Intelligence in Education, AIED 2020, Ifrane, Morocco, 7/6/20. https://doi.org/10.1007/978-3-030-52240-7_6

Botarleanu RM, Dascalu M, Crossley SA, McNamara DS. Sequence-to-Sequence Models for Automated Text Simplification. In Bittencourt II, Cukurova M, Luckin R, Muldner K, Millán E, editors, Artificial Intelligence in Education - 21st International Conference, AIED 2020, Proceedings. Springer. 2020. p. 31-36. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-52240-7_6

Botarleanu, Robert Mihai ; Dascalu, Mihai ; Crossley, Scott Andrew et al. / Sequence-to-Sequence Models for Automated Text Simplification. Artificial Intelligence in Education - 21st International Conference, AIED 2020, Proceedings. editor / Ig Ibert Bittencourt ; Mutlu Cukurova ; Rose Luckin ; Kasia Muldner ; Eva Millán. Springer, 2020. pp. 31-36 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{154d50048c514931922a18225ca11992,

title = "Sequence-to-Sequence Models for Automated Text Simplification",

abstract = "A key writing skill is the capability to clearly convey desired meaning using available linguistic knowledge. Consequently, writers must select from a large array of idioms, vocabulary terms that are semantically equivalent, and discourse features that simultaneously reflect content and allow readers to grasp meaning. In many cases, a simplified version of a text is needed to ensure comprehension on the part of a targeted audience (e.g., second language learners). To address this need, we propose an automated method to simplify texts based on paraphrasing. Specifically, we explore the potential for a deep learning model, previously used for machine translation, to learn a simplified version of the English language within the context of short phrases. The best model, based on a Universal Transformer architecture, achieved a BLEU score of 66.01. We also evaluated this model{\textquoteright}s capability to perform similar transformations on texts that were simplified by human experts at different levels.",

keywords = "Natural language processing, Paraphrasing, Sequence-to-sequence model, Text simplification",

author = "Botarleanu, {Robert Mihai} and Mihai Dascalu and Crossley, {Scott Andrew} and McNamara, {Danielle S.}",

note = "Funding Information: This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS - UEFISCDI, project number PN-III 54PCCDI - 2018, INTELLIT - „Prezervarea și valorificarea patrimoniului literar românesc folosind soluții digitale inteligente pentru extragerea și sistematizarea de cunoștințe”. This research was also supported in part by the Institute of Education Sciences (R305A190063) and the Office of Naval Research (N00014-17-1-2300 and N00014-19-1-2424). The opinions expressed are those of the authors and do not represent views of the IES or ONR.; 21st International Conference on Artificial Intelligence in Education, AIED 2020 ; Conference date: 06-07-2020 Through 10-07-2020",

year = "2020",

doi = "10.1007/978-3-030-52240-7_6",

language = "English (US)",

isbn = "9783030522391",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer",

pages = "31--36",

editor = "Bittencourt, {Ig Ibert} and Mutlu Cukurova and Rose Luckin and Kasia Muldner and Eva Mill{\'a}n",

booktitle = "Artificial Intelligence in Education - 21st International Conference, AIED 2020, Proceedings",

}

TY - GEN

T1 - Sequence-to-Sequence Models for Automated Text Simplification

AU - Botarleanu, Robert Mihai

AU - Dascalu, Mihai

AU - Crossley, Scott Andrew

AU - McNamara, Danielle S.

N1 - Funding Information: This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS - UEFISCDI, project number PN-III 54PCCDI - 2018, INTELLIT - „Prezervarea și valorificarea patrimoniului literar românesc folosind soluții digitale inteligente pentru extragerea și sistematizarea de cunoștințe”. This research was also supported in part by the Institute of Education Sciences (R305A190063) and the Office of Naval Research (N00014-17-1-2300 and N00014-19-1-2424). The opinions expressed are those of the authors and do not represent views of the IES or ONR.

PY - 2020

Y1 - 2020

N2 - A key writing skill is the capability to clearly convey desired meaning using available linguistic knowledge. Consequently, writers must select from a large array of idioms, vocabulary terms that are semantically equivalent, and discourse features that simultaneously reflect content and allow readers to grasp meaning. In many cases, a simplified version of a text is needed to ensure comprehension on the part of a targeted audience (e.g., second language learners). To address this need, we propose an automated method to simplify texts based on paraphrasing. Specifically, we explore the potential for a deep learning model, previously used for machine translation, to learn a simplified version of the English language within the context of short phrases. The best model, based on a Universal Transformer architecture, achieved a BLEU score of 66.01. We also evaluated this model’s capability to perform similar transformations on texts that were simplified by human experts at different levels.

AB - A key writing skill is the capability to clearly convey desired meaning using available linguistic knowledge. Consequently, writers must select from a large array of idioms, vocabulary terms that are semantically equivalent, and discourse features that simultaneously reflect content and allow readers to grasp meaning. In many cases, a simplified version of a text is needed to ensure comprehension on the part of a targeted audience (e.g., second language learners). To address this need, we propose an automated method to simplify texts based on paraphrasing. Specifically, we explore the potential for a deep learning model, previously used for machine translation, to learn a simplified version of the English language within the context of short phrases. The best model, based on a Universal Transformer architecture, achieved a BLEU score of 66.01. We also evaluated this model’s capability to perform similar transformations on texts that were simplified by human experts at different levels.

KW - Natural language processing

KW - Paraphrasing

KW - Sequence-to-sequence model

KW - Text simplification

UR - http://www.scopus.com/inward/record.url?scp=85088556291&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85088556291&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-52240-7_6

DO - 10.1007/978-3-030-52240-7_6

M3 - Conference contribution

AN - SCOPUS:85088556291

SN - 9783030522391

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 31

EP - 36

BT - Artificial Intelligence in Education - 21st International Conference, AIED 2020, Proceedings

A2 - Bittencourt, Ig Ibert

A2 - Cukurova, Mutlu

A2 - Luckin, Rose

A2 - Muldner, Kasia

A2 - Millán, Eva

PB - Springer

T2 - 21st International Conference on Artificial Intelligence in Education, AIED 2020

Y2 - 6 July 2020 through 10 July 2020

ER -
