TY - GEN
T1 - Automated Paragraph Detection Using Cohesion Network Analysis
AU - Botarleanu, Robert Mihai
AU - Dascalu, Mihai
AU - Crossley, Scott Andrew
AU - McNamara, Danielle S.
N1 - Funding Information:
This research was supported by a grant from the Romanian National Authority for Scientific Research and Innovation, CNCS-UEFISCDI, project number TE 70 PN-III-P1-1.1-TE-2019-2209, ATES (“Automated Text Evaluation and Simplification”), the Institute of Education Sciences (R305A180144 and R305A180261), and the Office of Naval Research (N00014-17-1-2300; N00014-20-1-2623). The opinions expressed are those of the authors and do not represent the views of the IES or ONR.
Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2023
Y1 - 2023
AB - The ability to express yourself concisely and coherently is a crucial skill, both for academic purposes and professional careers. An important aspect to consider in writing is an adequate segmentation of ideas, which in turn requires a proper understanding of where to place paragraph breaks. However, these decisions are often made intuitively, with little systematicity in sequencing ideas. Thus, an automated method of detecting the optimal hierarchical structure of texts using quantifiable features could be a valuable tool for learners. Here, we aim to define a framework grounded in Cohesion Network Analysis to establish the structure of a text by modeling paragraphs as clusters of sentences. The analogy to clustering enables us to identify paragraph breaks that maximize inter-paragraph separation while ensuring high intra-paragraph cohesion. Our approach consists of two steps applied to texts without paragraph breaks. First, the number of paragraphs is automatically inferred with an absolute error of 1.02 using a Recurrent Neural Network, which relies on text features and cohesion flow. Second, paragraph splits are detected using two algorithms: top-k, which selects the largest cohesion gaps between adjacent utterances, and divisive clustering, which iteratively splits the text into paragraphs. Silhouette scores are used to assess performance, and the obtained values indicate that the inferred structures are adequate.
KW - Clustering
KW - Cohesion network analysis
KW - Paragraph marking
KW - Sentence embeddings
UR - http://www.scopus.com/inward/record.url?scp=85140464957&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140464957&partnerID=8YFLogxK
U2 - 10.1007/978-981-19-5240-1_5
DO - 10.1007/978-981-19-5240-1_5
M3 - Conference contribution
AN - SCOPUS:85140464957
SN - 9789811952395
T3 - Smart Innovation, Systems and Technologies
SP - 77
EP - 90
BT - Polyphonic Construction of Smart Learning Ecosystems - Proceedings of the 7th Conference on Smart Learning Ecosystems and Regional Development
A2 - Dascalu, Mihai
A2 - Marti, Patrizia
A2 - Pozzi, Francesca
PB - Springer Science and Business Media Deutschland GmbH
T2 - 7th International Conference on Smart Learning Ecosystems and Regional Development, SLERD 2022
Y2 - 5 July 2022 through 6 July 2022
ER -