TY - JOUR
T1 - Applying Natural Language Processing and Hierarchical Machine Learning Approaches to Text Difficulty Classification
AU - Balyan, Renu
AU - McCarthy, Kathryn S.
AU - McNamara, Danielle S.
N1 - Funding Information:
The authors would like to recognize the support of the Institute of Education Sciences, U.S. Department of Education, through Grants R305A180261, R305A190050 and R305A180144, and the Office of Naval Research, through Grant N000141712300, to Arizona State University. The opinions expressed are those of the authors and do not represent views of the Institute, the U.S. Department of Education, or the Office of Naval Research.
Publisher Copyright:
© 2020, International Artificial Intelligence in Education Society.
PY - 2020/10/1
Y1 - 2020/10/1
AB - For decades, educators have relied on readability metrics that tend to oversimplify dimensions of text difficulty. This study examines the potential of applying advanced artificial intelligence methods to the educational problem of assessing text difficulty. The combination of hierarchical machine learning and natural language processing (NLP) is leveraged to predict the difficulty of practice texts used in a reading comprehension intelligent tutoring system, iSTART. Human raters estimated the text difficulty level of 262 texts across two text sets (Set A and Set B) in the iSTART library. NLP tools were used to identify linguistic features predictive of text difficulty and these indices were submitted to both flat and hierarchical machine learning algorithms. Results indicated that including NLP indices and machine learning increased accuracy by more than 10% as compared to classic readability metrics (e.g., Flesch-Kincaid Grade Level). Further, hierarchical outperformed non-hierarchical (flat) machine learning classification for Set B (72%) and the combined set A + B (65%), whereas the non-hierarchical approach performed slightly better than the hierarchical approach for Set A (79%). These findings demonstrate the importance of considering deeper features of language related to text difficulty as well as the potential utility of hierarchical machine learning approaches in the development of meaningful text difficulty classification.
KW - Hierarchical classification
KW - Machine learning
KW - Natural language processing
KW - Text difficulty
UR - http://www.scopus.com/inward/record.url?scp=85086862150&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086862150&partnerID=8YFLogxK
U2 - 10.1007/s40593-020-00201-7
DO - 10.1007/s40593-020-00201-7
M3 - Article
AN - SCOPUS:85086862150
SN - 1560-4292
VL - 30
SP - 337
EP - 370
JO - International Journal of Artificial Intelligence in Education
JF - International Journal of Artificial Intelligence in Education
IS - 3
ER -