Comparing machine learning classification approaches for predicting expository text difficulty

Renu Balyan; Kathryn S. McCarthy; Danielle S. McNamara

Psychology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

While hierarchical machine learning approaches have been used to classify texts into different content areas, this approach has, to our knowledge, not been used in the automated assessment of text difficulty. This study compared the accuracy of four machine learning classification approaches (flat, one-vs-one, one-vs-all, and hierarchical) using natural language processing features in predicting human ratings of text difficulty for two sets of texts. The hierarchical classification was the most accurate for the two text sets considered individually (Set A, 77.78%; Set B, 82.05%), while the non-hierarchical approaches, one-vs-one and one-vs-all, performed similarly to the hierarchical classification for the combined set (71.43%). These findings suggest both promise and limitations for applying hierarchical approaches to text difficulty classification. It may be beneficial to apply a recursive top-down approach to discriminate the subsets of classes that are at the top of the hierarchy and less related, and then further separate the classes into subsets that may be more similar to one another. These results also suggest that a single approach may not always work for all types of datasets and that it is important to evaluate which machine learning approach and algorithm work best for particular datasets. The authors encourage more work in this area to help suggest which types of algorithms work best as a function of the type of dataset.
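The four approaches named in the abstract can be sketched side by side as follows. This is a minimal illustration only, not the authors' implementation: the nearest-centroid base learner, the two numeric features, the three difficulty labels, the toy data, and the hierarchy ("easy" versus the group "medium"/"hard") are all assumptions made for the example, since the abstract does not specify the algorithms, features, or class structure used in the study.

```python
from collections import defaultdict
from itertools import combinations


class NearestCentroid:
    """Toy base learner (an assumption): picks the label with the closest mean vector."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] += 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def distances(self, x):
        # Squared Euclidean distance from x to every class centroid.
        return {c: sum((a - b) ** 2 for a, b in zip(x, m))
                for c, m in self.centroids.items()}

    def predict_one(self, x):
        d = self.distances(x)
        return min(d, key=d.get)


def flat(X, y):
    # Flat: one multiclass model over all labels at once.
    return NearestCentroid().fit(X, y).predict_one


def one_vs_all(X, y):
    # One binary model per class (class vs. everything else); largest margin wins.
    models = {c: NearestCentroid().fit(X, [lab == c for lab in y])
              for c in sorted(set(y))}

    def predict(x):
        def margin(c):
            d = models[c].distances(x)
            return d[False] - d[True]
        return max(models, key=margin)
    return predict


def one_vs_one(X, y):
    # One binary model per pair of classes; majority vote decides.
    models = []
    for a, b in combinations(sorted(set(y)), 2):
        rows = [(x, lab) for x, lab in zip(X, y) if lab in (a, b)]
        models.append(NearestCentroid().fit([r[0] for r in rows],
                                            [r[1] for r in rows]))

    def predict(x):
        votes = defaultdict(int)
        for model in models:
            votes[model.predict_one(x)] += 1
        return max(sorted(votes), key=votes.get)
    return predict


def leaves(node):
    return [node] if isinstance(node, str) else [l for ch in node for l in leaves(ch)]


def hierarchical(X, y, tree):
    # Recursive top-down: route to a child group first, then split within it.
    if isinstance(tree, str):
        return lambda x: tree
    groups = [set(leaves(child)) for child in tree]
    rows = [(x, i) for x, lab in zip(X, y)
            for i, g in enumerate(groups) if lab in g]
    router = NearestCentroid().fit([r[0] for r in rows], [r[1] for r in rows])
    children = [hierarchical(X, y, child) for child in tree]
    return lambda x: children[router.predict_one(x)](x)


# Hypothetical features per text: [mean sentence length, word rarity score].
X = [[8, 1.0], [9, 1.2], [15, 2.0], [16, 2.2], [24, 3.1], [25, 3.0]]
y = ["easy", "easy", "medium", "medium", "hard", "hard"]
tree = ("easy", ("medium", "hard"))  # assumed hierarchy, for illustration only

predictors = {
    "flat": flat(X, y),
    "one-vs-one": one_vs_one(X, y),
    "one-vs-all": one_vs_all(X, y),
    "hierarchical": hierarchical(X, y, tree),
}
for name, predict in predictors.items():
    print(name, [predict(x) for x in ([10, 1.1], [15.5, 2.1], [23, 3.0])])
```

On data this cleanly separated all four strategies agree. Differences of the kind the abstract reports would only be expected on real feature sets where some classes overlap more than others, which is why comparing the strategies on each dataset is worthwhile.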

Original language	English (US)
Title of host publication	Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018
Editors	Keith Brawner, Vasile Rus
Publisher	AAAI press
Pages	421-426
Number of pages	6
ISBN (Electronic)	9781577357964
State	Published - 2018
Event	31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018 - Melbourne, United States Duration: May 21 2018 → May 23 2018

Publication series

Name	Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018

Conference

Conference	31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018
Country/Territory	United States
City	Melbourne
Period	5/21/18 → 5/23/18

ASJC Scopus subject areas

Artificial Intelligence
Software

Cite this

Balyan, R., McCarthy, K. S., & McNamara, D. S. (2018). Comparing machine learning classification approaches for predicting expository text difficulty. In K. Brawner, & V. Rus (Eds.), Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018 (pp. 421-426). (Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018). AAAI press.

Comparing machine learning classification approaches for predicting expository text difficulty. / Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.
Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018. ed. / Keith Brawner; Vasile Rus. AAAI press, 2018. p. 421-426 (Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018).

Balyan, R, McCarthy, KS & McNamara, DS 2018, Comparing machine learning classification approaches for predicting expository text difficulty. in K Brawner & V Rus (eds), Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018. Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018, AAAI press, pp. 421-426, 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018, Melbourne, United States, 5/21/18.

Balyan R, McCarthy KS, McNamara DS. Comparing machine learning classification approaches for predicting expository text difficulty. In Brawner K, Rus V, editors, Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018. AAAI press. 2018. p. 421-426. (Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018).

Balyan, Renu ; McCarthy, Kathryn S. ; McNamara, Danielle S. / Comparing machine learning classification approaches for predicting expository text difficulty. Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018. editor / Keith Brawner ; Vasile Rus. AAAI press, 2018. pp. 421-426 (Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018).

@inproceedings{a4e7552b81f84c2e92301a4c68c95fbe,

title = "Comparing machine learning classification approaches for predicting expository text difficulty",

abstract = "While hierarchical machine learning approaches have been used to classify texts into different content areas, this approach has, to our knowledge, not been used in the automated assessment of text difficulty. This study compared the accuracy of four machine learning classification approaches (flat, one-vs-one, one-vs-all, and hierarchical) using natural language processing features in predicting human ratings of text difficulty for two sets of texts. The hierarchical classification was the most accurate for the two text sets considered individually (Set A, 77.78%; Set B, 82.05%), while the non-hierarchical approaches, one-vs-one and one-vs-all, performed similarly to the hierarchical classification for the combined set (71.43%). These findings suggest both promise and limitations for applying hierarchical approaches to text difficulty classification. It may be beneficial to apply a recursive top-down approach to discriminate the subsets of classes that are at the top of the hierarchy and less related, and then further separate the classes into subsets that may be more similar to one another. These results also suggest that a single approach may not always work for all types of datasets and that it is important to evaluate which machine learning approach and algorithm work best for particular datasets. The authors encourage more work in this area to help suggest which types of algorithms work best as a function of the type of dataset.",

author = "Renu Balyan and McCarthy, {Kathryn S.} and McNamara, {Danielle S.}",

note = "Funding Information: This research was supported in part by the Institute of Education Sciences (IES R305A130124) and the Office of Naval Research (ONR 00014-17-1-2300; ONR N00014-14-1-0343). Opinions or conclusions are those of the authors and do not represent the views of the IES or ONR. Publisher Copyright: Copyright {\textcopyright} 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018 ; Conference date: 21-05-2018 Through 23-05-2018",

year = "2018",

language = "English (US)",

series = "Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018",

publisher = "AAAI press",

pages = "421--426",

editor = "Keith Brawner and Vasile Rus",

booktitle = "Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018",

}

TY - GEN

T1 - Comparing machine learning classification approaches for predicting expository text difficulty

AU - Balyan, Renu

AU - McCarthy, Kathryn S.

AU - McNamara, Danielle S.

N1 - Funding Information: This research was supported in part by the Institute of Education Sciences (IES R305A130124) and the Office of Naval Research (ONR 00014-17-1-2300; ONR N00014-14-1-0343). Opinions or conclusions are those of the authors and do not represent the views of the IES or ONR. Publisher Copyright: Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

PY - 2018

Y1 - 2018

N2 - While hierarchical machine learning approaches have been used to classify texts into different content areas, this approach has, to our knowledge, not been used in the automated assessment of text difficulty. This study compared the accuracy of four machine learning classification approaches (flat, one-vs-one, one-vs-all, and hierarchical) using natural language processing features in predicting human ratings of text difficulty for two sets of texts. The hierarchical classification was the most accurate for the two text sets considered individually (Set A, 77.78%; Set B, 82.05%), while the non-hierarchical approaches, one-vs-one and one-vs-all, performed similarly to the hierarchical classification for the combined set (71.43%). These findings suggest both promise and limitations for applying hierarchical approaches to text difficulty classification. It may be beneficial to apply a recursive top-down approach to discriminate the subsets of classes that are at the top of the hierarchy and less related, and then further separate the classes into subsets that may be more similar to one another. These results also suggest that a single approach may not always work for all types of datasets and that it is important to evaluate which machine learning approach and algorithm work best for particular datasets. The authors encourage more work in this area to help suggest which types of algorithms work best as a function of the type of dataset.

UR - http://www.scopus.com/inward/record.url?scp=85071907769&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071907769&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85071907769

T3 - Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018

SP - 421

EP - 426

BT - Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018

A2 - Brawner, Keith

A2 - Rus, Vasile

PB - AAAI press

T2 - 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018

Y2 - 21 May 2018 through 23 May 2018

ER -
