Combining machine learning and natural language processing to assess literary text comprehension

Renu Balyan; Kathryn S. McCarthy; Danielle S. McNamara

Psychology

Research output: Contribution to conference › Paper › peer-review

11 Scopus citations

Abstract

This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about literary works. Three types of NLP feature sets were used to classify idea units (paraphrase, text-based inference, interpretive inference): unigrams (single content words), elaborative (new) n-grams, and linguistic features. The most accurate classifications emerged using all three NLP feature sets in combination, with accuracy ranging from 0.61 to 0.94 (F = 0.18 to 0.81). Random Forests, which employs multiple decision trees and a bagging approach, was the most accurate classifier for these data. In contrast, the single classifier, Trees, which tends to “overfit” the data during training, was the least accurate. Ensemble classifiers were generally more accurate than single classifiers. However, the accuracy of Support Vector Machines was comparable to that of the ensemble classifiers. This is likely due to Support Vector Machines’ unique ability to support high-dimensional feature spaces. The findings suggest that combining the power of NLP and machine learning is an effective means of automating literary text comprehension assessment.
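The abstract reports accuracy alongside a much lower F range, and a small worked example may help show how the two can diverge. The sketch below is illustrative only: the abstract does not say how accuracy and F were computed, so plain accuracy and a per-label F1 are assumed here, and the ten idea-unit labels are made up for the example rather than taken from the study.

```python
# Illustrative sketch only: toy labels, not data from the study.
# Assumes plain accuracy and per-label F1; the paper may define them differently.

LABELS = ("paraphrase", "text-based inference", "interpretive inference")


def accuracy(gold, pred):
    """Fraction of idea units whose predicted label matches the human rating."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_for_label(gold, pred, label):
    """Harmonic mean of precision and recall for one label (0.0 if no true positives)."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical human ratings: 10 idea units, only 2 are interpretive inferences.
gold = (["paraphrase"] * 6
        + ["text-based inference"] * 2
        + ["interpretive inference"] * 2)

# Hypothetical classifier output: one interpretive inference is missed.
pred = (["paraphrase"] * 6
        + ["text-based inference"] * 2
        + ["paraphrase", "interpretive inference"])

overall_accuracy = accuracy(gold, pred)
per_label_f1 = {label: f1_for_label(gold, pred, label) for label in LABELS}

print(f"accuracy = {overall_accuracy:.2f}")
for label, score in per_label_f1.items():
    print(f"F1({label}) = {score:.2f}")
```

In this toy case a single missed interpretive inference leaves overall accuracy at 0.90 but drops the F1 for that label to about 0.67, because the label is rare. An effect of this kind is one possible reason for a wide F range next to a high accuracy range, though the abstract itself does not attribute the gap to class imbalance.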

Original language	English (US)
Pages	244-249
Number of pages	6
State	Published - 2017
Event	10th International Conference on Educational Data Mining, EDM 2017 - Wuhan, China
Duration	Jun 25 2017 → Jun 28 2017

Conference

Conference	10th International Conference on Educational Data Mining, EDM 2017
Country/Territory	China
City	Wuhan
Period	6/25/17 → 6/28/17

Keywords

Classification
Interpretation
Natural language processing
Supervised machine learning

ASJC Scopus subject areas

Computer Science Applications
Information Systems

Cite this

@conference{3de3f6d5325e4665a0a97d65046500e6,

title = "Combining machine learning and natural language processing to assess literary text comprehension",

abstract = "This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about literary works. Three types of NLP feature sets: unigrams (single content words), elaborative (new) n-grams, and linguistic features were used to classify idea units (paraphrase, text-based inference, interpretive inference). The most accurate classifications emerged using all three NLP features sets in combination, with accuracy ranging from 0.61 to 0.94 (F=0.18 to 0.81). Random Forests, which employs multiple decision trees and a bagging approach, was the most accurate classifier for these data. In contrast, the single classifier, Trees, which tends to “overfit” the data during training, was the least accurate. Ensemble classifiers were generally more accurate than single classifiers. However, Support Vector Machines accuracy was comparable to that of the ensemble classifiers. This is likely due to Support Vector Machines{\textquoteright} unique ability to support high dimension feature spaces. The findings suggest that combining the power of NLP and machine learning is an effective means of automating literary text comprehension assessment.",

keywords = "Classification, Interpretation, Natural language processing, Supervised machine learning",

author = "Renu Balyan and McCarthy, {Kathryn S.} and McNamara, {Danielle S.}",

note = "Funding Information: This research was supported in part by IES Grants R305A150176, R305A130124, and R305A120707, as well as ONR Grants N00014-14-1-0343 and N00014-17-1-2300. Opinions, conclusions, or recommendations do not necessarily reflect the views of the IES or ONR. Publisher Copyright: {\textcopyright} 2017 International Educational Data Mining Society. All rights reserved.; 10th International Conference on Educational Data Mining, EDM 2017 ; Conference date: 25-06-2017 Through 28-06-2017",

year = "2017",

language = "English (US)",

pages = "244--249",

}

TY - CONF

T1 - Combining machine learning and natural language processing to assess literary text comprehension

AU - Balyan, Renu

AU - McCarthy, Kathryn S.

AU - McNamara, Danielle S.

N1 - Funding Information: This research was supported in part by IES Grants R305A150176, R305A130124, and R305A120707, as well as ONR Grants N00014-14-1-0343 and N00014-17-1-2300. Opinions, conclusions, or recommendations do not necessarily reflect the views of the IES or ONR. Publisher Copyright: © 2017 International Educational Data Mining Society. All rights reserved.

PY - 2017

Y1 - 2017

N2 - This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about literary works. Three types of NLP feature sets: unigrams (single content words), elaborative (new) n-grams, and linguistic features were used to classify idea units (paraphrase, text-based inference, interpretive inference). The most accurate classifications emerged using all three NLP features sets in combination, with accuracy ranging from 0.61 to 0.94 (F=0.18 to 0.81). Random Forests, which employs multiple decision trees and a bagging approach, was the most accurate classifier for these data. In contrast, the single classifier, Trees, which tends to “overfit” the data during training, was the least accurate. Ensemble classifiers were generally more accurate than single classifiers. However, Support Vector Machines accuracy was comparable to that of the ensemble classifiers. This is likely due to Support Vector Machines’ unique ability to support high dimension feature spaces. The findings suggest that combining the power of NLP and machine learning is an effective means of automating literary text comprehension assessment.

AB - This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about literary works. Three types of NLP feature sets: unigrams (single content words), elaborative (new) n-grams, and linguistic features were used to classify idea units (paraphrase, text-based inference, interpretive inference). The most accurate classifications emerged using all three NLP features sets in combination, with accuracy ranging from 0.61 to 0.94 (F=0.18 to 0.81). Random Forests, which employs multiple decision trees and a bagging approach, was the most accurate classifier for these data. In contrast, the single classifier, Trees, which tends to “overfit” the data during training, was the least accurate. Ensemble classifiers were generally more accurate than single classifiers. However, Support Vector Machines accuracy was comparable to that of the ensemble classifiers. This is likely due to Support Vector Machines’ unique ability to support high dimension feature spaces. The findings suggest that combining the power of NLP and machine learning is an effective means of automating literary text comprehension assessment.

KW - Classification

KW - Interpretation

KW - Natural language processing

KW - Supervised machine learning

UR - http://www.scopus.com/inward/record.url?scp=85061969014&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061969014&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85061969014

SP - 244

EP - 249

T2 - 10th International Conference on Educational Data Mining, EDM 2017

Y2 - 25 June 2017 through 28 June 2017

ER -
