TY - GEN
T1 - Assessing Entailer with a corpus of natural language from an intelligent tutoring system
AU - McCarthy, Philip M.
AU - Rus, Vasile
AU - Crossley, Scott A.
AU - Bigham, Sarah C.
AU - Graesser, Arthur C.
AU - McNamara, Danielle S.
PY - 2007/12/28
Y1 - 2007/12/28
N2 - In this study, we compared Entailer, a computational tool that evaluates the degree to which one text is entailed by another, to a variety of other text-relatedness metrics (LSA, lemma overlap, and MED). Our corpus was a subset of 100 self-explanations of sentences from a recent experiment on interactions between students and iSTART, an Intelligent Tutoring System that helps students apply metacognitive strategies to enhance deep comprehension. The sentence pairs were hand-coded by experts in discourse processing across four categories of text relatedness: entailment, implicature, elaboration, and paraphrase. A series of regression analyses revealed that Entailer was the best measure for approximating these hand-coded values: it explained approximately 50% of the variance for entailment, 38% for elaboration, and 23% for paraphrase. LSA contributed marginally to the entailment model. Neither lemma overlap nor MED contributed to any of the models, although a modified version of MED did correlate significantly with both the entailment and paraphrase hand-coded evaluations. This study is an important step toward developing a set of indices designed to better assess natural language input by students in Intelligent Tutoring Systems.
UR - http://www.scopus.com/inward/record.url?scp=37349131942&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=37349131942&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:37349131942
SN - 1577353196
SN - 9781577353195
T3 - Proceedings of the Twentieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2007
SP - 247
EP - 252
BT - Proceedings of the Twentieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2007
T2 - 20th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2007
Y2 - 7 May 2007 through 9 May 2007
ER -