TY - GEN
T1 - A linguistic analysis of student-generated paraphrases
AU - Rus, Vasile
AU - Feng, Shi
AU - Brandon, Russell
AU - Crossley, Scott
AU - McNamara, Danielle S.
PY - 2011
Y1 - 2011
N2 - Paraphrase identification is a core Natural Language Processing task that involves assessing the semantic similarity of two texts. To foster systematic studies of this task, standardized datasets were created on which various approaches could be compared more fairly. However, a better understanding and more precise operational definition of a paraphrase are needed before any further datasets or systematic evaluations of the task of paraphrase identification are proposed. This study develops the concept of paraphrasing as a writing strategy. Six types of paraphrases are defined through the creation of a relatively large corpus of student-generated paraphrases. These paraphrases are analyzed along several dozen linguistic dimensions ranging from cohesion to lexical diversity. The most significant indices from these dimensions were then used to build a prediction model that could identify true and false paraphrases and each of the six paraphrase types.
UR - http://www.scopus.com/inward/record.url?scp=80052401306&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052401306&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80052401306
SN - 9781577355014
T3 - Proceedings of the 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24
SP - 293
EP - 298
BT - Proceedings of the 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24
T2 - 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24
Y2 - 18 May 2011 through 20 May 2011
ER -