A linguistic analysis of student-generated paraphrases

Vasile Rus, Shi Feng, Russell Brandon, Scott Crossley, Danielle McNamara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Paraphrase identification is a core Natural Language Processing task that involves assessing the semantic similarity of two texts. To foster systematic studies of this task, standardized datasets were created on which various approaches could be compared more fairly. However, a better understanding and more precise operational definition of a paraphrase are needed before any further datasets or systematic evaluations of the task of paraphrase identification are proposed. This study develops the concept of paraphrasing as a writing strategy. Six types of paraphrases are defined through the creation of a relatively large corpus of student-generated paraphrases. These paraphrases are analyzed along several dozen linguistic dimensions ranging from cohesion to lexical diversity. The most significant indices from these dimensions were then used to build a prediction model that could identify true and false paraphrases and each of the six paraphrase types.

Original language: English (US)
Title of host publication: Proceedings of the 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24
Pages: 293-298
Number of pages: 6
State: Published - 2011
Externally published: Yes
Event: 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24 - Palm Beach, FL, United States
Duration: May 18 2011 → May 20 2011

Other

Other: 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24
Country: United States
City: Palm Beach, FL
Period: 5/18/11 → 5/20/11

Fingerprint

Linguistics
Semantics
Students
Processing

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Rus, V., Feng, S., Brandon, R., Crossley, S., & McNamara, D. (2011). A linguistic analysis of student-generated paraphrases. In Proceedings of the 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24 (pp. 293-298).

@inproceedings{cca8b6d0665348a6803cc65186963414,
title = "A linguistic analysis of student-generated paraphrases",
abstract = "Paraphrase identification is a core Natural Language Processing task that involves assessing the semantic similarity of two texts. To foster systematic studies of this task, standardized datasets were created on which various approaches could be compared more fairly. However, a better understanding and more precise operational definition of a paraphrase are needed before any further datasets or systematic evaluations of the task of paraphrase identification are proposed. This study develops the concept of paraphrasing as a writing strategy. Six types of paraphrases are defined through the creation of a relatively large corpus of student-generated paraphrases. These paraphrases are analyzed along several dozen linguistic dimensions ranging from cohesion to lexical diversity. The most significant indices from these dimensions were then used to build a prediction model that could identify true and false paraphrases and each of the six paraphrase types.",
author = "Vasile Rus and Shi Feng and Russell Brandon and Scott Crossley and Danielle McNamara",
year = "2011",
language = "English (US)",
isbn = "9781577355014",
pages = "293--298",
booktitle = "Proceedings of the 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24",

}

TY - GEN

T1 - A linguistic analysis of student-generated paraphrases

AU - Rus, Vasile

AU - Feng, Shi

AU - Brandon, Russell

AU - Crossley, Scott

AU - McNamara, Danielle

PY - 2011

Y1 - 2011

N2 - Paraphrase identification is a core Natural Language Processing task that involves assessing the semantic similarity of two texts. To foster systematic studies of this task, standardized datasets were created on which various approaches could be compared more fairly. However, a better understanding and more precise operational definition of a paraphrase are needed before any further datasets or systematic evaluations of the task of paraphrase identification are proposed. This study develops the concept of paraphrasing as a writing strategy. Six types of paraphrases are defined through the creation of a relatively large corpus of student-generated paraphrases. These paraphrases are analyzed along several dozen linguistic dimensions ranging from cohesion to lexical diversity. The most significant indices from these dimensions were then used to build a prediction model that could identify true and false paraphrases and each of the six paraphrase types.

UR - http://www.scopus.com/inward/record.url?scp=80052401306&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052401306&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781577355014

SP - 293

EP - 298

BT - Proceedings of the 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24

ER -