Text-to-text similarity of sentences

Vasile Rus; Mihai Lintean; Arthur C. Graesser; Danielle McNamara

doi:10.4018/978-1-60960-741-8.ch007

Division of Educational Leadership and Innovation

Research output: Chapter in Book/Report/Conference proceeding › Chapter

6 Scopus citations

Abstract

Assessing the semantic similarity between two texts is a central task in many applications, including summarization, intelligent tutoring systems, and software testing. Similarity of texts is typically explored at the level of word, sentence, paragraph, and document. The similarity can be defined quantitatively (e.g. in the form of a normalized value between 0 and 1) and qualitatively in the form of semantic relations such as elaboration, entailment, or paraphrase. In this chapter, we focus first on measuring quantitatively and then on detecting qualitatively sentence-level text-to-text semantic relations. A generic approach that relies on word-to-word similarity measures is presented as well as experiments and results obtained with various instantiations of the approach. In addition, we provide results of a study on the role of weighting in Latent Semantic Analysis, a statistical technique to assess similarity of texts. The results were obtained on two data sets: a standard data set on sentence-level paraphrase detection and a data set from an intelligent tutoring system.

Original language	English (US)
Title of host publication	Applied Natural Language Processing
Subtitle of host publication	Identification, Investigation and Resolution
Publisher	IGI Global
Pages	110-121
Number of pages	12
ISBN (Print)	9781609607418
DOIs	https://doi.org/10.4018/978-1-60960-741-8.ch007
State	Published - 2011

ASJC Scopus subject areas

General Computer Science

Cite this

@inbook{3b08f1cf70084ca4b313c0d061006b9d,
  title = "Text-to-text similarity of sentences",
  abstract = "Assessing the semantic similarity between two texts is a central task in many applications, including summarization, intelligent tutoring systems, and software testing. Similarity of texts is typically explored at the level of word, sentence, paragraph, and document. The similarity can be defined quantitatively (e.g. in the form of a normalized value between 0 and 1) and qualitatively in the form of semantic relations such as elaboration, entailment, or paraphrase. In this chapter, we focus first on measuring quantitatively and then on detecting qualitatively sentence-level text-to-text semantic relations. A generic approach that relies on word-to-word similarity measures is presented as well as experiments and results obtained with various instantiations of the approach. In addition, we provide results of a study on the role of weighting in Latent Semantic Analysis, a statistical technique to assess similarity of texts. The results were obtained on two data sets: a standard data set on sentence-level paraphrase detection and a data set from an intelligent tutoring system.",
  author = "Vasile Rus and Mihai Lintean and Graesser, {Arthur C.} and Danielle McNamara",
  year = "2011",
  doi = "10.4018/978-1-60960-741-8.ch007",
  language = "English (US)",
  isbn = "9781609607418",
  pages = "110--121",
  booktitle = "Applied Natural Language Processing",
  publisher = "IGI Global",
}

TY  - CHAP
T1  - Text-to-text similarity of sentences
AU  - Rus, Vasile
AU  - Lintean, Mihai
AU  - Graesser, Arthur C.
AU  - McNamara, Danielle
PY  - 2011
Y1  - 2011
N2  - Assessing the semantic similarity between two texts is a central task in many applications, including summarization, intelligent tutoring systems, and software testing. Similarity of texts is typically explored at the level of word, sentence, paragraph, and document. The similarity can be defined quantitatively (e.g. in the form of a normalized value between 0 and 1) and qualitatively in the form of semantic relations such as elaboration, entailment, or paraphrase. In this chapter, we focus first on measuring quantitatively and then on detecting qualitatively sentence-level text-to-text semantic relations. A generic approach that relies on word-to-word similarity measures is presented as well as experiments and results obtained with various instantiations of the approach. In addition, we provide results of a study on the role of weighting in Latent Semantic Analysis, a statistical technique to assess similarity of texts. The results were obtained on two data sets: a standard data set on sentence-level paraphrase detection and a data set from an intelligent tutoring system.
AB  - Assessing the semantic similarity between two texts is a central task in many applications, including summarization, intelligent tutoring systems, and software testing. Similarity of texts is typically explored at the level of word, sentence, paragraph, and document. The similarity can be defined quantitatively (e.g. in the form of a normalized value between 0 and 1) and qualitatively in the form of semantic relations such as elaboration, entailment, or paraphrase. In this chapter, we focus first on measuring quantitatively and then on detecting qualitatively sentence-level text-to-text semantic relations. A generic approach that relies on word-to-word similarity measures is presented as well as experiments and results obtained with various instantiations of the approach. In addition, we provide results of a study on the role of weighting in Latent Semantic Analysis, a statistical technique to assess similarity of texts. The results were obtained on two data sets: a standard data set on sentence-level paraphrase detection and a data set from an intelligent tutoring system.
UR  - http://www.scopus.com/inward/record.url?scp=84899351686&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84899351686&partnerID=8YFLogxK
U2  - 10.4018/978-1-60960-741-8.ch007
DO  - 10.4018/978-1-60960-741-8.ch007
M3  - Chapter
AN  - SCOPUS:84899351686
SN  - 9781609607418
SP  - 110
EP  - 121
BT  - Applied Natural Language Processing
PB  - IGI Global
ER  - 
