Text-to-text similarity of sentences

Vasile Rus, Mihai Lintean, Arthur C. Graesser, Danielle McNamara

Research output: Chapter in Book/Report/Conference proceeding › Chapter

5 Citations (Scopus)

Abstract

Assessing the semantic similarity between two texts is a central task in many applications, including summarization, intelligent tutoring systems, and software testing. Similarity of texts is typically explored at the level of word, sentence, paragraph, and document. The similarity can be defined quantitatively (e.g. in the form of a normalized value between 0 and 1) and qualitatively in the form of semantic relations such as elaboration, entailment, or paraphrase. In this chapter, we focus first on measuring quantitatively and then on detecting qualitatively sentence-level text-to-text semantic relations. A generic approach that relies on word-to-word similarity measures is presented as well as experiments and results obtained with various instantiations of the approach. In addition, we provide results of a study on the role of weighting in Latent Semantic Analysis, a statistical technique to assess similarity of texts. The results were obtained on two data sets: a standard data set on sentence-level paraphrase detection and a data set from an intelligent tutoring system.
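
As an illustration only, the general idea of building a sentence-level score from word-to-word similarity can be sketched as follows. This is a minimal sketch under stated assumptions, not the chapter's actual method: the greedy best-match averaging, the `exact_match` placeholder measure, the `tokenize` helper, and the 0.5 threshold in `is_paraphrase` are all hypothetical choices made for this example. Any word-to-word measure that returns values in [0, 1] could be passed in place of `exact_match`.

```python
from typing import Callable, List

# A word-to-word similarity measure: takes two words, returns a score in [0, 1].
WordSim = Callable[[str, str], float]


def exact_match(w1: str, w2: str) -> float:
    """Placeholder measure (an assumption): 1.0 for identical words, else 0.0."""
    return 1.0 if w1 == w2 else 0.0


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in sentence.lower().split())
    return [tok for tok in tokens if tok]


def directed_similarity(a: List[str], b: List[str], word_sim: WordSim) -> float:
    """Average, over the words in a, of the best score against any word in b."""
    if not a or not b:
        return 0.0
    return sum(max(word_sim(x, y) for y in b) for x in a) / len(a)


def sentence_similarity(s1: str, s2: str, word_sim: WordSim = exact_match) -> float:
    """Symmetric score in [0, 1], provided word_sim itself stays in [0, 1]."""
    a, b = tokenize(s1), tokenize(s2)
    forward = directed_similarity(a, b, word_sim)
    backward = directed_similarity(b, a, word_sim)
    return 0.5 * (forward + backward)


def is_paraphrase(s1: str, s2: str, threshold: float = 0.5,
                  word_sim: WordSim = exact_match) -> bool:
    """Turn the quantitative score into a yes/no decision.

    The threshold value is an arbitrary assumption for illustration.
    """
    return sentence_similarity(s1, s2, word_sim) >= threshold
```

The quantitative score comes from `sentence_similarity`, and `is_paraphrase` shows one simple way a qualitative decision could be derived from it by thresholding. In practice the threshold and the word-to-word measure would need to be chosen and evaluated on data.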

Original language: English (US)
Title of host publication: Applied Natural Language Processing: Identification, Investigation and Resolution
Publisher: IGI Global
Pages: 110-121
Number of pages: 12
ISBN (Print): 9781609607418
DOIs: https://doi.org/10.4018/978-1-60960-741-8.ch007
State: Published - 2011

Fingerprint

Semantics
Intelligent systems
Software testing
Experiments

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Rus, V., Lintean, M., Graesser, A. C., & McNamara, D. (2011). Text-to-text similarity of sentences. In Applied Natural Language Processing: Identification, Investigation and Resolution (pp. 110-121). IGI Global. https://doi.org/10.4018/978-1-60960-741-8.ch007
