Text-to-text similarity of sentences

Vasile Rus, Mihai Lintean, Arthur C. Graesser, Danielle McNamara

Research output: Chapter in Book/Report/Conference proceeding › Chapter

6 Scopus citations

Abstract

Assessing the semantic similarity between two texts is a central task in many applications, including summarization, intelligent tutoring systems, and software testing. Text similarity is typically explored at the word, sentence, paragraph, and document levels. Similarity can be defined quantitatively (e.g., as a normalized value between 0 and 1) and qualitatively, in the form of semantic relations such as elaboration, entailment, or paraphrase. In this chapter, we focus first on quantitatively measuring and then on qualitatively detecting sentence-level text-to-text semantic relations. We present a generic approach that relies on word-to-word similarity measures, together with experiments and results obtained with various instantiations of the approach. In addition, we report results of a study on the role of weighting in Latent Semantic Analysis (LSA), a statistical technique for assessing the similarity of texts. The results were obtained on two data sets: a standard data set for sentence-level paraphrase detection and a data set from an intelligent tutoring system.
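The abstract describes a generic approach that lifts a word-to-word similarity measure to the sentence level and yields a normalized score between 0 and 1. The sketch below shows one common way to do this: each word is greedily matched to its most similar word in the other sentence, the best-match scores are averaged, and the two directions are averaged for symmetry. This is an illustrative assumption, not necessarily the chapter's exact formulation. The names `sentence_similarity`, `exact_match`, and `make_table_sim`, as well as the toy synonym score, are hypothetical; a real instantiation would plug in a WordNet-based or LSA-based word measure.

```python
from typing import Callable, Dict, Sequence, Tuple

# A word-to-word measure maps two tokens to a score in [0, 1].
WordSim = Callable[[str, str], float]


def exact_match(w1: str, w2: str) -> float:
    """Toy word-to-word measure (hypothetical): 1.0 for identical lowercase tokens, else 0.0."""
    return 1.0 if w1.lower() == w2.lower() else 0.0


def make_table_sim(table: Dict[Tuple[str, str], float]) -> WordSim:
    """Hypothetical measure: exact match, otherwise a symmetric lookup in a small score table."""
    def sim(w1: str, w2: str) -> float:
        a, b = w1.lower(), w2.lower()
        if a == b:
            return 1.0
        return table.get((a, b), table.get((b, a), 0.0))
    return sim


def directional_score(src: Sequence[str], tgt: Sequence[str], word_sim: WordSim) -> float:
    """Average, over the words of src, of each word's best similarity to any word of tgt."""
    if not src or not tgt:
        return 0.0
    return sum(max(word_sim(w, v) for v in tgt) for w in src) / len(src)


def sentence_similarity(a: Sequence[str], b: Sequence[str],
                        word_sim: WordSim = exact_match) -> float:
    """Symmetric sentence score; stays in [0, 1] whenever word_sim does."""
    return 0.5 * (directional_score(a, b, word_sim) + directional_score(b, a, word_sim))


if __name__ == "__main__":
    s1 = "the cat sat on the mat".split()
    s2 = "a cat sat on a rug".split()
    print(sentence_similarity(s1, s2))  # only cat, sat, on match exactly
    synonyms = make_table_sim({("mat", "rug"): 0.8})  # toy score, not real data
    print(sentence_similarity(s1, s2, synonyms))  # the mat/rug pair now contributes
```

Swapping `word_sim` is what turns this single scheme into different "instantiations": the sentence-level logic stays fixed while the word-level measure changes. Stop words such as "the" and "a" are not filtered here; whether to drop them, or to weight words (e.g., by idf), is a design choice this sketch leaves open.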

Original language: English (US)
Title of host publication: Applied Natural Language Processing
Subtitle of host publication: Identification, Investigation and Resolution
Publisher: IGI Global
Pages: 110-121
Number of pages: 12
ISBN (Print): 9781609607418
DOIs:
State: Published - 2011

ASJC Scopus subject areas

  • General Computer Science
