Newness and givenness of information: Automated identification in written discourse

Philip M. McCarthy, David Dufty, Christian F. Hempelmann, Zhiqiang Cai, Danielle McNamara, Arthur C. Graesser

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

The identification of new versus given information within a text has been frequently investigated by researchers of language and discourse. Despite theoretical advances, an accurate computational method for assessing the degree to which a text contains new versus given information has not previously been implemented. This study discusses a variety of computational new/given systems and analyzes four typical expository and narrative texts against a widely accepted theory of new/given proposed by Prince (1981). Our findings suggest that a latent semantic analysis (LSA) based measure called span outperforms standard LSA in detecting both new and given information in text. Further, span outperforms standard LSA for distinguishing low versus high cohesion versions of text. Our results suggest that span may be a useful variable in a wide array of discourse analyses.

Original language: English (US)
Title of host publication: Applied Natural Language Processing
Subtitle of host publication: Identification, Investigation and Resolution
Publisher: IGI Global
Pages: 455-476
Number of pages: 22
ISBN (Print): 9781609607418
State: Published - Dec 1 2011

ASJC Scopus subject areas

  • Computer Science (all)
