Choosing the Right Words: Characterizing and Reducing Error of the Word Count Approach

H. Andrew Schwartz, Johannes Eichstaedt, Lukasz Dziurzynski, Eduardo Blanco, Margaret L. Kern, Stephanie Ramones, Martin Seligman, Lyle Ungar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

Social scientists are increasingly using the vast amount of text available on social media to measure variation in happiness and other psychological states. Such studies count words deemed to be indicators of happiness and track how the word frequencies change across locations or time. This word count approach is simple and scalable, yet often picks up false signals, as words can appear in different contexts and take on different meanings. We characterize the types of errors that occur using the word count approach, and find lexical ambiguity to be the most prevalent. We then show that one can reduce error with a simple refinement to such lexica by automatically eliminating highly ambiguous words. The resulting refined lexica improve precision as measured by human judgments of word occurrences in Facebook posts.
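The word count approach and the refinement described in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The regex tokenizer, the per-word `ambiguity` scores and the `threshold` are hypothetical placeholders, because the abstract does not say how ambiguity is measured. The posts, lexicon and human judgments are likewise invented toy values.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(posts, lexicon):
    """Word count approach: share of all tokens that belong to the lexicon."""
    counts = Counter()
    total = 0
    for post in posts:
        tokens = tokenize(post)
        total += len(tokens)
        counts.update(t for t in tokens if t in lexicon)
    return sum(counts.values()) / total if total else 0.0


def refine_lexicon(lexicon, ambiguity, threshold):
    """Drop words whose (hypothetical) ambiguity score exceeds the threshold.

    Words with no score are treated as unambiguous and kept.
    """
    return {w for w in lexicon if ambiguity.get(w, 0.0) <= threshold}


def precision(judgments):
    """Fraction of word occurrences a human judged to be a true signal."""
    return sum(judgments) / len(judgments) if judgments else 0.0


# Toy usage with invented data: "pretty" in "pretty sure" is a false signal.
lexicon = {"happy", "great", "love", "pretty"}
posts = ["I am so happy today", "pretty sure it rains", "love this great day"]
ambiguity = {"pretty": 0.9, "great": 0.2}  # hypothetical ambiguity scores

refined = refine_lexicon(lexicon, ambiguity, threshold=0.5)
print(lexicon_score(posts, lexicon), lexicon_score(posts, refined))
# Invented human judgments for each lexicon occurrence found above.
print(precision([True, False, True, True]), precision([True, True, True]))
```

Filtering at the lexicon level keeps the method as cheap as plain counting: no per-occurrence disambiguation is needed, at the cost of losing the correct uses of any word that is removed.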

Original language: English (US)
Title of host publication: *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics
Publisher: Association for Computational Linguistics (ACL)
Pages: 296-305
Number of pages: 10
ISBN (Electronic): 9781937284480
State: Published - 2013
Externally published: Yes
Event: 2nd Joint Conference on Lexical and Computational Semantics, *SEM 2013 - Atlanta, United States
Duration: Jun 13 2013 to Jun 14 2013

Publication series

Name: *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics
Volume: 1

Conference

Conference: 2nd Joint Conference on Lexical and Computational Semantics, *SEM 2013
Country/Territory: United States
City: Atlanta
Period: 6/13/13 to 6/14/13

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Theoretical Computer Science
