Choosing the Right Words: Characterizing and Reducing Error of the Word Count Approach

H. Andrew Schwartz, Johannes Eichstaedt, Lukasz Dziurzynski, Eduardo Blanco, Margaret L. Kern, Stephanie Ramones, Martin Seligman, Lyle Ungar

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Scopus citations

Abstract

Social scientists are increasingly using the vast amount of text available on social media to measure variation in happiness and other psychological states. Such studies count words deemed to be indicators of happiness and track how the word frequencies change across locations or time. This word count approach is simple and scalable, yet often picks up false signals, as words can appear in different contexts and take on different meanings. We characterize the types of errors that occur using the word count approach, and find lexical ambiguity to be the most prevalent. We then show that one can reduce error with a simple refinement to such lexica by automatically eliminating highly ambiguous words. The resulting refined lexica improve precision as measured by human judgments of word occurrences in Facebook posts.
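The word count approach and the refinement step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-word `ambiguity` scores, and the `threshold` value are all hypothetical, and the abstract does not say how ambiguity is actually estimated.

```python
import re


def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_rate(posts, lexicon):
    """Fraction of all tokens in `posts` that appear in `lexicon`."""
    total = 0
    hits = 0
    for post in posts:
        tokens = tokenize(post)
        total += len(tokens)
        hits += sum(1 for token in tokens if token in lexicon)
    return hits / total if total else 0.0


def refine_lexicon(lexicon, ambiguity, threshold):
    """Drop words whose (assumed, externally supplied) ambiguity score
    exceeds `threshold`. Words without a score are kept."""
    return {word for word in lexicon if ambiguity.get(word, 0.0) <= threshold}


# Toy data, invented for illustration only.
posts = [
    "I feel great today",
    "that was a great deal of work",  # "great" is not a happiness signal here
    "so happy and glad",
]
lexicon = {"great", "happy", "glad"}
ambiguity = {"great": 0.8, "happy": 0.1, "glad": 0.1}  # hypothetical scores

refined = refine_lexicon(lexicon, ambiguity, threshold=0.5)
print(lexicon_rate(posts, lexicon))
print(lexicon_rate(posts, refined))
```

In this toy example the refined lexicon no longer counts "great", so the second post stops contributing a false signal, at the cost of also missing the genuine use in the first post. That trade-off is why the abstract reports the improvement in terms of precision.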

Original language: English (US)
Title of host publication: SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Proceedings of the Main Conference and the Shared Task
Subtitle of host publication: Semantic Textual Similarity
Editors: Mona Diab, Tim Baldwin, Marco Baroni
Publisher: Association for Computational Linguistics (ACL)
Pages: 296-305
Number of pages: 10
ISBN (Electronic): 9781937284480
State: Published - 2013
Externally published: Yes
Event: 2nd Joint Conference on Lexical and Computational Semantics, SEM 2013 - Atlanta, United States
Duration: Jun 13 2013 to Jun 14 2013

Publication series

Name: SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

Conference

Conference: 2nd Joint Conference on Lexical and Computational Semantics, SEM 2013
Country/Territory: United States
City: Atlanta
Period: 6/13/13 to 6/14/13

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
