TY - GEN
T1 - Choosing the Right Words
T2 - 2nd Joint Conference on Lexical and Computational Semantics, SEM 2013
AU - Schwartz, H. Andrew
AU - Eichstaedt, Johannes
AU - Dziurzynski, Lukasz
AU - Blanco, Eduardo
AU - Kern, Margaret L.
AU - Ramones, Stephanie
AU - Seligman, Martin
AU - Ungar, Lyle
N1 - Funding Information:
Support for this research was provided by the Robert Wood Johnson Foundation’s Pioneer Portfolio, through a grant to Martin Seligman, “Exploring Concepts of Positive Health”. We thank the reviewers for their constructive and insightful comments.
Publisher Copyright:
© 2013 Association for Computational Linguistics.
PY - 2013
Y1 - 2013
N2 - Social scientists are increasingly using the vast amount of text available on social media to measure variation in happiness and other psychological states. Such studies count words deemed to be indicators of happiness and track how the word frequencies change across locations or time. This word count approach is simple and scalable, yet often picks up false signals, as words can appear in different contexts and take on different meanings. We characterize the types of errors that occur using the word count approach, and find lexical ambiguity to be the most prevalent. We then show that one can reduce error with a simple refinement to such lexica by automatically eliminating highly ambiguous words. The resulting refined lexica improve precision as measured by human judgments of word occurrences in Facebook posts.
AB - Social scientists are increasingly using the vast amount of text available on social media to measure variation in happiness and other psychological states. Such studies count words deemed to be indicators of happiness and track how the word frequencies change across locations or time. This word count approach is simple and scalable, yet often picks up false signals, as words can appear in different contexts and take on different meanings. We characterize the types of errors that occur using the word count approach, and find lexical ambiguity to be the most prevalent. We then show that one can reduce error with a simple refinement to such lexica by automatically eliminating highly ambiguous words. The resulting refined lexica improve precision as measured by human judgments of word occurrences in Facebook posts.
UR - http://www.scopus.com/inward/record.url?scp=85123688242&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123688242&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85123688242
T3 - SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity
SP - 296
EP - 305
BT - SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Proceedings of the Main Conference and the Shared Task
A2 - Diab, Mona
A2 - Baldwin, Tim
A2 - Baroni, Marco
PB - Association for Computational Linguistics (ACL)
Y2 - 13 June 2013 through 14 June 2013
ER -