TY - JOUR
T1 - Dictionaries and distributions
T2 - Combining expert knowledge and large scale textual data content analysis: Distributed dictionary representation
AU - Garten, Justin
AU - Hoover, Joe
AU - Johnson, Kate M.
AU - Boghrati, Reihane
AU - Iskiwitch, Carol
AU - Dehghani, Morteza
N1 - Funding Information:
This work has been funded in part by NSF IBSS #1520031. Correspondence concerning this article should be addressed to Morteza Dehghani, mdehghan@usc.edu, 3620 S. McClintock Ave, Los Angeles, CA 90089-1061.
Publisher Copyright:
© 2017, Psychonomic Society, Inc.
PY - 2018/2/1
Y1 - 2018/2/1
N2 - Theory-driven text analysis has made extensive use of psychological concept dictionaries, leading to a wide range of important results. These dictionaries have generally been applied through word count methods which have proven to be both simple and effective. In this paper, we introduce Distributed Dictionary Representations (DDR), a method that applies psychological dictionaries using semantic similarity rather than word counts. This allows for the measurement of the similarity between dictionaries and spans of text ranging from complete documents to individual words. We show how DDR enables dictionary authors to place greater emphasis on construct validity without sacrificing linguistic coverage. We further demonstrate the benefits of DDR on two real-world tasks and finally conduct an extensive study of the interaction between dictionary size and task performance. These studies allow us to examine how DDR and word count methods complement one another as tools for applying concept dictionaries and where each is best applied. Finally, we provide references to tools and resources to make this method both available and accessible to a broad psychological audience.
KW - Dictionary-based text analysis
KW - Methodological innovation
KW - Semantic representation
KW - Text analysis
UR - http://www.scopus.com/inward/record.url?scp=85016647220&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016647220&partnerID=8YFLogxK
U2 - 10.3758/s13428-017-0875-9
DO - 10.3758/s13428-017-0875-9
M3 - Article
C2 - 28364281
AN - SCOPUS:85016647220
SN - 1554-351X
VL - 50
SP - 344
EP - 361
JO - Behavior Research Methods
JF - Behavior Research Methods
IS - 1
ER -