Abstract

Topic modeling is an important tool in social media analysis, allowing researchers to quickly understand large text corpora by investigating the topics underlying them. One of the fundamental problems of topic models lies in how to assess the quality of the topics from the perspective of human interpretability. How well can humans understand the meaning of topics generated by statistical topic modeling algorithms? In this work we advance the study of this question by introducing Topic Consensus: a new measure that calculates the quality of a topic through investigating its consensus with some known topics underlying the data. We view the quality of the topics from three perspectives: 1) topic interpretability, 2) how documents relate to the underlying topics, and 3) how interpretable the topics are when the corpus has an underlying categorization. We provide insights into how well the results of Mechanical Turk match automated methods for calculating topic quality. The probability distribution of the words in the topic best fit the Topic Coherence measure, in terms of both correlation as well as finding the best topics.

Original language: English (US)
Title of host publication: HT 2015 - Proceedings of the 26th ACM Conference on Hypertext and Social Media
Publisher: Association for Computing Machinery, Inc
Pages: 123-131
Number of pages: 9
ISBN (Print): 9781450333955
DOIs: https://doi.org/10.1145/2700171.2791028
State: Published - Aug 24 2015
Event: 26th ACM Conference on Hypertext and Social Media, HT 2015 - Guzelyurt, Cyprus
Duration: Sep 1 2015 to Sep 4 2015

Keywords

  • Text analysis
  • Text mining
  • Topic analysis
  • Topic modeling

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction

Cite this

Morstatter, F., Pfeffer, J., Mayer, K., & Liu, H. (2015). Text, topics, and turkers: A consensus measure for statistical topics. In HT 2015 - Proceedings of the 26th ACM Conference on Hypertext and Social Media (pp. 123-131). Association for Computing Machinery, Inc. https://doi.org/10.1145/2700171.2791028

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

@inproceedings{e3da00fdf77f4ed2aa21bb1d4ecaf57a,
title = "Text, topics, and turkers: A consensus measure for statistical topics",
abstract = "Topic modeling is an important tool in social media analysis, allowing researchers to quickly understand large text corpora by investigating the topics underlying them. One of the fundamental problems of topic models lies in how to assess the quality of the topics from the perspective of human interpretability. How well can humans understand the meaning of topics generated by statistical topic modeling algorithms? In this work we advance the study of this question by introducing Topic Consensus: a new measure that calculates the quality of a topic through investigating its consensus with some known topics underlying the data. We view the quality of the topics from three perspectives: 1) topic interpretability, 2) how documents relate to the underlying topics, and 3) how interpretable the topics are when the corpus has an underlying categorization. We provide insights into how well the results of Mechanical Turk match automated methods for calculating topic quality. The probability distribution of the words in the topic best fit the Topic Coherence measure, in terms of both correlation as well as finding the best topics.",
keywords = "Text analysis, Text mining, Topic analysis, Topic modeling",
author = "Fred Morstatter and J{\"u}rgen Pfeffer and Katja Mayer and Huan Liu",
year = "2015",
month = "8",
day = "24",
doi = "10.1145/2700171.2791028",
language = "English (US)",
isbn = "9781450333955",
pages = "123--131",
booktitle = "HT 2015 - Proceedings of the 26th ACM Conference on Hypertext and Social Media",
publisher = "Association for Computing Machinery, Inc",
}

TY - GEN
T1 - Text, topics, and turkers
T2 - A consensus measure for statistical topics
AU - Morstatter, Fred
AU - Pfeffer, Jürgen
AU - Mayer, Katja
AU - Liu, Huan
PY - 2015/8/24
Y1 - 2015/8/24
AB - Topic modeling is an important tool in social media analysis, allowing researchers to quickly understand large text corpora by investigating the topics underlying them. One of the fundamental problems of topic models lies in how to assess the quality of the topics from the perspective of human interpretability. How well can humans understand the meaning of topics generated by statistical topic modeling algorithms? In this work we advance the study of this question by introducing Topic Consensus: a new measure that calculates the quality of a topic through investigating its consensus with some known topics underlying the data. We view the quality of the topics from three perspectives: 1) topic interpretability, 2) how documents relate to the underlying topics, and 3) how interpretable the topics are when the corpus has an underlying categorization. We provide insights into how well the results of Mechanical Turk match automated methods for calculating topic quality. The probability distribution of the words in the topic best fit the Topic Coherence measure, in terms of both correlation as well as finding the best topics.

KW - Text analysis
KW - Text mining
KW - Topic analysis
KW - Topic modeling
UR - http://www.scopus.com/inward/record.url?scp=84956970377&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84956970377&partnerID=8YFLogxK
U2 - 10.1145/2700171.2791028
DO - 10.1145/2700171.2791028
M3 - Conference contribution
SN - 9781450333955
SP - 123
EP - 131
BT - HT 2015 - Proceedings of the 26th ACM Conference on Hypertext and Social Media
PB - Association for Computing Machinery, Inc
ER -