Abstract

Big data presents new challenges for understanding large text corpora. Topic modeling algorithms help understand the underlying patterns, or "topics", in data. Researchers often read these topics in order to gain an understanding of the underlying corpus. It is important to evaluate the interpretability of these automatically generated topics. Methods have previously been designed to use crowdsourcing platforms to measure interpretability. In this paper, we demonstrate the necessity of a key concept, coherence, when assessing the topics and propose an effective method for its measurement. We show that the proposed measure of coherence captures a different aspect of the topics than existing measures. We further study the automation of these topic measures for scalability and reproducibility, showing that these measures can be automated.

Original language: English (US)
Title of host publication: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 543-548
Number of pages: 6
ISBN (Electronic): 9781510827592
State: Published - 2016
Event: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Berlin, Germany
Duration: Aug 7 2016 to Aug 12 2016

Fingerprint

Scalability
Automation
Statistical Models
coherence
Big data
Text Corpus
Modeling
Reproducibility

ASJC Scopus subject areas

  • Artificial Intelligence
  • Linguistics and Language
  • Software
  • Language and Linguistics

Cite this

Morstatter, F., & Liu, H. (2016). A novel measure for coherence in statistical topic models. In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Short Papers (pp. 543-548). Association for Computational Linguistics (ACL).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

@inproceedings{f12b8691bc44422aad9f34ad111ead72,
title = "A novel measure for coherence in statistical topic models",
abstract = "Big data presents new challenges for understanding large text corpora. Topic modeling algorithms help understand the underlying patterns, or {"}topics{"}, in data. Researchers often read these topics in order to gain an understanding of the underlying corpus. It is important to evaluate the interpretability of these automatically generated topics. Methods have previously been designed to use crowdsourcing platforms to measure interpretability. In this paper, we demonstrate the necessity of a key concept, coherence, when assessing the topics and propose an effective method for its measurement. We show that the proposed measure of coherence captures a different aspect of the topics than existing measures. We further study the automation of these topic measures for scalability and reproducibility, showing that these measures can be automated.",
author = "Fred Morstatter and Huan Liu",
year = "2016",
language = "English (US)",
pages = "543--548",
booktitle = "54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Short Papers",
publisher = "Association for Computational Linguistics (ACL)",

}

TY  - GEN
T1  - A novel measure for coherence in statistical topic models
AU  - Morstatter, Fred
AU  - Liu, Huan
PY  - 2016
Y1  - 2016
AB  - Big data presents new challenges for understanding large text corpora. Topic modeling algorithms help understand the underlying patterns, or "topics", in data. Researchers often read these topics in order to gain an understanding of the underlying corpus. It is important to evaluate the interpretability of these automatically generated topics. Methods have previously been designed to use crowdsourcing platforms to measure interpretability. In this paper, we demonstrate the necessity of a key concept, coherence, when assessing the topics and propose an effective method for its measurement. We show that the proposed measure of coherence captures a different aspect of the topics than existing measures. We further study the automation of these topic measures for scalability and reproducibility, showing that these measures can be automated.
UR  - http://www.scopus.com/inward/record.url?scp=84991772352&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84991772352&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:84991772352
SP  - 543
EP  - 548
BT  - 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Short Papers
PB  - Association for Computational Linguistics (ACL)
ER  -