Identifying topics for e-cigarette user-generated contents

A case study from multiple social media platforms

Yongcheng Zhan, Ruoran Liu, Qiudan Li, Scott Leischow, Daniel Dajun Zeng

Research output: Contribution to journalArticle

16 Citations (Scopus)

Abstract

Background: Electronic cigarette (e-cigarette) is an emerging product with a rapid-growth market in recent years. Social media has become an important platform for information seeking and sharing. We aim to mine hidden topics from e-cigarette datasets collected from different social media platforms. Objective: This paper aims to gain a systematic understanding of the characteristics of various types of social media, which will provide deep insights into how consumers and policy makers effectively use social media to track e-cigarette-related content and adjust their decisions and policies. Methods: We collected data from Reddit (27,638 e-cigarette flavor-related posts from January 1, 2011, to June 30, 2015), JuiceDB (14,433 e-juice reviews from June 26, 2013 to November 12, 2015), and Twitter (13,356 "e-cig ban"-related tweets from January, 1, 2010 to June 30, 2015). Latent Dirichlet Allocation, a generative model for topic modeling, was used to analyze the topics from these data. Results: We found four types of topics across the platforms: (1) promotions, (2) flavor discussions, (3) experience sharing, and (4) regulation debates. Promotions included sales from vendors to users, as well as trades among users. A total of 10.72% (2,962/27,638) of the posts from Reddit were related to trading. Promotion links were found between social media platforms. Most of the links (87.30%) in JuiceDB were related to Reddit posts. JuiceDB and Reddit identified consistent flavor categories. E-cigarette vaping methods and features such as steeping, throat hit, and vapor production were broadly discussed both on Reddit and on JuiceDB. Reddit provided space for policy discussions and majority of the posts (60.7%) holding a negative attitude toward regulations, whereas Twitter was used to launch campaigns using certain hashtags. Our findings are based on data across different platforms. The topic distribution between Reddit and JuiceDB was significantly different (P<.001), which indicated that the user discussions focused on different perspectives across the platforms. Conclusions: This study examined Reddit, JuiceDB, and Twitter as social media data sources for e-cigarette research. These mined findings could be further used by other researchers and policy makers. By utilizing the automatic topic-modeling method, the proposed unified feedback model could be a useful tool for policy makers to comprehensively consider how to collect valuable feedback from social media.

Original languageEnglish (US)
Article numbere24
JournalJournal of Medical Internet Research
Volume19
Issue number1
DOIs
StatePublished - Jan 1 2017
Externally publishedYes

Fingerprint

Social Media
Administrative Personnel
Information Dissemination
Information Storage and Retrieval
Pharynx
Electronic Cigarettes
Research Personnel
Growth
Research

Keywords

  • Electronic cigarettes
  • Infodemiology
  • Latent Dirichlet Allocation
  • Social media
  • Topic modeling

ASJC Scopus subject areas

  • Health Informatics

Cite this

Identifying topics for e-cigarette user-generated contents : A case study from multiple social media platforms. / Zhan, Yongcheng; Liu, Ruoran; Li, Qiudan; Leischow, Scott; Zeng, Daniel Dajun.

In: Journal of Medical Internet Research, Vol. 19, No. 1, e24, 01.01.2017.

Research output: Contribution to journalArticle

@article{4a3d4d3234994c45acbf2019781be633,
title = "Identifying topics for e-cigarette user-generated contents: A case study from multiple social media platforms",
abstract = "Background: Electronic cigarette (e-cigarette) is an emerging product with a rapid-growth market in recent years. Social media has become an important platform for information seeking and sharing. We aim to mine hidden topics from e-cigarette datasets collected from different social media platforms. Objective: This paper aims to gain a systematic understanding of the characteristics of various types of social media, which will provide deep insights into how consumers and policy makers effectively use social media to track e-cigarette-related content and adjust their decisions and policies. Methods: We collected data from Reddit (27,638 e-cigarette flavor-related posts from January 1, 2011, to June 30, 2015), JuiceDB (14,433 e-juice reviews from June 26, 2013 to November 12, 2015), and Twitter (13,356 {"}e-cig ban{"}-related tweets from January, 1, 2010 to June 30, 2015). Latent Dirichlet Allocation, a generative model for topic modeling, was used to analyze the topics from these data. Results: We found four types of topics across the platforms: (1) promotions, (2) flavor discussions, (3) experience sharing, and (4) regulation debates. Promotions included sales from vendors to users, as well as trades among users. A total of 10.72{\%} (2,962/27,638) of the posts from Reddit were related to trading. Promotion links were found between social media platforms. Most of the links (87.30{\%}) in JuiceDB were related to Reddit posts. JuiceDB and Reddit identified consistent flavor categories. E-cigarette vaping methods and features such as steeping, throat hit, and vapor production were broadly discussed both on Reddit and on JuiceDB. Reddit provided space for policy discussions and majority of the posts (60.7{\%}) holding a negative attitude toward regulations, whereas Twitter was used to launch campaigns using certain hashtags. Our findings are based on data across different platforms. The topic distribution between Reddit and JuiceDB was significantly different (P<.001), which indicated that the user discussions focused on different perspectives across the platforms. Conclusions: This study examined Reddit, JuiceDB, and Twitter as social media data sources for e-cigarette research. These mined findings could be further used by other researchers and policy makers. By utilizing the automatic topic-modeling method, the proposed unified feedback model could be a useful tool for policy makers to comprehensively consider how to collect valuable feedback from social media.",
keywords = "Electronic cigarettes, Infodemiology, Latent Dirichlet Allocation, Social media, Topic modeling",
author = "Yongcheng Zhan and Ruoran Liu and Qiudan Li and Scott Leischow and Zeng, {Daniel Dajun}",
year = "2017",
month = "1",
day = "1",
doi = "10.2196/jmir.5780",
language = "English (US)",
volume = "19",
journal = "Journal of Medical Internet Research",
issn = "1439-4456",
publisher = "Journal of medical Internet Research",
number = "1",

}

TY - JOUR

T1 - Identifying topics for e-cigarette user-generated contents

T2 - A case study from multiple social media platforms

AU - Zhan, Yongcheng

AU - Liu, Ruoran

AU - Li, Qiudan

AU - Leischow, Scott

AU - Zeng, Daniel Dajun

PY - 2017/1/1

Y1 - 2017/1/1

N2 - Background: Electronic cigarette (e-cigarette) is an emerging product with a rapid-growth market in recent years. Social media has become an important platform for information seeking and sharing. We aim to mine hidden topics from e-cigarette datasets collected from different social media platforms. Objective: This paper aims to gain a systematic understanding of the characteristics of various types of social media, which will provide deep insights into how consumers and policy makers effectively use social media to track e-cigarette-related content and adjust their decisions and policies. Methods: We collected data from Reddit (27,638 e-cigarette flavor-related posts from January 1, 2011, to June 30, 2015), JuiceDB (14,433 e-juice reviews from June 26, 2013 to November 12, 2015), and Twitter (13,356 "e-cig ban"-related tweets from January, 1, 2010 to June 30, 2015). Latent Dirichlet Allocation, a generative model for topic modeling, was used to analyze the topics from these data. Results: We found four types of topics across the platforms: (1) promotions, (2) flavor discussions, (3) experience sharing, and (4) regulation debates. Promotions included sales from vendors to users, as well as trades among users. A total of 10.72% (2,962/27,638) of the posts from Reddit were related to trading. Promotion links were found between social media platforms. Most of the links (87.30%) in JuiceDB were related to Reddit posts. JuiceDB and Reddit identified consistent flavor categories. E-cigarette vaping methods and features such as steeping, throat hit, and vapor production were broadly discussed both on Reddit and on JuiceDB. Reddit provided space for policy discussions and majority of the posts (60.7%) holding a negative attitude toward regulations, whereas Twitter was used to launch campaigns using certain hashtags. Our findings are based on data across different platforms. The topic distribution between Reddit and JuiceDB was significantly different (P<.001), which indicated that the user discussions focused on different perspectives across the platforms. Conclusions: This study examined Reddit, JuiceDB, and Twitter as social media data sources for e-cigarette research. These mined findings could be further used by other researchers and policy makers. By utilizing the automatic topic-modeling method, the proposed unified feedback model could be a useful tool for policy makers to comprehensively consider how to collect valuable feedback from social media.

AB - Background: Electronic cigarette (e-cigarette) is an emerging product with a rapid-growth market in recent years. Social media has become an important platform for information seeking and sharing. We aim to mine hidden topics from e-cigarette datasets collected from different social media platforms. Objective: This paper aims to gain a systematic understanding of the characteristics of various types of social media, which will provide deep insights into how consumers and policy makers effectively use social media to track e-cigarette-related content and adjust their decisions and policies. Methods: We collected data from Reddit (27,638 e-cigarette flavor-related posts from January 1, 2011, to June 30, 2015), JuiceDB (14,433 e-juice reviews from June 26, 2013 to November 12, 2015), and Twitter (13,356 "e-cig ban"-related tweets from January, 1, 2010 to June 30, 2015). Latent Dirichlet Allocation, a generative model for topic modeling, was used to analyze the topics from these data. Results: We found four types of topics across the platforms: (1) promotions, (2) flavor discussions, (3) experience sharing, and (4) regulation debates. Promotions included sales from vendors to users, as well as trades among users. A total of 10.72% (2,962/27,638) of the posts from Reddit were related to trading. Promotion links were found between social media platforms. Most of the links (87.30%) in JuiceDB were related to Reddit posts. JuiceDB and Reddit identified consistent flavor categories. E-cigarette vaping methods and features such as steeping, throat hit, and vapor production were broadly discussed both on Reddit and on JuiceDB. Reddit provided space for policy discussions and majority of the posts (60.7%) holding a negative attitude toward regulations, whereas Twitter was used to launch campaigns using certain hashtags. Our findings are based on data across different platforms. The topic distribution between Reddit and JuiceDB was significantly different (P<.001), which indicated that the user discussions focused on different perspectives across the platforms. Conclusions: This study examined Reddit, JuiceDB, and Twitter as social media data sources for e-cigarette research. These mined findings could be further used by other researchers and policy makers. By utilizing the automatic topic-modeling method, the proposed unified feedback model could be a useful tool for policy makers to comprehensively consider how to collect valuable feedback from social media.

KW - Electronic cigarettes

KW - Infodemiology

KW - Latent Dirichlet Allocation

KW - Social media

KW - Topic modeling

UR - http://www.scopus.com/inward/record.url?scp=85012876933&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85012876933&partnerID=8YFLogxK

U2 - 10.2196/jmir.5780

DO - 10.2196/jmir.5780

M3 - Article

VL - 19

JO - Journal of Medical Internet Research

JF - Journal of Medical Internet Research

SN - 1439-4456

IS - 1

M1 - e24

ER -