Finding needles of interested tweets in the haystack of Twitter network

Qiongjie Tian; Jashmi Lagisetty; Baoxin Li

doi:10.1109/ASONAM.2016.7752273

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Drug use and abuse is a serious societal problem. The fast development and adoption of social media and smart mobile devices in recent years bring about new opportunities for advancing computer-based strategies for understanding and intervention of drug-related behaviors. However, the existing literature still lacks principled ways of building computational models for supporting effective analysis of large-scale, often unstructured social media data. Part of the challenge stems from the difficulty of obtaining so-called ground-truth data that are typically required for training computational models. This paper presents a progressive semi-supervised learning approach to identifying Twitter tweets that are related to personal and recreational use of marijuana. Based on a small, labeled dataset, the proposed approach first learns optimal mapping of raw features from the tweets for classification, using a method of weakly hierarchical lasso. The learned feature model is then used to support unsupervised clustering of Web-scale data. Experiments with realistic data crawled from Twitter are used to validate the proposed approach, demonstrating its effectiveness.
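The two-stage idea in the abstract (learn a sparse feature model from a small labeled set, then use it to cluster unlabeled tweets) can be illustrated with a minimal sketch. This is not the authors' implementation: a plain L1-regularized logistic regression stands in for the weakly hierarchical lasso, a naive k-means stands in for the clustering step, which the abstract does not specify, and every name, tweet, and parameter below is a hypothetical toy chosen for illustration.

```python
import math

def featurize(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, steps=200):
    """Proximal gradient descent (ISTA) on an L1-penalized logistic loss.
    Assumption: a plain lasso penalty, not the weakly hierarchical variant."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            z = sum(a * b for a, b in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w

def kmeans(points, k, iters=10):
    """Naive k-means; the first k points serve as initial centers."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy data: 1 = tweet of interest, 0 = not of interest.
vocab = ["the", "smoke", "homework", "session", "break"]
labeled = [
    ("the smoke session", 1), ("the smoke break", 1),
    ("the homework session", 0), ("the homework break", 0),
]
unlabeled = ["the smoke tonight", "the homework tonight",
             "smoke again", "homework again"]

# Stage 1: learn sparse weights from the small labeled set.
X = [featurize(text, vocab) for text, _ in labeled]
y = [label for _, label in labeled]
weights = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if abs(wj) > 0]

# Stage 2: cluster unlabeled tweets using only the selected features.
points = [[featurize(t, vocab)[j] for j in selected] for t in unlabeled]
labels = kmeans(points, k=2)
```

In this toy setup the L1 penalty zeroes out words that do not separate the two classes, so the clustering step runs in a smaller feature space. The paper's actual features, penalty structure, and clustering method may differ substantially.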

Original language	English (US)
Title of host publication	Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
Editors	Ravi Kumar, James Caverlee, Hanghang Tong
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	447-452
Number of pages	6
ISBN (Electronic)	9781509028467
DOIs	https://doi.org/10.1109/ASONAM.2016.7752273
State	Published - Nov 21 2016
Event	2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016 - San Francisco, United States; Duration: Aug 18 2016 → Aug 21 2016

Publication series

Name	Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016

Other

Other	2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
Country/Territory	United States
City	San Francisco
Period	8/18/16 → 8/21/16

ASJC Scopus subject areas

Computer Networks and Communications
Sociology and Political Science
Communication

Access to Document

10.1109/ASONAM.2016.7752273

Cite this

Tian, Q., Lagisetty, J., & Li, B. (2016). Finding needles of interested tweets in the haystack of Twitter network. In R. Kumar, J. Caverlee, & H. Tong (Eds.), Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016 (pp. 447-452). Article 7752273 (Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASONAM.2016.7752273

Finding needles of interested tweets in the haystack of Twitter network. / Tian, Qiongjie; Lagisetty, Jashmi; Li, Baoxin.
Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016. ed. / Ravi Kumar; James Caverlee; Hanghang Tong. Institute of Electrical and Electronics Engineers Inc., 2016. p. 447-452 7752273 (Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016).

Tian, Q, Lagisetty, J & Li, B 2016, Finding needles of interested tweets in the haystack of Twitter network. in R Kumar, J Caverlee & H Tong (eds), Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016., 7752273, Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016, Institute of Electrical and Electronics Engineers Inc., pp. 447-452, 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016, San Francisco, United States, 8/18/16. https://doi.org/10.1109/ASONAM.2016.7752273

Tian Q, Lagisetty J, Li B. Finding needles of interested tweets in the haystack of Twitter network. In Kumar R, Caverlee J, Tong H, editors, Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016. Institute of Electrical and Electronics Engineers Inc. 2016. p. 447-452. 7752273. (Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016). doi: 10.1109/ASONAM.2016.7752273

Tian, Qiongjie ; Lagisetty, Jashmi ; Li, Baoxin. / Finding needles of interested tweets in the haystack of Twitter network. Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016. editor / Ravi Kumar ; James Caverlee ; Hanghang Tong. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 447-452 (Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016).

@inproceedings{651f929abbe84399b272e4918d69f6ee,

title = "Finding needles of interested tweets in the haystack of Twitter network",

abstract = "Drug use and abuse is a serious societal problem. The fast development and adoption of social media and smart mobile devices in recent years bring about new opportunities for advancing computer-based strategies for understanding and intervention of drug-related behaviors. However, the existing literature still lacks principled ways of building computational models for supporting effective analysis of large-scale, often unstructured social media data. Part of the challenge stems from the difficulty of obtaining so-called ground-truth data that are typically required for training computational models. This paper presents a progressive semi-supervised learning approach to identifying Twitter tweets that are related to personal and recreational use of marijuana. Based on a small, labeled dataset, the proposed approach first learns optimal mapping of raw features from the tweets for classification, using a method of weakly hierarchical lasso. The learned feature model is then used to support unsupervised clustering of Web-scale data. Experiments with realistic data crawled from Twitter are used to validate the proposed approach, demonstrating its effectiveness.",

author = "Qiongjie Tian and Jashmi Lagisetty and Baoxin Li",

note = "Funding Information: This work was supported in part by a grant (#1135616) from National Science Foundation. Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. Publisher Copyright: {\textcopyright} 2016 IEEE. Copyright: Copyright 2017 Elsevier B.V., All rights reserved.; 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016 ; Conference date: 18-08-2016 Through 21-08-2016",

year = "2016",

month = nov,

day = "21",

doi = "10.1109/ASONAM.2016.7752273",

language = "English (US)",

series = "Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "447--452",

editor = "Ravi Kumar and James Caverlee and Hanghang Tong",

booktitle = "Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016",

}

TY - GEN

T1 - Finding needles of interested tweets in the haystack of Twitter network

AU - Tian, Qiongjie

AU - Lagisetty, Jashmi

AU - Li, Baoxin

N1 - Funding Information: This work was supported in part by a grant (#1135616) from National Science Foundation. Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. Publisher Copyright: © 2016 IEEE. Copyright: Copyright 2017 Elsevier B.V., All rights reserved.

PY - 2016/11/21

Y1 - 2016/11/21

AB - Drug use and abuse is a serious societal problem. The fast development and adoption of social media and smart mobile devices in recent years bring about new opportunities for advancing computer-based strategies for understanding and intervention of drug-related behaviors. However, the existing literature still lacks principled ways of building computational models for supporting effective analysis of large-scale, often unstructured social media data. Part of the challenge stems from the difficulty of obtaining so-called ground-truth data that are typically required for training computational models. This paper presents a progressive semi-supervised learning approach to identifying Twitter tweets that are related to personal and recreational use of marijuana. Based on a small, labeled dataset, the proposed approach first learns optimal mapping of raw features from the tweets for classification, using a method of weakly hierarchical lasso. The learned feature model is then used to support unsupervised clustering of Web-scale data. Experiments with realistic data crawled from Twitter are used to validate the proposed approach, demonstrating its effectiveness.

UR - http://www.scopus.com/inward/record.url?scp=85006716950&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85006716950&partnerID=8YFLogxK

U2 - 10.1109/ASONAM.2016.7752273

DO - 10.1109/ASONAM.2016.7752273

M3 - Conference contribution

AN - SCOPUS:85006716950

T3 - Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016

SP - 447

EP - 452

BT - Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016

A2 - Kumar, Ravi

A2 - Caverlee, James

A2 - Tong, Hanghang

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016

Y2 - 18 August 2016 through 21 August 2016

ER -
