Abstract

Supervised learning, e.g., classification, plays an important role in processing and organizing microblogging data. In microblogging, it is easy to amass vast quantities of unlabeled data, but it is costly to obtain labels, which are essential for supervised learning algorithms. To reduce the labeling cost, active learning offers an effective way to select representative and informative instances to query for labels in order to improve the learned model. Unlike traditional data, in which instances are assumed to be independent and identically distributed (i.i.d.), instances in microblogging are networked with each other. This presents both opportunities and challenges for applying active learning to microblogging data. Inspired by social correlation theories, we investigate whether social relations can help perform effective active learning on networked data. In this paper, we propose a novel Active learning framework for the classification of Networked Texts in microblogging (ActNeT). In particular, we study how to incorporate network information into text content modeling, and design strategies to select the most representative and informative instances from microblogging for labeling by taking advantage of social network structure. Experimental results on Twitter datasets show the benefit of incorporating network information in active learning and that the proposed framework outperforms existing state-of-the-art methods.

Original language: English (US)
Title of host publication: Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013
Publisher: Siam Society
Pages: 306-314
Number of pages: 9
ISBN (Print): 9781611972627
State: Published - 2013
Event: SIAM International Conference on Data Mining, SDM 2013 - Austin, United States
Duration: May 2 2013 to May 4 2013

Other

Other: SIAM International Conference on Data Mining, SDM 2013
Country: United States
City: Austin
Period: 5/2/13 to 5/4/13

Fingerprint

Supervised learning
Labeling
Labels
Correlation theory
Learning algorithms
Problem-Based Learning
Processing
Costs

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Hu, X., Tang, J., Gao, H., & Liu, H. (2013). ActNeT: Active learning for networked texts in microblogging. In Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013 (pp. 306-314). Siam Society.

ActNeT: Active learning for networked texts in microblogging. / Hu, Xia; Tang, Jiliang; Gao, Huiji; Liu, Huan.

Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013. Siam Society, 2013. p. 306-314.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Hu, X, Tang, J, Gao, H & Liu, H 2013, ActNeT: Active learning for networked texts in microblogging. in Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013. Siam Society, pp. 306-314, SIAM International Conference on Data Mining, SDM 2013, Austin, United States, 5/2/13.
Hu X, Tang J, Gao H, Liu H. ActNeT: Active learning for networked texts in microblogging. In Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013. Siam Society. 2013. p. 306-314
Hu, Xia; Tang, Jiliang; Gao, Huiji; Liu, Huan. / ActNeT: Active learning for networked texts in microblogging. Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013. Siam Society, 2013. pp. 306-314
@inproceedings{1c420b4d33eb4e0da42b73a4cca1b341,
title = "ActNeT: Active learning for networked texts in microblogging",
abstract = "Supervised learning, e.g., classification, plays an important role in processing and organizing microblogging data. In microblogging, it is easy to amass vast quantities of unlabeled data, but it is costly to obtain labels, which are essential for supervised learning algorithms. To reduce the labeling cost, active learning offers an effective way to select representative and informative instances to query for labels in order to improve the learned model. Unlike traditional data, in which instances are assumed to be independent and identically distributed (i.i.d.), instances in microblogging are networked with each other. This presents both opportunities and challenges for applying active learning to microblogging data. Inspired by social correlation theories, we investigate whether social relations can help perform effective active learning on networked data. In this paper, we propose a novel Active learning framework for the classification of Networked Texts in microblogging (ActNeT). In particular, we study how to incorporate network information into text content modeling, and design strategies to select the most representative and informative instances from microblogging for labeling by taking advantage of social network structure. Experimental results on Twitter datasets show the benefit of incorporating network information in active learning and that the proposed framework outperforms existing state-of-the-art methods.",
author = "Xia Hu and Jiliang Tang and Huiji Gao and Huan Liu",
year = "2013",
language = "English (US)",
isbn = "9781611972627",
pages = "306--314",
booktitle = "Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013",
publisher = "Siam Society",

}

TY - GEN

T1 - ActNeT

T2 - Active learning for networked texts in microblogging

AU - Hu, Xia

AU - Tang, Jiliang

AU - Gao, Huiji

AU - Liu, Huan

PY - 2013

Y1 - 2013

AB - Supervised learning, e.g., classification, plays an important role in processing and organizing microblogging data. In microblogging, it is easy to amass vast quantities of unlabeled data, but it is costly to obtain labels, which are essential for supervised learning algorithms. To reduce the labeling cost, active learning offers an effective way to select representative and informative instances to query for labels in order to improve the learned model. Unlike traditional data, in which instances are assumed to be independent and identically distributed (i.i.d.), instances in microblogging are networked with each other. This presents both opportunities and challenges for applying active learning to microblogging data. Inspired by social correlation theories, we investigate whether social relations can help perform effective active learning on networked data. In this paper, we propose a novel Active learning framework for the classification of Networked Texts in microblogging (ActNeT). In particular, we study how to incorporate network information into text content modeling, and design strategies to select the most representative and informative instances from microblogging for labeling by taking advantage of social network structure. Experimental results on Twitter datasets show the benefit of incorporating network information in active learning and that the proposed framework outperforms existing state-of-the-art methods.

UR - http://www.scopus.com/inward/record.url?scp=84937403769&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937403769&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84937403769

SN - 9781611972627

SP - 306

EP - 314

BT - Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013

PB - Siam Society

ER -