Abstract

Feature selection is widely used in preparing high-dimensional data for effective data mining. The explosive popularity of social media produces massive and high-dimensional data at an unprecedented rate, presenting new challenges to feature selection. Social media data consists of (1) traditional high-dimensional, attribute-value data such as posts, tweets, comments, and images, and (2) linked data that provides social context for posts and describes the relationships between social media users as well as who generates the posts, and so on. The nature of social media also makes its data massive, noisy, and incomplete, which exacerbates the already challenging problem of feature selection. In this article, we study a novel feature selection problem: selecting features for social media data with its social context. Specifically, we illustrate the differences between attribute-value data and social media data, and investigate whether linked data can be exploited in a new feature selection framework that takes advantage of social science theories. We design and conduct experiments on datasets from real-world social media Web sites, and the empirical results demonstrate that the proposed framework can significantly improve the performance of feature selection. Further experiments are conducted to evaluate the effects of user-user and user-post relationships manifested in linked data on feature selection, and research issues for future work are discussed.
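As a rough illustration of the two kinds of data the abstract distinguishes, the following Python sketch (assuming NumPy) encodes a toy attribute-value matrix of posts alongside two linked-data matrices: a user-post authorship matrix and a user-user "follows" matrix. All matrices, function names, and values are hypothetical and not taken from the paper, and the sketch does not implement the proposed framework; it only shows how linked data yields relations among posts that the feature matrix alone cannot express.

```python
import numpy as np

# Hypothetical attribute-value data: 4 posts described by 5 word-count features.
X = np.array([
    [1, 0, 2, 0, 0],
    [0, 1, 0, 0, 3],
    [2, 0, 1, 0, 0],
    [0, 0, 0, 4, 1],
])

# Hypothetical linked data, part 1 (user-post): P[u, i] = 1 if user u wrote post i.
P = np.array([
    [1, 0, 1, 0],  # user 0 wrote posts 0 and 2
    [0, 1, 0, 0],  # user 1 wrote post 1
    [0, 0, 0, 1],  # user 2 wrote post 3
])

# Hypothetical linked data, part 2 (user-user): S[u, v] = 1 if user u follows user v.
S = np.array([
    [0, 1, 0],  # user 0 follows user 1
    [0, 0, 1],  # user 1 follows user 2
    [1, 0, 0],  # user 2 follows user 0
])


def same_author_pairs(P):
    """Return pairs (i, j), i < j, of distinct posts written by the same user."""
    C = P.T @ P  # C[i, j] = number of users who wrote both post i and post j
    n = C.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n) if C[i, j] > 0]


def follow_linked_post_pairs(P, S):
    """Return pairs (i, j) where an author of post i follows an author of post j."""
    F = P.T @ S @ P  # F[i, j] > 0 exactly when such a follow link exists
    rows, cols = np.nonzero(F)
    return list(zip(rows.tolist(), cols.tolist()))


if __name__ == "__main__":
    print(X.shape)                         # (4, 5): 4 posts, 5 features
    print(same_author_pairs(P))            # [(0, 2)]
    print(follow_linked_post_pairs(P, S))  # [(0, 1), (1, 3), (2, 1), (3, 0), (3, 2)]
```

A feature selector that sees only `X` treats the four posts as independent rows; the two helper functions show that the same posts are additionally tied together through their authors, which is the kind of social context the abstract proposes to exploit.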

Original language: English (US)
Article number: 19
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 8
Issue number: 4
DOI: 10.1145/2629587
State: Published - Oct 7 2014

Keywords

  • Feature selection
  • social context
  • social media data

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Feature selection for social media data. / Tang, Jiliang; Liu, Huan.

In: ACM Transactions on Knowledge Discovery from Data, Vol. 8, No. 4, 19, 07.10.2014.

Research output: Contribution to journal › Article

@article{0a7ebb80bc204424bca738529c1ef939,
title = "Feature selection for social media data",
keywords = "Feature selection, social context, social media data",
author = "Jiliang Tang and Huan Liu",
year = "2014",
month = "10",
day = "7",
doi = "10.1145/2629587",
language = "English (US)",
volume = "8",
journal = "ACM Transactions on Knowledge Discovery from Data",
issn = "1556-4681",
publisher = "Association for Computing Machinery (ACM)",
number = "4",

}

TY  - JOUR

T1  - Feature selection for social media data

AU  - Tang, Jiliang

AU  - Liu, Huan

PY  - 2014/10/7

Y1  - 2014/10/7

KW  - Feature selection

KW  - social context

KW  - social media data

UR  - http://www.scopus.com/inward/record.url?scp=84908237935&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=84908237935&partnerID=8YFLogxK

U2  - 10.1145/2629587

DO  - 10.1145/2629587

M3  - Article

AN  - SCOPUS:84908237935

VL  - 8

JO  - ACM Transactions on Knowledge Discovery from Data

JF  - ACM Transactions on Knowledge Discovery from Data

SN  - 1556-4681

IS  - 4

M1  - 19

ER  -