Abstract

Feature selection is widely used in preparing high-dimensional data for effective data mining. The explosive popularity of social media produces massive and high-dimensional data at an unprecedented rate, presenting new challenges to feature selection. Social media data consists of (1) traditional high-dimensional, attribute-value data such as posts, tweets, comments, and images, and (2) linked data that provides social context for posts and describes the relationships between social media users as well as who generates the posts, and so on. The nature of social media also determines that its data is massive, noisy, and incomplete, which exacerbates the already challenging problem of feature selection. In this article, we study a novel feature selection problem of selecting features for social media data with its social context. In detail, we illustrate the differences between attribute-value data and social media data, and investigate whether linked data can be exploited in a new feature selection framework by taking advantage of social science theories. We design and conduct experiments on datasets from real-world social media Web sites, and the empirical results demonstrate that the proposed framework can significantly improve the performance of feature selection. Further experiments are conducted to evaluate the effects of user-user and user-post relationships manifested in linked data on feature selection, and research issues for future work are discussed.

Original language: English (US)
Article number: 19
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 8
Issue number: 4
State: Published - Oct 7 2014

Keywords

  • Feature selection
  • Social context
  • Social media data

ASJC Scopus subject areas

  • Computer Science (all)
