Co-Select: Feature selection with instance selection for social media data

Jiliang Tang, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Feature selection is widely used in preparing high-dimensional data for effective data mining. Attribute-value data in traditional feature selection differs from social media data, although both can be large-scale. Social media data is inherently not independent and identically distributed (i.i.d.), but linked. Furthermore, there is a lot of noise. The quality of social media data can vary drastically. These unique properties present challenges as well as opportunities for feature selection. Motivated by these differences, we propose a novel feature selection framework, CoSelect, for social media data. In particular, CoSelect can exploit link information by applying social correlation theories, incorporate instance selection with feature selection, and select relevant instances and features simultaneously. Experimental results on real-world social media datasets demonstrate the effectiveness of our proposed framework and its potential in mining social media data.

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining 2013, SMD 2013
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 695-703
Number of pages: 9
ISBN (Print): 9781627487245
State: Published - 2013
Event: 13th SIAM International Conference on Data Mining, SMD 2013 - Austin, United States
Duration: May 2 2013 to May 4 2013

Other

Other: 13th SIAM International Conference on Data Mining, SMD 2013
Country: United States
City: Austin
Period: 5/2/13 to 5/4/13

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Signal Processing
  • Software
