CoSelect: Feature selection with instance selection for social media data

Jiliang Tang, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Citations (Scopus)

Abstract

Feature selection is widely used in preparing high-dimensional data for effective data mining. Attribute-value data in traditional feature selection differs from social media data, although both can be large-scale. Social media data is inherently not independent and identically distributed (i.i.d.), but linked. Furthermore, there is a lot of noise. The quality of social media data can vary drastically. These unique properties present challenges as well as opportunities for feature selection. Motivated by these differences, we propose a novel feature selection framework, CoSelect, for social media data. In particular, CoSelect can exploit link information by applying social correlation theories, incorporate instance selection with feature selection, and select relevant instances and features simultaneously. Experimental results on real-world social media datasets demonstrate the effectiveness of our proposed framework and its potential in mining social media data.

Original language: English (US)
Title of host publication: Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013
Publisher: Siam Society
Pages: 695-703
Number of pages: 9
ISBN (Print): 9781611972627
State: Published - 2013
Event: SIAM International Conference on Data Mining, SDM 2013 - Austin, United States
Duration: May 2 2013 to May 4 2013

Other

Other: SIAM International Conference on Data Mining, SDM 2013
Country: United States
City: Austin
Period: 5/2/13 to 5/4/13

Fingerprint

Feature extraction
Correlation theory
Data mining

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Tang, J., & Liu, H. (2013). CoSelect: Feature selection with instance selection for social media data. In Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013 (pp. 695-703). Siam Society.

@inproceedings{31b61305b39f481896870f102cd2a018,
title = "CoSelect: Feature selection with instance selection for social media data",
author = "Jiliang Tang and Huan Liu",
year = "2013",
language = "English (US)",
isbn = "9781611972627",
pages = "695--703",
booktitle = "Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013",
publisher = "Siam Society",

}

TY - GEN

T1 - CoSelect

T2 - Feature selection with instance selection for social media data

AU - Tang, Jiliang

AU - Liu, Huan

PY - 2013

Y1 - 2013

UR - http://www.scopus.com/inward/record.url?scp=84942434123&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942434123&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84942434123

SN - 9781611972627

SP - 695

EP - 703

BT - Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013

PB - Siam Society

ER -