Abstract

The prevalent use of social media produces mountains of unlabeled, high-dimensional data. Feature selection has been shown to be effective in dealing with high-dimensional data for efficient data mining. Feature selection for unlabeled data remains a challenging task because there is no label information by which feature relevance can be assessed. The unique characteristics of social media data further complicate the already challenging problem of unsupervised feature selection and bring new challenges to traditional unsupervised feature selection algorithms; for example, part of social media data is linked, which invalidates the independent and identically distributed (i.i.d.) assumption. In this paper, we study the differences between social media data and traditional attribute-value data, investigate whether the relations revealed in linked data can be used to help select relevant features, and propose a novel unsupervised feature selection framework, LUFS, for linked social media data. We perform experiments with real-world social media datasets to evaluate the effectiveness of the proposed framework and probe the workings of its key components.

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 904-912
Number of pages: 9
DOIs: https://doi.org/10.1145/2339530.2339673
State: Published - 2012
Event: 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012 - Beijing, China
Duration: Aug 12 2012 to Aug 16 2012

Other

Other: 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012
Country: China
City: Beijing
Period: 8/12/12 to 8/16/12

Fingerprint

Feature extraction
Data mining
Labels
Experiments

Keywords

  • linked social media data
  • pseudo-class label
  • unsupervised feature selection

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Tang, J., & Liu, H. (2012). Unsupervised feature selection for linked social media data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 904-912) https://doi.org/10.1145/2339530.2339673

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

@inproceedings{53ed8e4682d74c2db3b096f39dc5473f,
title = "Unsupervised feature selection for linked social media data",
keywords = "linked social media data, pseudo-class label, unsupervised feature selection",
author = "Jiliang Tang and Huan Liu",
year = "2012",
doi = "10.1145/2339530.2339673",
language = "English (US)",
isbn = "9781450314626",
pages = "904--912",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Unsupervised feature selection for linked social media data

AU - Tang, Jiliang

AU - Liu, Huan

PY - 2012

Y1 - 2012

KW - linked social media data

KW - pseudo-class label

KW - unsupervised feature selection

UR - http://www.scopus.com/inward/record.url?scp=84866052116&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866052116&partnerID=8YFLogxK

U2 - 10.1145/2339530.2339673

DO - 10.1145/2339530.2339673

M3 - Conference contribution

SN - 9781450314626

SP - 904

EP - 912

BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ER -