Challenges of Feature Selection for Big Data Analytics

Jundong Li; Huan Liu

doi:10.1109/MIS.2017.38

Research output: Contribution to journal › Article › peer-review

172 Scopus citations

Abstract

We're surrounded by huge amounts of large-scale high-dimensional data, but learning tasks require reduced data dimensionality. Feature selection has shown its effectiveness in many applications by building simpler and more comprehensive models, improving learning performance, and preparing clean, understandable data. Some unique characteristics of big data such as data velocity and data variety have presented challenges to the feature selection problem. In this article, the authors envision these challenges for big data analytics. To facilitate and promote feature selection research, they present an open source feature selection repository (scikit-feature) of popular algorithms.

Original language	English (US)
Article number	7887649
Pages (from-to)	9-15
Number of pages	7
Journal	IEEE Intelligent Systems
Volume	32
Issue number	2
DOIs	https://doi.org/10.1109/MIS.2017.38
State	Published - Mar 1 2017

Keywords

big data
feature selection
intelligent systems
repository

ASJC Scopus subject areas

Computer Networks and Communications
Artificial Intelligence

Cite this

@article{bf8739db7d9f4b0e9f7dc6d8c2bdd1dd,
title = "Challenges of Feature Selection for Big Data Analytics",
abstract = "We're surrounded by huge amounts of large-scale high-dimensional data, but learning tasks require reduced data dimensionality. Feature selection has shown its effectiveness in many applications by building simpler and more comprehensive models, improving learning performance, and preparing clean, understandable data. Some unique characteristics of big data such as data velocity and data variety have presented challenges to the feature selection problem. In this article, the authors envision these challenges for big data analytics. To facilitate and promote feature selection research, they present an open source feature selection repository (scikit-feature) of popular algorithms.",
keywords = "big data, feature selection, intelligent systems, repository",
author = "Jundong Li and Huan Liu",
note = "Funding Information: This material is, in part, supported by the US National Science Foundation (NSF) under grant numbers IIS-1217466 and 1614576. Publisher Copyright: {\textcopyright} 2017 IEEE.",
year = "2017",
month = mar,
day = "1",
doi = "10.1109/MIS.2017.38",
language = "English (US)",
volume = "32",
pages = "9--15",
journal = "IEEE Intelligent Systems",
issn = "1541-1672",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "2",
}

TY  - JOUR
T1  - Challenges of Feature Selection for Big Data Analytics
AU  - Li, Jundong
AU  - Liu, Huan
PY  - 2017/3/1
Y1  - 2017/3/1
N2  - We're surrounded by huge amounts of large-scale high-dimensional data, but learning tasks require reduced data dimensionality. Feature selection has shown its effectiveness in many applications by building simpler and more comprehensive models, improving learning performance, and preparing clean, understandable data. Some unique characteristics of big data such as data velocity and data variety have presented challenges to the feature selection problem. In this article, the authors envision these challenges for big data analytics. To facilitate and promote feature selection research, they present an open source feature selection repository (scikit-feature) of popular algorithms.
AB  - We're surrounded by huge amounts of large-scale high-dimensional data, but learning tasks require reduced data dimensionality. Feature selection has shown its effectiveness in many applications by building simpler and more comprehensive models, improving learning performance, and preparing clean, understandable data. Some unique characteristics of big data such as data velocity and data variety have presented challenges to the feature selection problem. In this article, the authors envision these challenges for big data analytics. To facilitate and promote feature selection research, they present an open source feature selection repository (scikit-feature) of popular algorithms.
KW  - big data
KW  - feature selection
KW  - intelligent systems
KW  - repository
UR  - http://www.scopus.com/inward/record.url?scp=85017141803&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85017141803&partnerID=8YFLogxK
U2  - 10.1109/MIS.2017.38
DO  - 10.1109/MIS.2017.38
M3  - Article
AN  - SCOPUS:85017141803
SN  - 1541-1672
VL  - 32
SP  - 9
EP  - 15
JO  - IEEE Intelligent Systems
JF  - IEEE Intelligent Systems
IS  - 2
M1  - 7887649
ER  -
