TY - JOUR
T1 - Feature selection for hierarchical classification via joint semantic and structural information of labels
AU - Huang, Hai
AU - Liu, Huan
N1 - Funding Information:
This work was supported by the National Natural Science Foundation of China under Grant No. 61802029 and China Scholarship Council. The authors wish to thank the anonymous reviewers and the editors for their valuable comments and suggestions.
Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2020/5/11
Y1 - 2020/5/11
AB - Hierarchical Classification is widely used in many real-world applications, where the label space is exhibited as a tree or a Directed Acyclic Graph (DAG) and each label has rich semantic descriptions. Feature selection, as a type of dimension reduction technique, has proven to be effective in improving the performance of machine learning algorithms. However, many existing feature selection methods cannot be directly applied to hierarchical classification problems since they ignore the hierarchical relations and take no advantage of the semantic information in the label space. In this paper, we propose a novel feature selection framework based on semantic and structural information of labels. First, we transform the label description into a mathematical representation and calculate the similarity score between labels as the semantic regularization. Second, we investigate the hierarchical relations in a tree structure of the label space as the structural regularization. Finally, we impose two regularization terms on a sparse learning based model for feature selection. Additionally, we adapt the proposed model to a DAG case, which makes our method more general and robust in many real-world tasks. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework for hierarchical classification domains.
KW - Feature selection
KW - Hierarchical classification
KW - Label hierarchical structure
KW - Label semantic similarity
UR - http://www.scopus.com/inward/record.url?scp=85079552550&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079552550&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2020.105655
DO - 10.1016/j.knosys.2020.105655
M3 - Article
AN - SCOPUS:85079552550
SN - 0950-7051
VL - 195
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 105655
ER -