Family Rank: a graphical domain knowledge informed feature ranking algorithm

Michelle Saul; Valentin Dinu

doi:10.1093/bioinformatics/btab387

College of Health Solutions (CHS)

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: When designing prediction models built with many features and relatively small sample sizes, feature selection methods often overfit training data, leading to selection of irrelevant features. One way to potentially mitigate overfitting is to incorporate domain knowledge during feature selection. Here, a feature ranking algorithm called 'Family Rank' is presented in which features are ranked based on a combination of graphical domain knowledge and feature scores computed from empirical data. Results: A simulated dataset is used to demonstrate a scenario in which Family Rank outperforms other state-of-the-art graph-based ranking algorithms, decreasing the sample size needed to detect true predictors by 2- to 3-fold. An example from oncology is then used to explore a real-world application of Family Rank.

Original language	English (US)
Pages (from-to)	3626-3631
Number of pages	6
Journal	Bioinformatics
Volume	37
Issue number	20
DOIs	https://doi.org/10.1093/bioinformatics/btab387
State	Published - Oct 15 2021

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{9be973238ecf495caeaefd582f5b3029,

title = "Family Rank: a graphical domain knowledge informed feature ranking algorithm",

abstract = "Motivation: When designing prediction models built with many features and relatively small sample sizes, feature selection methods often overfit training data, leading to selection of irrelevant features. One way to potentially mitigate overfitting is to incorporate domain knowledge during feature selection. Here, a feature ranking algorithm called 'Family Rank' is presented in which features are ranked based on a combination of graphical domain knowledge and feature scores computed from empirical data. Results: A simulated dataset is used to demonstrate a scenario in which Family Rank outperforms other state-of-the-art graph-based ranking algorithms, decreasing the sample size needed to detect true predictors by 2- to 3-fold. An example from oncology is then used to explore a real-world application of Family Rank.",

author = "Michelle Saul and Valentin Dinu",

note = "Publisher Copyright: {\textcopyright} 2021 The Author(s).",

year = "2021",

month = oct,

day = "15",

doi = "10.1093/bioinformatics/btab387",

language = "English (US)",

volume = "37",

pages = "3626--3631",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "20",

}

TY - JOUR

T1 - Family Rank

T2 - a graphical domain knowledge informed feature ranking algorithm

AU - Saul, Michelle

AU - Dinu, Valentin

PY - 2021/10/15

Y1 - 2021/10/15

N2 - Motivation: When designing prediction models built with many features and relatively small sample sizes, feature selection methods often overfit training data, leading to selection of irrelevant features. One way to potentially mitigate overfitting is to incorporate domain knowledge during feature selection. Here, a feature ranking algorithm called 'Family Rank' is presented in which features are ranked based on a combination of graphical domain knowledge and feature scores computed from empirical data. Results: A simulated dataset is used to demonstrate a scenario in which Family Rank outperforms other state-of-the-art graph-based ranking algorithms, decreasing the sample size needed to detect true predictors by 2- to 3-fold. An example from oncology is then used to explore a real-world application of Family Rank.

AB - Motivation: When designing prediction models built with many features and relatively small sample sizes, feature selection methods often overfit training data, leading to selection of irrelevant features. One way to potentially mitigate overfitting is to incorporate domain knowledge during feature selection. Here, a feature ranking algorithm called 'Family Rank' is presented in which features are ranked based on a combination of graphical domain knowledge and feature scores computed from empirical data. Results: A simulated dataset is used to demonstrate a scenario in which Family Rank outperforms other state-of-the-art graph-based ranking algorithms, decreasing the sample size needed to detect true predictors by 2- to 3-fold. An example from oncology is then used to explore a real-world application of Family Rank.

UR - http://www.scopus.com/inward/record.url?scp=85134911010&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85134911010&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btab387

DO - 10.1093/bioinformatics/btab387

M3 - Article

C2 - 34009295

AN - SCOPUS:85134911010

SN - 1367-4803

VL - 37

SP - 3626

EP - 3631

JO - Bioinformatics

JF - Bioinformatics

IS - 20

ER -
