Family Rank: a graphical domain knowledge informed feature ranking algorithm

Michelle Saul, Valentin Dinu

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: When prediction models are built with many features and relatively small sample sizes, feature selection methods often overfit the training data, leading to the selection of irrelevant features. One way to potentially mitigate overfitting is to incorporate domain knowledge during feature selection. Here, a feature ranking algorithm called 'Family Rank' is presented, in which features are ranked based on a combination of graphical domain knowledge and feature scores computed from empirical data.

Results: A simulated dataset is used to demonstrate a scenario in which Family Rank outperforms other state-of-the-art graph-based ranking algorithms, decreasing the sample size needed to detect true predictors by 2- to 3-fold. An example from oncology is then used to explore a real-world application of Family Rank.

Original language: English (US)
Pages (from-to): 3626-3631
Number of pages: 6
Journal: Bioinformatics
Volume: 37
Issue number: 20
State: Published - Oct 15, 2021

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
