Integrating domain knowledge with statistical and data mining methods for high-density genomic SNP disease association analysis

Valentin Dinu, Hongyu Zhao, Perry L. Miller

Research output: Contribution to journal › Article

21 Scopus citations


Genome-wide association studies can help identify multi-gene contributions to disease. As the number of high-density genomic markers tested increases, however, so does the number of loci associated with disease by chance. A brute-force test for the interaction of four or more high-density genomic loci is infeasible given current computational limitations, so heuristics must be employed to limit the number of statistical tests performed. In this paper we explore the use of biological domain knowledge to supplement statistical analysis and data mining methods to identify genes and pathways associated with disease. We describe Pathway/SNP, a software application designed to help evaluate the association between pathways and disease. Pathway/SNP integrates domain knowledge (SNP, gene and pathway annotation from multiple sources) with statistical and data mining algorithms into a tool that can be used to explore the etiology of complex diseases.
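The scaling argument in the abstract can be illustrated with a short calculation. The sketch below is not part of Pathway/SNP; the marker count (500,000) and the per-test significance threshold (0.05) are hypothetical values chosen only for illustration, and the function names are made up for this example. It counts the distinct marker subsets a brute-force interaction scan would have to test at each interaction order, and the number of tests expected to pass the threshold by chance if no marker were truly associated.

```python
from math import comb


def interaction_test_count(n_markers: int, order: int) -> int:
    """Number of distinct marker subsets of size `order` among `n_markers`."""
    return comb(n_markers, order)


def expected_chance_hits(n_tests: int, alpha: float) -> float:
    """Expected number of tests passing threshold `alpha` if every null hypothesis holds."""
    return n_tests * alpha


# Hypothetical values, not taken from the paper.
N_MARKERS = 500_000
ALPHA = 0.05

for order in (1, 2, 3, 4):
    n_tests = interaction_test_count(N_MARKERS, order)
    chance = expected_chance_hits(n_tests, ALPHA)
    print(f"order {order}: {n_tests:.3e} tests, about {chance:.3e} expected by chance")
```

Under these assumptions the four-locus case alone exceeds 10^21 subsets, which is why restricting the search, for example to SNPs grouped by gene or pathway annotation, reduces both the computation and the multiple-testing burden.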

Original language: English (US)
Pages (from-to): 750-760
Number of pages: 11
Journal: Journal of Biomedical Informatics
Issue number: 6
Publication status: Published - Dec 2007
Externally published: Yes



Keywords

  • Data integration
  • Data mining
  • False discovery rate (FDR)
  • Genome-wide association (GWA)
  • Pathway-based disease association
  • Single nucleotide polymorphisms (SNP)

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics
  • Computer Science (miscellaneous)
  • Catalysis
