Random forest similarity for protein-protein interaction prediction from multiple sources

Yanjun Qi, Judith Klein-Seetharaman, Ziv Bar-Joseph

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

164 Scopus citations

Abstract

One of the most important, but often ignored, parts of any clustering and classification algorithm is the computation of the similarity matrix. This is especially important when integrating high-throughput biological data sources because of the high noise rates and the many missing values. In this paper we present a new method to compute such similarities for the task of classifying pairs of proteins as interacting or not. Our method uses direct and indirect information about interaction pairs to construct a random forest (a collection of decision trees) from a training set. The resulting forest is used to determine the similarity between protein pairs, and this similarity is used by a classification algorithm (a modified kNN) to classify protein pairs. Testing the algorithm on yeast data indicates that it is able to improve coverage to 20% of interacting pairs with a false positive rate of 50%. These results compare favorably with all previously suggested methods for this task, indicating the importance of robust similarity estimates.
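The pipeline described in the abstract (forest, then similarity, then kNN) can be illustrated with a minimal sketch. It assumes the forest has already been trained and that each protein pair is represented by the leaf it reaches in every tree (scikit-learn's `RandomForestClassifier.apply` returns this kind of array). The similarity used here is the common "fraction of trees with a shared leaf" proximity, and the classifier is a plain similarity-weighted kNN vote; the paper's exact similarity definition and its kNN modification may differ. The function names and the toy leaf indices are hypothetical.

```python
from collections import defaultdict


def forest_similarity(leaves_a, leaves_b):
    """Fraction of trees in which two protein pairs land in the same leaf."""
    if len(leaves_a) != len(leaves_b):
        raise ValueError("leaf vectors must cover the same trees")
    shared = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return shared / len(leaves_a)


def knn_predict(query_leaves, train_leaves, train_labels, k=3):
    """Similarity-weighted vote among the k most similar training pairs.

    Assumption: a plain weighted kNN stands in for the paper's modified kNN.
    """
    scored = [
        (forest_similarity(query_leaves, leaves), label)
        for leaves, label in zip(train_leaves, train_labels)
    ]
    # Most similar first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    votes = defaultdict(float)
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return max(votes, key=votes.get)


# Hypothetical toy data: one row per training protein pair,
# one column per tree, each entry the index of the leaf reached.
TRAIN_LEAVES = [
    [3, 7, 2, 5],  # interacting
    [3, 7, 4, 5],  # interacting
    [1, 6, 4, 0],  # non-interacting
    [1, 6, 2, 0],  # non-interacting
]
TRAIN_LABELS = [1, 1, 0, 0]  # 1 = interacting, 0 = not interacting

if __name__ == "__main__":
    print(forest_similarity(TRAIN_LEAVES[0], TRAIN_LEAVES[1]))
    print(knn_predict([3, 7, 2, 0], TRAIN_LEAVES, TRAIN_LABELS, k=3))
```

Because the similarity depends only on which leaves two pairs share, a feature that no tree happens to split on along a given path has no effect on it, which is one reason forest-derived similarities are often considered tolerant of noisy or missing inputs.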

Original language: English (US)
Title of host publication: Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005
Pages: 531-542
Number of pages: 12
State: Published - 2005
Externally published: Yes
Event: 10th Pacific Symposium on Biocomputing, PSB 2005 - Big Island of Hawaii, United States
Duration: Jan 4 2005 to Jan 8 2005

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Biomedical Engineering
