TY - JOUR
T1 - Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2
AU - Law, Jeffrey N.
AU - Akers, Kyle
AU - Tasnina, Nure
AU - Della Santina, Catherine M.
AU - Deutsch, Shay
AU - Kshirsagar, Meghana
AU - Klein-Seetharaman, Judith
AU - Crovella, Mark
AU - Rajagopalan, Padmavathy
AU - Kasif, Simon
AU - Murali, T. M.
N1 - Funding Information:
T.M.M. acknowledges support from National Science Foundation (NSF) grants DBI-1759858 and MCB-1817736. K.A. acknowledges support from the Genetics, Bioinformatics, and Computational Biology program at Virginia Tech. J.K. acknowledges support from NSF grant CCF-2029543. M.C. acknowledges support from NSF grant CNS-1618207. C.M.D.S. acknowledges support from the Hariri Institute and the Department of Biomedical Engineering at Boston University. P.R. acknowledges support from NSF grant CBET-1510920 and USDA-NIFA grant 2018-07578. P.R. and T.M.M. acknowledge support from the Computational Tissue Engineering Graduate Education Program at Virginia Tech.
Publisher Copyright:
© 2021 The Author(s).
PY - 2021/12/1
Y1 - 2021/12/1
AB - Background: Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. Results: We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. Conclusions: We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.
KW - COVID-19
KW - SARS-CoV-2
KW - interpretable machine learning
KW - network propagation
KW - provenance tracing
KW - virus-host protein interaction networks
UR - http://www.scopus.com/inward/record.url?scp=85123036851&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123036851&partnerID=8YFLogxK
U2 - 10.1093/gigascience/giab082
DO - 10.1093/gigascience/giab082
M3 - Article
C2 - 34966926
AN - SCOPUS:85123036851
SN - 2047-217X
VL - 10
JO - GigaScience
JF - GigaScience
IS - 12
M1 - giab082
ER -