Mapping Sequence-Structure Function Landscape by Integrating Evolutionary Landscape Inference with Folding and Dynamics

Project: Research project

Project Details


Evolution has been a single massive, ongoing experiment in the diversification and optimization of the protein sequence-structure-function relationship over billions of years. It has produced proteins with distinctive functional characteristics, such as (i) tolerating random mutations while still evolving new functions, (ii) participating in allosteric regulation, and (iii) interacting with more than one partner while remaining specific in those interactions. All of these characteristics are encoded in the evolutionary history. With advances in sequencing and evolutionary inference methods, it is now possible to obtain co-evolved positions as well as conservation profiles. The next challenge is to develop methods that biophysically characterize all the co-evolved positions and quantify their contributions to folding and function. The aim is to answer how and why a set of co-evolved positions is critical for fold and function. Only then can we come to understand how the evolutionary landscape designs the functional landscape in the proteomic world. Here we propose to develop novel integrated experimental and computational methods to tackle this challenge. We will apply these methods to one of the best-known protein interaction domains (PIDs), the WW domain, as a model system. WW domains fold independently and are the smallest ideal PID system that has all the characteristics discussed above. Moreover, they are among the most important functional modules in the cell, mediating regulatory protein complexes in various signaling networks involved in physiological and disease states.
Our hypothesis is that pairwise contacts forming early in the folding pathway have a dominant effect on the folding of a domain, and that the amino acids at those positions are always co-evolved. On top of this framework, function is achieved through dynamically coupled positions that modulate binding allosterically; these positions are also co-evolved. To obtain the minimum set of positions critical for fold and function, and to learn how to alter these positions, we will pursue two aims.
Aim 1: Determine crucial contacts for the WW domain fold and design novel WW domain sequences. We hypothesize that all the information required to specify the fold and characteristic function of small proteins such as PIDs may be sufficiently encoded in a small set of amino acid interactions. Earlier efforts designed novel sequences using co-evolutionary information and per-position sequence conservation derived from multiple sequence alignments of native sequences. However, only ~30% of the sequences produced by these evolution-based approaches were foldable, leaving an open question: what evolutionary information was missing in the remaining 70%, which were designed from the same evolutionary information? Our hypothesis is that while co-evolution and conservation may supply the information necessary for an average stable fold (i.e., minimum folding stability), a small set of co-evolved positions still needs to be considered for folding kinetics. In particular, the local contacts (i.e., contacts between residues close in sequence) that nucleate folding need to be identified. In other words, designed sequences must not only satisfy minimum folding stability but also ensure a minimally frustrated folding landscape (i.e., the strength of non-native contacts versus the strength of native contacts on the folding pathway).
Thus, we propose (i) to identify and analyze co-evolved pairs using recently developed inference-based information-theoretic methods, (ii) to design artificial sequences based on highly co-evolved sites and their amino acid frequency space, and then verify their foldability and stability by experimental characterization, and (iii) to quantify the contribution of each co-evolved contact to WW domain folding stability and kinetics through experimental and computational biophysical measurements.
Aim 2: Determine the mechanism of peptide-WW domain interactions and unveil the design principles of binding specificity. WW domains bind to proline-rich peptides within their target proteins, a process highly regulated by allostery. We hypothesize that certain co-evolved positions retain information about allosteric regulation and binding specificity. While this idea may make intuitive sense based on recent data, it is entirely unproven how to evaluate each co-evolved position's contribution accurately. Our hypothesis is that positions with strong dynamic coupling to the binding sites should also have co-evolved. Therefore, we propose to (i) obtain evolutionary coupling values for the non-binding-interface sites that may co-evolve with the binding sites, (ii) screen the evolutionary coupling information by determining the positions that are dynamically coupled to critical binding sites, (iii) generate sequence-binding selectivity maps of WW domains by mutating these screened, co-evolved positions along with the binding site positions, and obtain binding affinities for all WW domain-peptide binding events through the deep scanning method, and (iv) provide a mechanistic description of how these critical, co-evolved sites affect allosteric interactions and binding specificity through conformational dynamics analysis.
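As a rough illustration of what a pairwise co-evolution score between alignment positions can look like, the sketch below ranks column pairs of a toy multiple sequence alignment by mutual information. Mutual information is used here only as a simple stand-in; it is not the inference method the project proposes, and the toy alignment and the names `mutual_information` and `rank_pairs` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def rank_pairs(msa):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 1 vary together,
# column 2 is fully conserved, column 3 varies independently.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]

for pair, score in rank_pairs(toy_msa):
    print(pair, round(score, 3))
```

In this toy case the pair (0, 1) ranks first because the two columns change together, while the conserved and independently varying columns score zero. Real alignments would additionally need sequence reweighting and a correction for indirect couplings, which this sketch omits.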
The long-term goal of this proposal is to devise means for predicting the factors that dictate fold and recognition in protein interaction domains (PIDs), which underpin all cellular functions.
Intellectual Merit: This project will result in cutting-edge computational tools to understand and enhance recognition in PIDs and to enable the design of artificial PID modules. First, this could lead to the engineering of non-natural signaling proteins and pathways with novel, amplified, or suppressed behavior. Second, designed modules make it possible to decipher PID-mediated signaling pathways, opening up a new way to identify and characterize PID-mediated protein interaction networks. Finally, the design principles of PIDs could lead to broader drug design efforts, such as artificial blocking peptides or small molecules, widening our arsenal to treat a large number of diseases.
Broader Impacts: The multidisciplinary, computational and experimental nature of this proposal offers unique learning opportunities to trainees at the graduate, undergraduate, and high school levels. The PIs have established a mentoring program for high school students through close connections to local high schools, specifically Basis Schools (Ahwatukee and Scottsdale) and Arete Preparatory. The program reaches a large number of students and comprises teaching modules, campus visits, and research internships. As part of their Senior Projects, high school students participating in internships further disseminate their experience through public presentations and through a blog that conveys their experiences in the lab.
Effective start/end date: 8/1/17 to 7/31/21


  • NSF: Directorate for Biological Sciences (BIO): $300,041.00

