Earlier experiments suggest that the evolutionary information (conservation and coevolution) encoded in protein sequences is necessary and sufficient to specify the fold of a protein family. However, no computational work has yet quantified the effect of such evolutionary information on the folding process. Here we explore the role of early folding steps for sequences designed using coevolution and conservation through a combination of computational and experimental methods. We simulated a repertoire of native and designed WW domain sequences to analyze early local contact formation and found that, in unfoldable sequences, the N-terminal β-hairpin turn fails to form correctly because of strong non-native local contacts. Through a maximum likelihood approach, we identified five local contacts that play a critical role in folding, suggesting that a small subset of amino acid pairs can be used to solve the “needle in the haystack” problem of designing foldable sequences. Using the contact probabilities of those five local contacts, which form during the early stage of folding, we built a classification model that predicts the foldability of a WW sequence with 81% accuracy. We then used this classification model to redesign WW domain sequences that could not fold due to frustration, making them foldable by introducing a few mutations that stabilize these critical local contacts. The experimental analysis shows that a redesigned sequence folds and binds to polyproline peptides with an affinity similar to that observed for native WW domains. Overall, our analysis shows that evolutionarily designed sequences should not only satisfy folding stability but also ensure a minimally frustrated folding landscape.
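As a rough illustration of the classification step described above, the following minimal sketch fits a logistic regression by maximum likelihood to five contact probabilities per sequence. It is not the authors' model: the choice of logistic regression, the function names (`fit_logistic`, `predict_foldable`, `make_synthetic`), the hyperparameters, and the synthetic data are all assumptions made for illustration, and the synthetic accuracy says nothing about the 81% reported for real WW sequences.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by gradient descent on the negative log-likelihood."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_foldable(w, b, x, threshold=0.5):
    """Return True if the predicted foldability probability reaches the threshold."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


def make_synthetic(n, rng):
    """Hypothetical data: five contact probabilities per sequence.

    Assumption for illustration only: foldable sequences tend to have
    higher probabilities for the five critical local contacts.
    """
    X, y = [], []
    for _ in range(n):
        foldable = rng.random() < 0.5
        center = 0.7 if foldable else 0.3
        X.append([min(1.0, max(0.0, rng.gauss(center, 0.15))) for _ in range(5)])
        y.append(1 if foldable else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = make_synthetic(200, rng)
X_test, y_test = make_synthetic(100, rng)

w, b = fit_logistic(X_train, y_train)
accuracy = sum(
    predict_foldable(w, b, x) == bool(t) for x, t in zip(X_test, y_test)
) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In practice the five input features would come from simulated early-folding contact probabilities rather than a random generator, and the redesign step would search for mutations that move a sequence across the decision boundary.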
ASJC Scopus subject areas
- Physical and Theoretical Chemistry
- Surfaces, Coatings and Films
- Materials Chemistry