Motivation: Protein secondary structure prediction is an important step towards understanding how proteins fold in three dimensions. Recent analysis by information theory indicates that the correlation between neighboring secondary structures are much stronger than that of neighboring amino acids. In this article, we focus on the combination problem for sequences, i.e. combining the scores or assignments from single or multiple prediction systems under the constraint of a whole sequence, as a target for improvement in protein secondary structure prediction. Results: We apply several graphical chain models to solve the combination problem and show that they are consistently more effective than the traditional window-based methods. In particular, conditional random fields (CRFs) moderately improve the predictions for helices and, more importantly, for beta sheets, which are the major bottleneck for protein secondary structure prediction.
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics