TY - GEN
T1 - Context sensitive vocabulary and its application in protein secondary structure prediction
AU - Liu, Yan
AU - Carbonell, Jaime
AU - Klein-Seetharaman, Judith
AU - Gopalakrishnan, Vanathi
PY - 2004
Y1 - 2004
N2 - Protein secondary structure prediction is an important step towards understanding the relation between protein sequence and structure. However, most current prediction methods use features difficult for biologists to interpret. In this paper, we present a new method that applies information retrieval techniques to solve the problem: we extract a context sensitive biological vocabulary for protein sequences and apply text classification methods to predict protein secondary structure. Experimental results show that our method performs comparably to the state-of-the-art methods. Furthermore, the context sensitive vocabularies can serve as a useful tool to discover meaningful regular expression patterns for protein structures.
AB - Protein secondary structure prediction is an important step towards understanding the relation between protein sequence and structure. However, most current prediction methods use features difficult for biologists to interpret. In this paper, we present a new method that applies information retrieval techniques to solve the problem: we extract a context sensitive biological vocabulary for protein sequences and apply text classification methods to predict protein secondary structure. Experimental results show that our method performs comparably to the state-of-the-art methods. Furthermore, the context sensitive vocabularies can serve as a useful tool to discover meaningful regular expression patterns for protein structures.
KW - Biological language
KW - Context Sensitive Vocabulary
KW - Protein secondary structure prediction
UR - http://www.scopus.com/inward/record.url?scp=8644253682&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=8644253682&partnerID=8YFLogxK
U2 - 10.1145/1008992.1009109
DO - 10.1145/1008992.1009109
M3 - Conference contribution
AN - SCOPUS:8644253682
SN - 1581138814
SN - 9781581138818
T3 - Proceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 538
EP - 539
BT - Proceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery (ACM)
T2 - Proceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Y2 - 25 July 2004 through 29 July 2004
ER -