Abstract

Background
Compared with disease biomarkers in blood and urine, biomarkers in saliva have distinct advantages in clinical tests, as they can be conveniently examined through noninvasive sample collection. Therefore, identifying human saliva-secretory proteins and further detecting protein biomarkers in saliva have significant value in clinical medicine. There are only a few methods for predicting saliva-secretory proteins based on conventional machine learning algorithms, and all are highly dependent on annotated protein features. Unlike conventional machine learning algorithms, deep learning algorithms can automatically learn feature representations from input data and thus hold promise for predicting saliva-secretory proteins.

Results
We present a novel end-to-end deep learning model based on a multilane capsule network (CapsNet) with differently sized convolution kernels to identify saliva-secretory proteins from sequence information alone. The proposed model, CapsNet-SSP, outperforms existing methods based on conventional machine learning algorithms. Furthermore, the model performs better than other state-of-the-art deep learning architectures commonly used to analyze biological sequences. In addition, we further validate the effectiveness of CapsNet-SSP by comparison with human saliva-secretory proteins from existing studies and known salivary protein biomarkers of cancer.

Conclusions
The main contributions of this study are as follows: (1) an end-to-end model based on CapsNet is proposed to identify saliva-secretory proteins from sequence information; (2) the proposed model outperforms existing models; and (3) the saliva-secretory proteins predicted by our model are statistically significant compared with existing cancer biomarkers in saliva. In addition, a web server for CapsNet-SSP has been developed for saliva-secretory protein identification, and it can be accessed at the following URL: http://www.csbg-jlu.info/CapsNet-SSP/.
We believe that our model and web server will be useful for biomedical researchers who are interested in finding salivary protein biomarkers, especially when they have identified candidate proteins for analyzing diseased tissues near or distal to salivary glands using transcriptomics or proteomics.
Date made available: 2020