A methodology for the construction of quantitative, predictive models of physiology from transcriptional profiles is presented. The method uses partial least squares (PLS) regression, modified to allow pre-selection of genes based on their signal-to-noise ratio (SNR). The final set of genes is obtained from a consensus ranking of genes across several thousand trials, each carried out with a different set of training samples. The method was tested with transcriptional data from a large-scale microarray study profiling the effects of a high-fat diet on the diet-induced obese mouse model C57BL/6J and the obesity-resistant A/J mouse model. Quantitative predictive models were constructed for the age of the C57BL/6J mice and the A/J mice, and for the insulin and leptin levels of the C57BL/6J mice, based on transcriptional data of liver obtained over a 12-week period. Similarly, models for the growth rate of yeast mutants and the age of Drosophila samples were developed from literature data. Specifically, it is demonstrated that highly predictive models can be constructed with current levels of precision in DNA microarray measurements, provided the variation in the physiological measurements is controlled. Genes identified by this method are important for their ability to collectively predict phenotype. The method can be expanded to include various types of physiological or cellular data, thus providing an integrative framework for the construction of predictive models.
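The workflow described above (SNR-based pre-selection, a consensus ranking over resampled training sets, then PLS regression on the selected genes) can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the SNR is computed in the Golub style after a median split of the continuous phenotype, the consensus is taken as the sum of per-trial ranks, the PLS model uses a single latent component, and the function names (`snr_scores`, `consensus_genes`, `fit_pls1`), the training fraction, and the synthetic data are all hypothetical.

```python
import random
import statistics as st

def snr_scores(X, y):
    """Per-gene SNR after a median split of y (assumed definition)."""
    cut = st.median(y)
    hi = [row for row, v in zip(X, y) if v > cut]
    lo = [row for row, v in zip(X, y) if v <= cut]
    scores = []
    for j in range(len(X[0])):
        a = [r[j] for r in hi]
        b = [r[j] for r in lo]
        denom = st.pstdev(a) + st.pstdev(b) or 1e-12  # guard against zero spread
        scores.append(abs(st.mean(a) - st.mean(b)) / denom)
    return scores

def consensus_genes(X, y, n_keep, n_trials=200, train_frac=0.7, seed=0):
    """Rank genes by SNR on many random training subsets; keep the best rank sums."""
    rng = random.Random(seed)
    n = len(y)
    rank_sum = [0] * len(X[0])
    for _ in range(n_trials):
        idx = rng.sample(range(n), max(2, int(train_frac * n)))
        s = snr_scores([X[i] for i in idx], [y[i] for i in idx])
        order = sorted(range(len(s)), key=lambda j: -s[j])  # best gene first
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    return sorted(range(len(rank_sum)), key=lambda j: rank_sum[j])[:n_keep]

def fit_pls1(X, y, genes):
    """One-component PLS1 on the selected genes; returns a predictor function."""
    n = len(y)
    mx = [st.mean(r[j] for r in X) for j in genes]
    my = st.mean(y)
    Xc = [[r[j] - m for j, m in zip(genes, mx)] for r in X]  # centered predictors
    yc = [v - my for v in y]                                 # centered response
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(Xc[i][k] * yc[i] for i in range(n)) for k in range(len(genes))]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]   # latent scores
    q = sum(a * b for a, b in zip(t, yc)) / (sum(a * a for a in t) or 1.0)
    return lambda x: my + q * sum((x[j] - m) * wk for j, m, wk in zip(genes, mx, w))

# Synthetic demonstration: gene 0 tracks the phenotype, genes 1-5 are noise.
rng = random.Random(1)
y = [float(i) for i in range(20)]
X = [[v + rng.gauss(0, 0.1)] + [rng.gauss(0, 1) for _ in range(5)] for v in y]

top = consensus_genes(X, y, n_keep=1)  # consensus gene set of size 1
model = fit_pls1(X, y, top)
pred = model(X[5])                     # predicted phenotype for sample 5
print(top, round(pred, 2))
```

The demonstration runs 200 trials for speed, whereas the study reports several thousand. A multi-component PLS model would add a deflation step after each latent component, which is omitted here for brevity.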
- Insulin resistance
- Partial least squares
ASJC Scopus subject areas
- Applied Microbiology and Biotechnology