TY - JOUR
T1 - Bi-level multi-source learning for heterogeneous block-wise missing data
AU - Alzheimer's Disease Neuroimaging Initiative
AU - Xiang, Shuo
AU - Yuan, Lei
AU - Fan, Wei
AU - Wang, Yalin
AU - Thompson, Paul M.
AU - Ye, Jieping
N1 - Publisher Copyright: © 2013 Elsevier Inc.
PY - 2014/11/5
Y1 - 2014/11/5
N2 - Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), neuroimaging data, gene/protein expression data, etc., are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature-pruning and data source selection are critical to learn interpretable models from high-dimensional data. Often, the data collected has block-wise missing entries. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half has FDG-PET; only some have proteomic data. Here we propose how to effectively integrate information from multiple heterogeneous data sources when data is block-wise missing. We present a unified "bi-level" learning model for complete multi-source data, and extend it to incomplete data. Our major contributions are: (1) our proposed models unify feature-level and source-level analysis, including several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data and offers superior performance; it generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling complete and incomplete data. We comprehensively evaluate the proposed models including all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF and proteomics. Our proposed models compare favorably with existing approaches.
KW - Alzheimer's disease
KW - Block-wise missing data
KW - Multi-modal fusion
KW - Multi-source
KW - Optimization
UR - http://www.scopus.com/inward/record.url?scp=84908378904&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908378904&partnerID=8YFLogxK
U2 - 10.1016/j.neuroimage.2013.08.015
DO - 10.1016/j.neuroimage.2013.08.015
M3 - Review article
C2 - 23988272
AN - SCOPUS:84908378904
SN - 1053-8119
VL - 102
SP - 192
EP - 206
JO - NeuroImage
JF - NeuroImage
IS - P1
ER -