TY - JOUR
T1 - Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data
AU - Yuan, Lei
AU - Wang, Yalin
AU - Thompson, Paul M.
AU - Narayan, Vaibhav A.
AU - Ye, Jieping
N1 - Funding Information:
Data collection and sharing for this project were funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Amorfix Life Sciences Ltd.; AstraZeneca; Bayer HealthCare; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129 and K01 AG030514, and the Dana Foundation.
Funding Information:
This work was funded by the National Institute on Aging (AG016570 to PMT), the National Library of Medicine, the National Institute for Biomedical Imaging and Bioengineering, and the National Center for Research Resources (LM05639, EB01651, and RR019771 to PMT), the US National Science Foundation (NSF) (IIS-0812551 and IIS-0953662 to JY), and the National Library of Medicine (R01 LM010730 to JY).
PY - 2012/7/2
Y1 - 2012/7/2
N2 - Analysis of incomplete data is a major challenge when integrating large-scale brain imaging datasets from different imaging modalities. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), for example, over half of the subjects lack cerebrospinal fluid (CSF) measurements; an independent half of the subjects do not have fluorodeoxyglucose positron emission tomography (FDG-PET) scans; many lack proteomics measurements. Traditionally, subjects with missing measures are discarded, resulting in a severe loss of available information. In this paper, we address this problem by proposing an incomplete Multi-Source Feature (iMSF) learning method in which all samples with at least one available data source can be used. To illustrate the proposed approach, we classify subjects from the ADNI study into Alzheimer's disease (AD), mild cognitive impairment (MCI) and normal control (NC) groups based on the multi-modality data. At baseline, ADNI's 780 participants (172 AD, 397 MCI, 211 NC) have at least one of four data types: magnetic resonance imaging (MRI), FDG-PET, CSF and proteomics. These data are used to test our algorithm. Depending on the problem being solved, we divide our samples according to the availability of data sources, and we learn shared sets of features with state-of-the-art sparse learning methods. To build a practical and robust system, we construct a classifier ensemble by combining our method with four other methods for missing value estimation. Comprehensive experiments with various parameters show that our proposed iMSF method and the ensemble model yield stable and promising results.
AB - Analysis of incomplete data is a major challenge when integrating large-scale brain imaging datasets from different imaging modalities. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), for example, over half of the subjects lack cerebrospinal fluid (CSF) measurements; an independent half of the subjects do not have fluorodeoxyglucose positron emission tomography (FDG-PET) scans; many lack proteomics measurements. Traditionally, subjects with missing measures are discarded, resulting in a severe loss of available information. In this paper, we address this problem by proposing an incomplete Multi-Source Feature (iMSF) learning method in which all samples with at least one available data source can be used. To illustrate the proposed approach, we classify subjects from the ADNI study into Alzheimer's disease (AD), mild cognitive impairment (MCI) and normal control (NC) groups based on the multi-modality data. At baseline, ADNI's 780 participants (172 AD, 397 MCI, 211 NC) have at least one of four data types: magnetic resonance imaging (MRI), FDG-PET, CSF and proteomics. These data are used to test our algorithm. Depending on the problem being solved, we divide our samples according to the availability of data sources, and we learn shared sets of features with state-of-the-art sparse learning methods. To build a practical and robust system, we construct a classifier ensemble by combining our method with four other methods for missing value estimation. Comprehensive experiments with various parameters show that our proposed iMSF method and the ensemble model yield stable and promising results.
KW - Ensemble
KW - Incomplete data
KW - Multi-source feature learning
KW - Multi-task learning
UR - http://www.scopus.com/inward/record.url?scp=84861187815&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84861187815&partnerID=8YFLogxK
U2 - 10.1016/j.neuroimage.2012.03.059
DO - 10.1016/j.neuroimage.2012.03.059
M3 - Article
C2 - 22498655
AN - SCOPUS:84861187815
VL - 61
SP - 622
EP - 632
JO - NeuroImage
JF - NeuroImage
SN - 1053-8119
IS - 3
ER -