Multi-source learning with block-wise missing data for Alzheimer's disease prediction

Shuo Xiang, Lei Yuan, Wei Fan, Yalin Wang, Paul M. Thompson, Jieping Ye

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

41 Citations (Scopus)

Abstract

With advances and increasing sophistication in data collection techniques, many applications now face large amounts of data collected from multiple heterogeneous sources. For example, in the study of Alzheimer's Disease (AD), different types of measurements such as neuroimages, gene/protein expression data, and genetic data are often collected and analyzed together for improved predictive power. Joint learning from multiple data sources is believed to be beneficial because different sources may contain complementary information, and feature pruning and data source selection are critical for learning interpretable models from high-dimensional data. Very often the collected data comes with block-wise missing entries; for example, a patient without an MRI scan will have no information in the MRI data block, making his/her overall record incomplete. There has been growing interest in the data mining community in extending traditional techniques for single-source complete data analysis to the study of multi-source incomplete data. The key challenge is how to effectively integrate information from multiple heterogeneous sources in the presence of block-wise missing data. In this paper we first investigate the situation of complete data and present a unified "bi-level" learning model for multi-source data. Then we give a natural extension of this model to the more challenging case with incomplete data. Our major contributions are threefold: (1) the proposed models handle both feature-level and source-level analysis in a unified formulation and include several existing feature learning approaches as special cases; (2) the model for incomplete data avoids direct imputation of the missing elements and thus provides superior performance; moreover, it can be easily generalized to other applications with block-wise missing data sources; (3) efficient optimization algorithms are presented for both the complete and incomplete models. We have performed comprehensive evaluations of the proposed models on the application of AD diagnosis. Our proposed models compare favorably against existing approaches.
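To make the notion of block-wise missing data concrete, the sketch below groups subjects by which data sources they actually have, so that each group forms a complete sub-block that can be analyzed without imputing anything. This is only an illustration of the data layout described in the abstract, not the paper's bi-level model or its optimization algorithm; the function name, the source names `protein` and `genetic`, and the toy cohort are all assumptions made for the example.

```python
from collections import defaultdict


def group_by_available_sources(availability):
    """Group sample indices by the set of data sources each sample has.

    availability: list of dicts mapping a source name to True (present)
    or False (the whole block is missing for that sample).
    Returns a dict mapping a sorted tuple of present source names to the
    list of sample indices sharing that availability pattern.
    """
    groups = defaultdict(list)
    for idx, sources in enumerate(availability):
        pattern = tuple(sorted(name for name, present in sources.items() if present))
        groups[pattern].append(idx)
    return dict(groups)


# Hypothetical toy cohort: subject 1 has no MRI scan, so the entire MRI
# block is absent for that subject; subject 3 has only the MRI block.
availability = [
    {"mri": True, "protein": True, "genetic": True},
    {"mri": False, "protein": True, "genetic": True},
    {"mri": True, "protein": True, "genetic": True},
    {"mri": True, "protein": False, "genetic": False},
]

groups = group_by_available_sources(availability)
for pattern, members in groups.items():
    print(pattern, members)
```

Each key in `groups` identifies one availability pattern, and the subjects listed under it have complete records for exactly those sources. How information is then shared across the patterns is the part the paper's incomplete-data model addresses, and it is not reproduced here.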

Original language: English (US)
Title of host publication: KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 185-193
Number of pages: 9
Volume: Part F128815
ISBN (Electronic): 9781450321747
DOI: https://doi.org/10.1145/2487575.2487594
State: Published - Aug 11 2013
Externally published: Yes
Event: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013 - Chicago, United States
Duration: Aug 11 2013 to Aug 14 2013

Other

Other: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
Country: United States
City: Chicago
Period: 8/11/13 to 8/14/13

Fingerprint

Magnetic resonance imaging
Data mining
Genes
Proteins

Keywords

  • Alzheimer's disease
  • Block-wise missing data
  • Multi-source
  • Optimization

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Xiang, S., Yuan, L., Fan, W., Wang, Y., Thompson, P. M., & Ye, J. (2013). Multi-source learning with block-wise missing data for Alzheimer's disease prediction. In KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Vol. Part F128815, pp. 185-193). [2487594] Association for Computing Machinery. https://doi.org/10.1145/2487575.2487594
