Machine learning to predict rapid progression of carotid atherosclerosis in patients with impaired glucose tolerance

Xia Hu, Peter D. Reaven, Aramesh Saremi, Ninghao Liu, Mohammad Ali Abbasi, Huan Liu, Raymond Q. Migrino, the ACT NOW Study Investigators

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Objectives: Prediabetes is a major epidemic and is associated with adverse cardio-cerebrovascular outcomes. Early identification of patients who will develop rapid progression of atherosclerosis could improve risk stratification. In this paper, we use several machine learning methods to investigate the factors that most affect prediction of rapid progression of carotid intima-media thickness (CIMT) in participants with impaired glucose tolerance (IGT).

Methods: In the Actos Now for Prevention of Diabetes (ACT NOW) study, 382 participants with IGT underwent CIMT ultrasound evaluation at baseline and at 15–18 months, and were divided into rapid progressors (RP, n = 39, 58 ± 17.5 μm change) and non-rapid progressors (NRP, n = 343, 5.8 ± 20 μm change, p < 0.001 versus RP). To deal with complex multi-modal data consisting of demographic, clinical, and laboratory variables, we propose a general data-driven framework to investigate the ACT NOW dataset. In particular, we first employed a Fisher Score-based feature selection method to identify the most effective variables and then proposed a probabilistic Bayes-based learning method for the prediction. The methods and factors were compared using area under the receiver operating characteristic curve (AUC) analyses and the Brier score.

Results: The experimental results show that the proposed learning methods performed well in identifying or predicting RP. Among the methods, Naïve Bayes performed best (AUC 0.797, Brier score 0.085) compared to multilayer perceptron (0.729, 0.086) and random forest (0.642, 0.10). The results also show that feature selection has a significant positive impact on prediction performance.

Conclusions: By dealing with multi-modal data, the proposed learning methods show effectiveness in predicting prediabetics at risk for rapid atherosclerosis progression. The proposed framework demonstrated utility in outcome prediction in a typical multidimensional clinical dataset with a relatively small number of subjects, extending the potential utility of machine learning approaches beyond extremely large-scale datasets.
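The pipeline described in the abstract (Fisher score feature selection, a Naïve Bayes classifier, then AUC and Brier score evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact Fisher score formula, the Naïve Bayes variant, or the validation scheme, so the standard between-class over within-class Fisher score, a Gaussian Naïve Bayes model, and a single hold-out split are all assumptions. The data are synthetic; only the class sizes (39 versus 343) mirror the abstract.

```python
import math
import random

def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall_mean = sum(values) / len(values)
    between = within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        mean = sum(group) / len(group)
        between += len(group) * (mean - overall_mean) ** 2
        within += sum((v - mean) ** 2 for v in group)
    return between / within if within > 0 else 0.0

def fit_gaussian_nb(X, y):
    """Store a prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[cls] = (len(rows) / len(X), means, variances)
    return model

def predict_proba(model, x):
    """Posterior probability that x belongs to class 1."""
    log_post = {}
    for cls, (prior, means, variances) in model.items():
        total = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_post[cls] = total
    top = max(log_post.values())  # subtract the max for numerical stability
    weights = {c: math.exp(lp - top) for c, lp in log_post.items()}
    return weights[1] / sum(weights.values())

def auc(y_true, scores):
    """Probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)

# Synthetic stand-in data: features 0 and 1 carry signal, features 2-4 are noise.
rng = random.Random(0)
N_POS, N_NEG, N_FEATURES, TOP_K = 39, 343, 5, 2
labels = [1] * N_POS + [0] * N_NEG
X = [[rng.gauss(2.0 * y if j < 2 else 0.0, 1.0) for j in range(N_FEATURES)]
     for y in labels]

# Even indices train, odd indices test, so both classes appear in each half.
train = list(range(0, len(X), 2))
test = list(range(1, len(X), 2))
y_train = [labels[i] for i in train]
y_test = [labels[i] for i in test]

# Rank features on the training half only, then keep the TOP_K best.
scores = [fisher_score([X[i][j] for i in train], y_train)
          for j in range(N_FEATURES)]
selected = sorted(range(N_FEATURES), key=lambda j: scores[j], reverse=True)[:TOP_K]

model = fit_gaussian_nb([[X[i][j] for j in selected] for i in train], y_train)
probs = [predict_proba(model, [X[i][j] for j in selected]) for i in test]
auc_value = auc(y_test, probs)
brier_value = brier(y_test, probs)
print("selected features:", sorted(selected))
print("AUC:", round(auc_value, 3), "Brier:", round(brier_value, 3))
```

Ranking features on the training half only keeps the held-out estimate free of selection leakage. With a class split this imbalanced, a low Brier score can partly reflect the low base rate of positives, so it is best read alongside AUC rather than on its own.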

Original language: English (US)
Article number: 14
Journal: Eurasip Journal on Bioinformatics and Systems Biology
Volume: 2016
Issue number: 1
DOI: 10.1186/s13637-016-0049-6
State: Published - Dec 1 2016

Keywords

  • Atherosclerosis
  • Diabetes
  • Machine learning
  • Model
  • Prognosis

ASJC Scopus subject areas

  • Signal Processing
  • Statistics and Probability
  • Computer Science(all)
  • Medicine(all)
  • General

Cite this

Machine learning to predict rapid progression of carotid atherosclerosis in patients with impaired glucose tolerance. / Hu, Xia; Reaven, Peter D.; Saremi, Aramesh; Liu, Ninghao; Abbasi, Mohammad Ali; Liu, Huan; Migrino, Raymond Q.; the ACT NOW Study Investigators.

In: Eurasip Journal on Bioinformatics and Systems Biology, Vol. 2016, No. 1, 14, 01.12.2016.

@article{e33e3145cb644d34b0d2aae31b55106d,
title = "Machine learning to predict rapid progression of carotid atherosclerosis in patients with impaired glucose tolerance",
keywords = "Atherosclerosis, Diabetes, Machine learning, Model, Prognosis",
author = "Xia Hu and Reaven, {Peter D.} and Aramesh Saremi and Ninghao Liu and Abbasi, {Mohammad Ali} and Huan Liu and Migrino, {Raymond Q.} and {the ACT NOW Study Investigators}",
year = "2016",
month = "12",
day = "1",
doi = "10.1186/s13637-016-0049-6",
language = "English (US)",
volume = "2016",
journal = "Eurasip Journal on Bioinformatics and Systems Biology",
issn = "1687-4145",
publisher = "Springer Publishing Company",
number = "1",

}

TY - JOUR

T1 - Machine learning to predict rapid progression of carotid atherosclerosis in patients with impaired glucose tolerance

AU - Hu, Xia

AU - Reaven, Peter D.

AU - Saremi, Aramesh

AU - Liu, Ninghao

AU - Abbasi, Mohammad Ali

AU - Liu, Huan

AU - Migrino, Raymond Q.

AU - the ACT NOW Study Investigators

PY - 2016/12/1

Y1 - 2016/12/1

KW - Atherosclerosis

KW - Diabetes

KW - Machine learning

KW - Model

KW - Prognosis

UR - http://www.scopus.com/inward/record.url?scp=84986255572&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84986255572&partnerID=8YFLogxK

U2 - 10.1186/s13637-016-0049-6

DO - 10.1186/s13637-016-0049-6

M3 - Article

AN - SCOPUS:84986255572

VL - 2016

JO - Eurasip Journal on Bioinformatics and Systems Biology

JF - Eurasip Journal on Bioinformatics and Systems Biology

SN - 1687-4145

IS - 1

M1 - 14

ER -