From micro to macro: Data driven phenotyping by densification of longitudinal electronic medical records

Jiayu Zhou; Fei Wang; Jianying Hu; Jieping Ye

doi:10.1145/2623330.2623711

From micro to macro: Data driven phenotyping by densification of longitudinal electronic medical records

Jiayu Zhou, Fei Wang, Jianying Hu, Jieping Ye

Computing and Augmented Intelligence, School of (IAFSE-SCAI)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

102 Scopus citations

Abstract

Inferring phenotypic patterns from population-scale clinical data is a core computational task in the development of personalized medicine. One important source of data on which to conduct this type of research is patient Electronic Medical Records (EMR). However, the patient EMRs are typically sparse and noisy, which creates significant challenges if we use them directly to represent patient phenotypes. In this paper, we propose a data driven phenotyping framework called Pacifier (PAtient reCord densIFIER), where we interpret the longitudinal EMR data of each patient as a sparse matrix with a feature dimension and a time dimension, and derive more robust patient phenotypes by exploring the latent structure of those matrices. Specifically, we assume that each derived phenotype is composed of a subset of the medical features contained in original patient EMR, whose value evolves smoothly over time. We propose two formulations to achieve such goal. One is Individual Basis Approach (IBA), which assumes the phenotypes are different for every patient. The other is Shared Basis Approach (SBA), which assumes the patient population shares a common set of phenotypes. We develop an efficient optimization algorithm that is capable of resolving both problems efficiently. Finally we validate Pacifier on two real world EMR cohorts for the tasks of early prediction of Congestive Heart Failure (CHF) and End Stage Renal Disease (ESRD). Our results show that the predictive performance in both tasks can be improved significantly by the proposed algorithms (average AUC score improved from 0.689 to 0.816 on CHF, and from 0.756 to 0.838 on ESRD respectively, on diagnosis group granularity). We also illustrate some interesting phenotypes derived from our data.

Original language	English (US)
Title of host publication	KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher	Association for Computing Machinery
Pages	135-144
Number of pages	10
ISBN (Print)	9781450329569
DOIs	https://doi.org/10.1145/2623330.2623711
State	Published - 2014
Event	20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014 - New York, NY, United States Duration: Aug 24 2014 → Aug 27 2014

Publication series

Name	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference	20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014
Country/Territory	United States
City	New York, NY
Period	8/24/14 → 8/27/14

Keywords

densification
matrix completion
medical informatics
phenotyping
sparse learning

ASJC Scopus subject areas

Software
Information Systems

Access to Document

10.1145/2623330.2623711

Cite this

Zhou, J., Wang, F., Hu, J., & Ye, J. (2014). From micro to macro: Data driven phenotyping by densification of longitudinal electronic medical records. In KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 135-144). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). Association for Computing Machinery. https://doi.org/10.1145/2623330.2623711

From micro to macro: Data driven phenotyping by densification of longitudinal electronic medical records. / Zhou, Jiayu; Wang, Fei; Hu, Jianying et al.
KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2014. p. 135-144 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Zhou, J, Wang, F, Hu, J & Ye, J 2014, From micro to macro: Data driven phenotyping by densification of longitudinal electronic medical records. in KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, pp. 135-144, 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014, New York, NY, United States, 8/24/14. https://doi.org/10.1145/2623330.2623711

Zhou J, Wang F, Hu J, Ye J. From micro to macro: Data driven phenotyping by densification of longitudinal electronic medical records. In KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. 2014. p. 135-144. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). doi: 10.1145/2623330.2623711

Zhou, Jiayu ; Wang, Fei ; Hu, Jianying et al. / From micro to macro : Data driven phenotyping by densification of longitudinal electronic medical records. KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2014. pp. 135-144 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

@inproceedings{55c12a0772924c1b9a1e920be401e4af,

title = "From micro to macro: Data driven phenotyping by densification of longitudinal electronic medical records",

abstract = "Inferring phenotypic patterns from population-scale clinical data is a core computational task in the development of personalized medicine. One important source of data on which to conduct this type of research is patient Electronic Medical Records (EMR). However, the patient EMRs are typically sparse and noisy, which creates significant challenges if we use them directly to represent patient phenotypes. In this paper, we propose a data driven phenotyping framework called Pacifier (PAtient reCord densIFIER), where we interpret the longitudinal EMR data of each patient as a sparse matrix with a feature dimension and a time dimension, and derive more robust patient phenotypes by exploring the latent structure of those matrices. Specifically, we assume that each derived phenotype is composed of a subset of the medical features contained in original patient EMR, whose value evolves smoothly over time. We propose two formulations to achieve such goal. One is Individual Basis Approach (IBA), which assumes the phenotypes are different for every patient. The other is Shared Basis Approach (SBA), which assumes the patient population shares a common set of phenotypes. We develop an efficient optimization algorithm that is capable of resolving both problems efficiently. Finally we validate Pacifier on two real world EMR cohorts for the tasks of early prediction of Congestive Heart Failure (CHF) and End Stage Renal Disease (ESRD). Our results show that the predictive performance in both tasks can be improved significantly by the proposed algorithms (average AUC score improved from 0.689 to 0.816 on CHF, and from 0.756 to 0.838 on ESRD respectively, on diagnosis group granularity). We also illustrate some interesting phenotypes derived from our data.",

keywords = "densification, matrix completion, medical informatics, phenotyping, sparse learning",

author = "Jiayu Zhou and Fei Wang and Jianying Hu and Jieping Ye",

year = "2014",

doi = "10.1145/2623330.2623711",

language = "English (US)",

isbn = "9781450329569",

series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

publisher = "Association for Computing Machinery",

pages = "135--144",

booktitle = "KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

note = "20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014 ; Conference date: 24-08-2014 Through 27-08-2014",

}

TY - GEN

T1 - From micro to macro

T2 - 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014

AU - Zhou, Jiayu

AU - Wang, Fei

AU - Hu, Jianying

AU - Ye, Jieping

PY - 2014

Y1 - 2014

N2 - Inferring phenotypic patterns from population-scale clinical data is a core computational task in the development of personalized medicine. One important source of data on which to conduct this type of research is patient Electronic Medical Records (EMR). However, the patient EMRs are typically sparse and noisy, which creates significant challenges if we use them directly to represent patient phenotypes. In this paper, we propose a data driven phenotyping framework called Pacifier (PAtient reCord densIFIER), where we interpret the longitudinal EMR data of each patient as a sparse matrix with a feature dimension and a time dimension, and derive more robust patient phenotypes by exploring the latent structure of those matrices. Specifically, we assume that each derived phenotype is composed of a subset of the medical features contained in original patient EMR, whose value evolves smoothly over time. We propose two formulations to achieve such goal. One is Individual Basis Approach (IBA), which assumes the phenotypes are different for every patient. The other is Shared Basis Approach (SBA), which assumes the patient population shares a common set of phenotypes. We develop an efficient optimization algorithm that is capable of resolving both problems efficiently. Finally we validate Pacifier on two real world EMR cohorts for the tasks of early prediction of Congestive Heart Failure (CHF) and End Stage Renal Disease (ESRD). Our results show that the predictive performance in both tasks can be improved significantly by the proposed algorithms (average AUC score improved from 0.689 to 0.816 on CHF, and from 0.756 to 0.838 on ESRD respectively, on diagnosis group granularity). We also illustrate some interesting phenotypes derived from our data.

AB - Inferring phenotypic patterns from population-scale clinical data is a core computational task in the development of personalized medicine. One important source of data on which to conduct this type of research is patient Electronic Medical Records (EMR). However, the patient EMRs are typically sparse and noisy, which creates significant challenges if we use them directly to represent patient phenotypes. In this paper, we propose a data driven phenotyping framework called Pacifier (PAtient reCord densIFIER), where we interpret the longitudinal EMR data of each patient as a sparse matrix with a feature dimension and a time dimension, and derive more robust patient phenotypes by exploring the latent structure of those matrices. Specifically, we assume that each derived phenotype is composed of a subset of the medical features contained in original patient EMR, whose value evolves smoothly over time. We propose two formulations to achieve such goal. One is Individual Basis Approach (IBA), which assumes the phenotypes are different for every patient. The other is Shared Basis Approach (SBA), which assumes the patient population shares a common set of phenotypes. We develop an efficient optimization algorithm that is capable of resolving both problems efficiently. Finally we validate Pacifier on two real world EMR cohorts for the tasks of early prediction of Congestive Heart Failure (CHF) and End Stage Renal Disease (ESRD). Our results show that the predictive performance in both tasks can be improved significantly by the proposed algorithms (average AUC score improved from 0.689 to 0.816 on CHF, and from 0.756 to 0.838 on ESRD respectively, on diagnosis group granularity). We also illustrate some interesting phenotypes derived from our data.

KW - densification

KW - matrix completion

KW - medical informatics

KW - phenotyping

KW - sparse learning

UR - http://www.scopus.com/inward/record.url?scp=84907034165&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84907034165&partnerID=8YFLogxK

U2 - 10.1145/2623330.2623711

DO - 10.1145/2623330.2623711

M3 - Conference contribution

AN - SCOPUS:84907034165

SN - 9781450329569

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 135

EP - 144

BT - KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

Y2 - 24 August 2014 through 27 August 2014

ER -

From micro to macro: Data driven phenotyping by densification of longitudinal electronic medical records

Abstract

Publication series

Conference

Keywords

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this