TY - JOUR
T1 - Missing information imputation for disease-dedicated social networks with heterogeneous auxiliary data
AU - Liu, Xu
AU - He, Jingrui
AU - Min, Wanli
AU - Yang, Hongxia
N1 - Funding Information:
This work is supported by National Science Foundation under Grant No. IIS-1947203 and Grant No. IIS-1813464, the U.S. Department of Homeland Security under Grant Award Number 17STQAC00001-02-00 and Ordering Agreement Number HSHQDC-16-A-B0001, and an IBM Faculty Award. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the government.
Publisher Copyright:
© 2020 “IISE”.
PY - 2020/4/2
Y1 - 2020/4/2
AB - Many high-impact applications suffer from missing information. For example, disease-dedicated social networks provide additional resources for glimpsing into patients’ daily lives as they relate to disease management. However, due to the voluntary nature of such social networks, the information reported by patients is often incomplete, making subsequent data analytics tasks particularly challenging. On the other hand, in addition to the target data that we aim to analyze, we may also have other related data at our disposal. For example, to analyze disease-dedicated social networks, auxiliary clinical data (with potentially non-overlapping patients), as well as the users’ online social relationships, might provide additional information for estimating the missing information. Therefore, the key question we aim to answer in this paper is how we can leverage the heterogeneous auxiliary data for missing information imputation. To answer this question, we focus on diabetes-dedicated social networks, and we aim to estimate the missing information from patients’ self-reported biomarker measurements. In particular, we propose a hypergraph structure to model the relationship among users and user-generated content (posts). Based on the hypergraph structure, we further introduce an optimization framework to estimate the missing biomarker measurements using heterogeneous auxiliary data. To solve the optimization framework, we design iterative algorithms to find a locally optimal solution. Experimental results on both synthetic and real data sets (including a data set collected from a diabetes-dedicated social network) demonstrate the effectiveness of the proposed algorithms.
KW - Disease-dedicated social network
KW - heterogeneous learning
KW - missing value imputation
UR - http://www.scopus.com/inward/record.url?scp=85078922626&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078922626&partnerID=8YFLogxK
U2 - 10.1080/24725579.2020.1716115
DO - 10.1080/24725579.2020.1716115
M3 - Article
AN - SCOPUS:85078922626
SN - 2472-5579
VL - 10
SP - 87
EP - 98
JO - IISE Transactions on Healthcare Systems Engineering
JF - IISE Transactions on Healthcare Systems Engineering
IS - 2
ER -