TY - GEN
T1 - Scaling of Union of Intersections for Inference of Granger Causal Networks from Observational Data
AU - Balasubramanian, Mahesh
AU - Ruiz, Trevor D.
AU - Cook, Brandon
AU - Prabhat, Mr
AU - Bhattacharyya, Sharmodeep
AU - Shrivastava, Aviral
AU - Bouchard, Kristofer E.
N1 - Funding Information:
This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231. This work was partially supported by funding from the National Science Foundation grant CCF 1723476, the NSF/Intel joint research center for Computer Assisted Programming for Heterogeneous Architectures (CAPA). K.E.B. was funded by Lawrence Berkeley National Laboratory-internal LDRD “Neuro/NanoTechnology for BRAIN” led by Peter Denes. This research was sponsored by the U.S. Army Research Laboratory and Defense Advanced Research Projects Agency under Cooperative Agreement Number W911NF-15-2-0056. The views, opinions, and/or findings contained in this material are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - The development of advanced recording and measurement devices in scientific fields is producing high-dimensional time series data. Vector autoregressive (VAR) models are well suited for inferring Granger-causal networks from high-dimensional time series data sets, but accurate inference at scale remains a central challenge. We have recently introduced a flexible and scalable statistical machine learning framework, Union of Intersections (UoI), which enables low false-positive and low false-negative feature selection along with low-bias and low-variance estimation, enhancing interpretation and predictive accuracy. In this paper, we scale the UoI framework for VAR models (algorithm UoIVAR) to infer network connectivity from large time series data sets (TBs). To achieve this, we optimize distributed convex optimization and introduce novel strategies for improved data read and data distribution times. We study the strong and weak scaling of the algorithm on a Xeon Phi-based supercomputer (100,000 cores). These advances enable us to estimate the largest known VAR model (1000 nodes, corresponding to 1M parameters) and apply it to large time series data from neurophysiology (192 neurons) and finance (470 companies).
AB - The development of advanced recording and measurement devices in scientific fields is producing high-dimensional time series data. Vector autoregressive (VAR) models are well suited for inferring Granger-causal networks from high-dimensional time series data sets, but accurate inference at scale remains a central challenge. We have recently introduced a flexible and scalable statistical machine learning framework, Union of Intersections (UoI), which enables low false-positive and low false-negative feature selection along with low-bias and low-variance estimation, enhancing interpretation and predictive accuracy. In this paper, we scale the UoI framework for VAR models (algorithm UoIVAR) to infer network connectivity from large time series data sets (TBs). To achieve this, we optimize distributed convex optimization and introduce novel strategies for improved data read and data distribution times. We study the strong and weak scaling of the algorithm on a Xeon Phi-based supercomputer (100,000 cores). These advances enable us to estimate the largest known VAR model (1000 nodes, corresponding to 1M parameters) and apply it to large time series data from neurophysiology (192 neurons) and finance (470 companies).
UR - http://www.scopus.com/inward/record.url?scp=85088895616&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088895616&partnerID=8YFLogxK
U2 - 10.1109/IPDPS47924.2020.00036
DO - 10.1109/IPDPS47924.2020.00036
M3 - Conference contribution
AN - SCOPUS:85088895616
T3 - Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
SP - 264
EP - 273
BT - Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2020
Y2 - 18 May 2020 through 22 May 2020
ER -