TY - JOUR
T1 - Individualized passenger travel pattern multi-clustering based on graph regularized tensor latent dirichlet allocation
AU - Li, Ziyue
AU - Yan, Hao
AU - Zhang, Chen
AU - Tsung, Fugee
N1 - Funding Information:
This work is collaborative research with Hong Kong MTR Co. under the RGC GRF 16201718 and 16216119. The authors appreciate the help from the MTR research, marketing, and customer service teams. The preliminary idea of this work was presented at the IEEE ICDE 2021 Ph.D. Symposium (Li et al. 2021). The authors are grateful for the comments and advice received at the symposium.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/7
Y1 - 2022/7
N2 - Individual passenger travel patterns have significant value in understanding passengers’ behavior, such as learning the hidden clusters of locations, time, and passengers. The learned clusters further enable commercially beneficial actions such as customized services, promotions, data-driven urban-use planning, and peak hour discovery. However, individualized passenger modeling is very challenging for the following reasons: 1) individual passenger travel data are multi-dimensional spatiotemporal big data, including at least the origin, destination, and time dimensions; 2) individualized passenger travel patterns usually depend on the external environment, such as the distances and functions of locations, which are ignored in most current works. This work proposes a multi-clustering model to learn the latent clusters along the multiple dimensions of Origin, Destination, Time, and eventually, Passenger (ODT-P). We develop a graph-regularized tensor Latent Dirichlet Allocation (LDA) model by first extending the traditional LDA model into a tensor version and then applying it to individual travel data. Then, the external information of stations is formulated as semantic graphs and incorporated as Laplacian regularizations. Furthermore, to improve the model scalability when dealing with massive data, an online stochastic learning method based on a tensorized variational Expectation-Maximization algorithm is developed. Finally, a case study based on passengers in the Hong Kong metro system demonstrates better clustering performance than state-of-the-art methods, with improvement in the point-wise mutual information index and in algorithm convergence speed by a factor of two.
AB - Individual passenger travel patterns have significant value in understanding passengers’ behavior, such as learning the hidden clusters of locations, time, and passengers. The learned clusters further enable commercially beneficial actions such as customized services, promotions, data-driven urban-use planning, and peak hour discovery. However, individualized passenger modeling is very challenging for the following reasons: 1) individual passenger travel data are multi-dimensional spatiotemporal big data, including at least the origin, destination, and time dimensions; 2) individualized passenger travel patterns usually depend on the external environment, such as the distances and functions of locations, which are ignored in most current works. This work proposes a multi-clustering model to learn the latent clusters along the multiple dimensions of Origin, Destination, Time, and eventually, Passenger (ODT-P). We develop a graph-regularized tensor Latent Dirichlet Allocation (LDA) model by first extending the traditional LDA model into a tensor version and then applying it to individual travel data. Then, the external information of stations is formulated as semantic graphs and incorporated as Laplacian regularizations. Furthermore, to improve the model scalability when dealing with massive data, an online stochastic learning method based on a tensorized variational Expectation-Maximization algorithm is developed. Finally, a case study based on passengers in the Hong Kong metro system demonstrates better clustering performance than state-of-the-art methods, with improvement in the point-wise mutual information index and in algorithm convergence speed by a factor of two.
KW - Graph structure
KW - Individualized analysis
KW - Online algorithm
KW - Spatiotemporal data
KW - Tensor
KW - Topic model
UR - http://www.scopus.com/inward/record.url?scp=85132195172&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132195172&partnerID=8YFLogxK
U2 - 10.1007/s10618-022-00842-3
DO - 10.1007/s10618-022-00842-3
M3 - Article
AN - SCOPUS:85132195172
SN - 1384-5810
VL - 36
SP - 1247
EP - 1278
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
IS - 4
ER -