TY - JOUR
T1 - An Online Reinforcement Learning Approach for User-Optimal Parking Searching Strategy Exploiting Unique Problem Property and Network Topology
AU - Xiao, Jun
AU - Lou, Yingyan
N1 - Funding Information:
This work was supported by the National Science Foundation through the projects "Collaborative Research: Modeling and Analysis of Advanced Parking Management for Congestion Mitigation" (CMMI-1363244) and "EAGER: A Living Lab for Smartphone-Based Parking Management Services" (CMMI-1643175).
Publisher Copyright:
© 2000-2011 IEEE.
PY - 2022/7/1
Y1 - 2022/7/1
AB - This paper investigates the idea of introducing learning algorithms into parking guidance and information systems that employ a central server, in order to provide estimated optimal parking searching strategies to travelers. The parking searching process on a network with uncertain parking availability can naturally be modeled as a Markov Decision Process (MDP). Such an MDP with full information can easily be solved by dynamic programming approaches. However, the probabilities of finding parking are difficult to define and calculate. Learning algorithms are suitable for addressing this issue. We propose an algorithm based on Q-learning, where a unique property of the parking searching MDP and the topology of the underlying transportation network are incorporated and utilized to improve its performance. This modification allows us to reduce the size of the learning problem dramatically, and thus the amount of data required to learn the optimal strategy. Numerical experiments conducted on a toy network with fixed parking probabilities show that the proposed learning algorithm outperforms the original Q-learning algorithm and three greedy heuristics in terms of the quality of the approximated optimal solution as well as the amount of training data required. Our numerical experiments on a real network with time-dependent underlying probabilities show that effective searching strategies can be achieved by the proposed algorithm, even though the learning algorithms treat the parking probabilities as constant during each exploration-exploitation cycle. The results again demonstrate that the proposed modified Q-learning algorithm significantly outperforms the original Q-learning with the same amount of training data. The results also provide insights into how the length and the split of the exploration-exploitation cycle affect the effectiveness of the proposed learning algorithm.
KW - Markov decision process
KW - parking search strategy
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85105846056&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105846056&partnerID=8YFLogxK
U2 - 10.1109/TITS.2021.3076408
DO - 10.1109/TITS.2021.3076408
M3 - Article
AN - SCOPUS:85105846056
SN - 1524-9050
VL - 23
SP - 8157
EP - 8169
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
IS - 7
ER -