TY - GEN
T1 - When Shall I Estimate Your Intent? Costs and Benefits of Intent Inference in Multi-Agent Interactions
AU - Amatya, Sunny
AU - Ghimire, Mukesh
AU - Ren, Yi
AU - Xu, Zhe
AU - Zhang, Wenlong
N1 - Publisher Copyright: © 2022 American Automatic Control Council.
PY - 2022
Y1 - 2022
N2 - This paper addresses incomplete-information dynamic games, where reward parameters of agents are private. Previous studies have shown that online belief update is necessary for deriving equilibrial policies of such games, especially for high-risk games such as vehicle interactions. However, updating beliefs in real time is computationally expensive as it requires continuous computation of Nash equilibria of the sub-games starting from the current states. In this paper, we consider the triggering mechanism of belief update as a policy defined on the agents' physical and belief states, and propose learning this policy through reinforcement learning (RL). Using a two-vehicle uncontrolled intersection case, we show that intermittent belief update via RL is sufficient for safe interactions, reducing the computation cost of updates by 59% when agents have full observations of physical states. Simulation results also show that the belief update frequency will increase as noise becomes more significant in measurements of the vehicle positions.
UR - http://www.scopus.com/inward/record.url?scp=85138495923&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138495923&partnerID=8YFLogxK
U2 - 10.23919/ACC53348.2022.9867155
DO - 10.23919/ACC53348.2022.9867155
M3 - Conference contribution
AN - SCOPUS:85138495923
T3 - Proceedings of the American Control Conference
SP - 586
EP - 592
BT - 2022 American Control Conference, ACC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 American Control Conference, ACC 2022
Y2 - 8 June 2022 through 10 June 2022
ER -