TY - JOUR
T1 - 3M-RL
T2 - Multi-Resolution, Multi-Agent, Mean-Field Reinforcement Learning for Autonomous UAV Routing
AU - Wang, Weichang
AU - Liu, Yongming
AU - Srikant, Rayadurgam
AU - Ying, Lei
N1 - Publisher Copyright:
© 2000-2011 IEEE.
PY - 2022/7/1
Y1 - 2022/7/1
N2 - Collision-free path planning is a major challenge in managing unmanned aerial vehicle (UAV) fleets, especially in uncertain environments. In this paper, we consider the design of UAV routing policies using multi-agent reinforcement learning, and propose a Multi-resolution, Multi-agent, Mean-field reinforcement learning algorithm, named 3M-RL, for flight planning, where multiple vehicles need to avoid collisions with each other while moving towards their destinations. In the system we consider, each UAV makes decisions based on local observations, and does not communicate with other UAVs. The algorithm trains a routing policy using an Actor-Critic neural network with multi-resolution observations, including detailed local information and aggregated global information based on mean-field. The algorithm tackles the curse-of-dimensionality problem in multi-agent reinforcement learning and provides a scalable solution. We test our algorithm in different complex scenarios in both 2D and 3D space, and our simulation results show that 3M-RL results in good routing policies.
AB - Collision-free path planning is a major challenge in managing unmanned aerial vehicle (UAV) fleets, especially in uncertain environments. In this paper, we consider the design of UAV routing policies using multi-agent reinforcement learning, and propose a Multi-resolution, Multi-agent, Mean-field reinforcement learning algorithm, named 3M-RL, for flight planning, where multiple vehicles need to avoid collisions with each other while moving towards their destinations. In the system we consider, each UAV makes decisions based on local observations, and does not communicate with other UAVs. The algorithm trains a routing policy using an Actor-Critic neural network with multi-resolution observations, including detailed local information and aggregated global information based on mean-field. The algorithm tackles the curse-of-dimensionality problem in multi-agent reinforcement learning and provides a scalable solution. We test our algorithm in different complex scenarios in both 2D and 3D space, and our simulation results show that 3M-RL results in good routing policies.
KW - Multiagent reinforcement learning
KW - actor-critic
KW - mean-field
UR - http://www.scopus.com/inward/record.url?scp=85113205851&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113205851&partnerID=8YFLogxK
U2 - 10.1109/TITS.2021.3089120
DO - 10.1109/TITS.2021.3089120
M3 - Article
AN - SCOPUS:85113205851
SN - 1524-9050
VL - 23
SP - 8985
EP - 8996
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
IS - 7
ER -