TY - CONF
T1 - Reward machines for cooperative multi-agent reinforcement learning
AU - Neary, Cyrus
AU - Xu, Zhe
AU - Wu, Bo
AU - Topcu, Ufuk
N1 - Funding Information:
This work was supported in part by ARO W911NF-20-1-0140, DARPA D19AP00004, and ONR N00014-18-1-2829.
Publisher Copyright:
© 2021 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
PY - 2021
Y1 - 2021
AB - In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal. We propose the use of reward machines (RMs), Mealy machines that serve as structured representations of reward functions, to encode the team's task. Our novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies, allowing the team-level task to be decomposed into sub-tasks for individual agents. We define such a notion of RM decomposition and present algorithmically verifiable conditions guaranteeing that distributed completion of the sub-tasks leads to team behavior accomplishing the original task. This framework for task decomposition provides a natural approach to decentralized learning: agents may learn to accomplish their sub-tasks while observing only their local state and abstracted representations of their teammates. We accordingly propose a decentralized Q-learning algorithm. Furthermore, in the case of undiscounted rewards, we use local value functions to derive lower and upper bounds on the global value function corresponding to the team task. Experimental results in three discrete settings demonstrate the effectiveness of the proposed RM decomposition approach, which converges to a successful team policy an order of magnitude faster than a centralized learner and significantly outperforms hierarchical and independent Q-learning approaches.
KW - Bisimulation
KW - Decentralized multi-agent learning
KW - Discrete event systems
KW - Task decomposition
UR - http://www.scopus.com/inward/record.url?scp=85109139719&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85109139719&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85109139719
T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
SP - 934
EP - 942
BT - 20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021
PB - International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
T2 - 20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021
Y2 - 3 May 2021 through 7 May 2021
ER -