TY - CONF
T1 - Reward machines for cooperative multi-agent reinforcement learning
AU - Neary, Cyrus
AU - Xu, Zhe
AU - Wu, Bo
AU - Topcu, Ufuk
N1 - Funding Information:
This work was supported in part by ARO W911NF-20-1-0140, DARPA D19AP00004, and ONR N00014-18-1-2829.
Publisher Copyright:
© 2021 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
PY - 2021
Y1 - 2021
AB - In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal. We propose the use of reward machines (RMs), Mealy machines that serve as structured representations of reward functions, to encode the team's task. Our novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies, allowing the team-level task to be decomposed into sub-tasks for individual agents. We define such a notion of RM decomposition and present algorithmically verifiable conditions guaranteeing that distributed completion of the sub-tasks leads to team behavior accomplishing the original task. This framework for task decomposition provides a natural approach to decentralized learning: agents may learn to accomplish their sub-tasks while observing only their local state and abstracted representations of their teammates. We accordingly propose a decentralized Q-learning algorithm. Furthermore, in the case of undiscounted rewards, we use local value functions to derive lower and upper bounds on the global value function corresponding to the team task. Experimental results in three discrete settings demonstrate the effectiveness of the proposed RM decomposition approach, which converges to a successful team policy an order of magnitude faster than a centralized learner and significantly outperforms hierarchical and independent Q-learning approaches.
KW - Bisimulation
KW - Decentralized multi-agent learning
KW - Discrete event systems
KW - Task decomposition
UR - http://www.scopus.com/inward/record.url?scp=85109139719&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85109139719&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85109139719
T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
SP - 934
EP - 942
BT - 20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021
PB - International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
T2 - 20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021
Y2 - 3 May 2021 through 7 May 2021
ER -