TY - JOUR
T1 - Reinforcement Learning for POMDP
T2 - Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems
AU - Bhattacharya, Sushmita
AU - Badyal, Sahil
AU - Wheeler, Thomas
AU - Gil, Stephanie
AU - Bertsekas, Dimitri
N1 - Funding Information:
Manuscript received September 10, 2019; accepted January 23, 2020. Date of publication March 4, 2020; date of current version April 21, 2020. This letter was recommended for publication by Associate Editor G. Neumann and Editor T. Asfour upon evaluation of the reviewers’ comments. This work was supported by the National Science Foundation CAREER Award under Grant 1845225. (Corresponding author: Stephanie Gil.) Sushmita Bhattacharya, Sahil Badyal, and Thomas Wheeler are with the REACT Lab, Arizona State University, Tempe, AZ 85287 USA (e-mail: sbhatt55@asu.edu; sbadyal@asu.edu; thomassw66@gmail.com).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - In this letter we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.
KW - Optimization and optimal control
KW - autonomous agents
KW - deep learning in robotics and automation
KW - distributed robot systems
KW - search and rescue robots
UR - http://www.scopus.com/inward/record.url?scp=85084152693&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084152693&partnerID=8YFLogxK
U2 - 10.1109/LRA.2020.2978451
DO - 10.1109/LRA.2020.2978451
M3 - Article
AN - SCOPUS:85084152693
SN - 2377-3766
VL - 5
SP - 3967
EP - 3974
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 3
M1 - 9024010
ER -