TY - JOUR
T1 - Reinforcement Learning for POMDP
T2 - Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems
AU - Bhattacharya, Sushmita
AU - Badyal, Sahil
AU - Wheeler, Thomas
AU - Gil, Stephanie
AU - Bertsekas, Dimitri
N1 - Funding Information:
Manuscript received September 10, 2019; accepted January 23, 2020. Date of publication March 4, 2020; date of current version April 21, 2020. This letter was recommended for publication by Associate Editor G. Neumann and Editor T. Asfour upon evaluation of the reviewers’ comments. This work was supported by the National Science Foundation CAREER Award under Grant 1845225. (Corresponding author: Stephanie Gil.) Sushmita Bhattacharya, Sahil Badyal, and Thomas Wheeler are with the REACT Lab, Arizona State University, Tempe, AZ 85287 USA (e-mail: sbhatt55@asu.edu; sbadyal@asu.edu; thomassw66@gmail.com).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - In this letter we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.
KW - Optimization and optimal control
KW - autonomous agents
KW - deep learning in robotics and automation
KW - distributed robot systems
KW - search and rescue robots
UR - http://www.scopus.com/inward/record.url?scp=85084152693&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084152693&partnerID=8YFLogxK
U2 - 10.1109/LRA.2020.2978451
DO - 10.1109/LRA.2020.2978451
M3 - Article
AN - SCOPUS:85084152693
SN - 2377-3766
VL - 5
SP - 3967
EP - 3974
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 3
M1 - 9024010
ER -