Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems

Sushmita Bhattacharya, Sahil Badyal, Thomas Wheeler, Stephanie Gil, Dimitri Bertsekas

Research output: Contribution to journalArticlepeer-review

5 Scopus citations

Abstract

In this letter we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.

Original languageEnglish (US)
Article number9024010
Pages (from-to)3967-3974
Number of pages8
JournalIEEE Robotics and Automation Letters
Volume5
Issue number3
DOIs
StatePublished - Jul 2020

Keywords

  • Optimization and optimal control
  • autonomous agents
  • deep learning in robotics and automation
  • distributed robot systems
  • search and rescue robots

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Biomedical Engineering
  • Human-Computer Interaction
  • Mechanical Engineering
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Control and Optimization
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems'. Together they form a unique fingerprint.

Cite this