Approximate robust policy iteration using multilayer perceptron neural networks for discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices

Baohua Li, Jennie Si

Research output: Contribution to journal › Article

14 Scopus citations

Abstract

We study finite-state, finite-action, discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices in deterministic policy spaces. Existing robust dynamic programming methods cannot be extended to solve this class of general problems. In this paper, based on a robust optimality criterion, an approximate robust policy iteration using a multilayer perceptron neural network is proposed. It is proven that the proposed algorithm converges in a finite number of iterations, and that it converges to a stationary optimal or near-optimal policy in a probabilistic sense. In addition, we point out that in some cases even direct enumeration may not be applicable to this class of problems. However, a direct enumeration based on our proposed maximum value approximation over the parameter space is a feasible approach. We provide further analysis to show that our proposed algorithm is more efficient than such an enumeration method in various scenarios.
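The abstract describes the approach only at a high level. As a rough illustration of the robust criterion and of how an iterative scheme differs from direct enumeration, the Python sketch below assumes a cost-minimization form of the criterion, min over deterministic policies π of max over θ in Θ of Σ_s α(s) V_θ^π(s), in which a single parameter θ drives several rows of the transition matrix at once (so the rows are correlated). This is an assumption, not necessarily the paper's exact definition, and the code is not the authors' algorithm: the multilayer perceptron that approximates the value over the parameter space is replaced by a plain grid search over θ, and the toy model, costs, weights α (INIT), and all function names (transition, evaluate, worst_case, robust_policy_iteration, enumerate_robust) are hypothetical. The loop also does not carry the paper's convergence-to-optimal guarantees; it only terminates because it stops as soon as a policy repeats.

```python
from itertools import product

# Hypothetical toy problem (all numbers are assumptions for illustration).
GAMMA = 0.9                      # discount factor
N_STATES, N_ACTIONS = 2, 2
COST = [[1.0, 2.0], [4.0, 0.5]]  # per-stage cost COST[s][a]
THETAS = [0.1 * k for k in range(1, 10)]  # finite grid standing in for Theta
INIT = [0.5, 0.5]                # initial-state weights used to scalarize values


def transition(theta):
    """Return P[a][s][t]. One scalar theta moves probability mass in rows of
    both actions at once, so the rows are correlated rather than independent."""
    return [
        [[1 - theta, theta], [0.5, 0.5]],  # action 0
        [[0.9, 0.1], [theta, 1 - theta]],  # action 1
    ]


def evaluate(policy, P, tol=1e-10):
    """Iterative policy evaluation of V = c_pi + GAMMA * P_pi V for a fixed model."""
    V = [0.0] * N_STATES
    while True:
        new = [COST[s][policy[s]]
               + GAMMA * sum(P[policy[s]][s][t] * V[t] for t in range(N_STATES))
               for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(new, V)) < tol:
            return new
        V = new


def worst_case(policy):
    """Grid search for the theta that maximizes the weighted discounted cost
    (a crude stand-in for the paper's MLP-based maximum value approximation)."""
    worst_theta, worst_val = None, float("-inf")
    for theta in THETAS:
        V = evaluate(policy, transition(theta))
        val = sum(w * v for w, v in zip(INIT, V))
        if val > worst_val:
            worst_theta, worst_val = theta, val
    return worst_theta, worst_val


def robust_policy_iteration(start=None):
    """Simplified robust policy iteration: find the worst-case theta, then improve
    greedily against that model. Stops when a policy repeats, which happens
    after at most N_ACTIONS ** N_STATES iterations."""
    policy = tuple(start) if start is not None else (0,) * N_STATES
    seen, best = set(), (float("inf"), policy)
    while policy not in seen:
        seen.add(policy)
        theta, val = worst_case(policy)
        best = min(best, (val, policy))  # keep the best worst-case policy found
        P = transition(theta)
        V = evaluate(policy, P)
        policy = tuple(
            min(range(N_ACTIONS),
                key=lambda a: COST[s][a]
                + GAMMA * sum(P[a][s][t] * V[t] for t in range(N_STATES)))
            for s in range(N_STATES))
    return best[1], best[0]


def enumerate_robust():
    """Direct enumeration over all deterministic policies (exponential in N_STATES)."""
    val, policy = min((worst_case(p)[1], p)
                      for p in product(range(N_ACTIONS), repeat=N_STATES))
    return policy, val


if __name__ == "__main__":
    print("robust PI:  ", robust_policy_iteration())
    print("enumeration:", enumerate_robust())
```

The enumeration evaluates the worst case of every one of the N_ACTIONS ** N_STATES deterministic policies, while the iteration evaluates only the policies it reaches through greedy improvement. The paper's efficiency analysis compares these two kinds of cost under its own assumptions, which this toy example does not reproduce.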

Original language: English (US)
Article number: 5499042
Pages (from-to): 1270-1280
Number of pages: 11
Journal: IEEE Transactions on Neural Networks
Volume: 21
Issue number: 8
State: Published - Aug 1 2010

Keywords

  • Approximate dynamic programming
  • Markov decision processes (MDP)
  • multilayer perceptrons
  • uncertain transition matrix

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
