Robust dynamic programming for discounted infinite-horizon Markov decision processes with uncertain stationary transition matrices

Baohua Li, Jennie Si

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

This paper studies finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices, restricted to the deterministic policy space. Uncertain stationary parametric transition matrices are classified into independent and correlated cases. The paper points out that the robust uniform optimality criterion, which uniformly minimizes the maximum expected total discounted cost over all initial states, is not appropriate for solving MDPs with correlated transition matrices. It proposes a new optimality criterion, minimizing the maximum quadratic total value function, which includes the previous criterion as a special case. Based on the new criterion, a robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.
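
To make the robust uniform criterion mentioned in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the uncertainty set is a finite list of candidate models in which one model governs all states at once (a simple stand-in for the correlated case), and it enumerates deterministic stationary policies by brute force. The function names, the two example models, and the cost table are hypothetical. It implements neither the quadratic total value criterion nor the robust policy iteration of the paper.

```python
from itertools import product


def evaluate(policy, P, cost, gamma, iters=2000):
    """Discounted cost of a deterministic stationary policy under one model.

    P[a][s][t] is the probability of moving from state s to state t under
    action a, cost[s][a] is the immediate cost, policy[s] is the action in s.
    """
    n = len(policy)
    v = [0.0] * n
    for _ in range(iters):  # fixed-point iteration of v = c + gamma * P v
        v = [cost[s][policy[s]]
             + gamma * sum(P[policy[s]][s][t] * v[t] for t in range(n))
             for s in range(n)]
    return v


def worst_case_costs(policy, models, cost, gamma):
    """Per-initial-state maximum cost over a finite set of candidate models."""
    values = [evaluate(policy, P, cost, gamma) for P in models]
    return [max(v[s] for v in values) for s in range(len(policy))]


def robust_table(models, cost, gamma):
    """Worst-case cost vector for every deterministic stationary policy."""
    n, m = len(cost), len(cost[0])
    return {pi: worst_case_costs(pi, models, cost, gamma)
            for pi in product(range(m), repeat=n)}


# Hypothetical 2-state, 2-action example with two candidate models.
models = [
    [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.6, 0.4]]],
    [[[0.6, 0.4], [0.3, 0.7]], [[0.2, 0.8], [0.9, 0.1]]],
]
cost = [[1.0, 2.0], [4.0, 3.0]]
gamma = 0.9

table = robust_table(models, cost, gamma)
for pi, worst in sorted(table.items()):
    print(pi, [round(w, 3) for w in worst])
```

Each row of the printed table is the vector of worst-case costs, one entry per initial state. A policy is robustly uniformly optimal only if its vector is componentwise no larger than every other row. Because the maximizing model can differ from one initial state to another, such a policy need not exist when a single model is shared across states.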

Original language: English (US)
Title of host publication: Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
Pages: 96-102
Number of pages: 7
State: Published - 2007
Event: 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007 - Honolulu, HI, United States
Duration: Apr 1 2007 to Apr 5 2007

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
