TY - GEN
T1 - Robust dynamic programming for discounted infinite-horizon Markov decision processes with uncertain stationary transition matrices
AU - Li, Baohua
AU - Si, Jennie
PY - 2007
Y1 - 2007
N2 - In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are classified into independent and correlated cases. It is pointed out in this paper that the optimality criterion of uniform minimization of the maximum expected total discounted cost functions over all initial states, or the robust uniform optimality criterion, is not appropriate for solving MDPs with correlated transition matrices. A new optimality criterion of minimizing the maximum quadratic total value function is proposed, which includes the previous criterion as a special case. Based on the new optimality criterion, robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.
AB - In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are classified into independent and correlated cases. It is pointed out in this paper that the optimality criterion of uniform minimization of the maximum expected total discounted cost functions over all initial states, or the robust uniform optimality criterion, is not appropriate for solving MDPs with correlated transition matrices. A new optimality criterion of minimizing the maximum quadratic total value function is proposed, which includes the previous criterion as a special case. Based on the new optimality criterion, robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.
UR - http://www.scopus.com/inward/record.url?scp=34548772562&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548772562&partnerID=8YFLogxK
U2 - 10.1109/ADPRL.2007.368175
DO - 10.1109/ADPRL.2007.368175
M3 - Conference contribution
AN - SCOPUS:34548772562
SN - 1424407060
SN - 9781424407064
T3 - Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
SP - 96
EP - 102
BT - Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
T2 - 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
Y2 - 1 April 2007 through 5 April 2007
ER -