TY - GEN
T1 - Robust dynamic programming for discounted infinite-horizon Markov decision processes with uncertain stationary transition matrices
AU - Li, Baohua
AU - Si, Jennie
PY - 2007
Y1 - 2007
N2 - In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are classified into independent and correlated cases. It is pointed out in this paper that the optimality criterion of uniform minimization of the maximum expected total discounted cost functions over all initial states, or the robust uniform optimality criterion, is not appropriate for solving MDPs with correlated transition matrices. A new optimality criterion of minimizing the maximum quadratic total value function is proposed, which includes the previous criterion as a special case. Based on the new optimality criterion, robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.
AB - In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are classified into independent and correlated cases. It is pointed out in this paper that the optimality criterion of uniform minimization of the maximum expected total discounted cost functions over all initial states, or the robust uniform optimality criterion, is not appropriate for solving MDPs with correlated transition matrices. A new optimality criterion of minimizing the maximum quadratic total value function is proposed, which includes the previous criterion as a special case. Based on the new optimality criterion, robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.
UR - http://www.scopus.com/inward/record.url?scp=34548772562&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548772562&partnerID=8YFLogxK
U2 - 10.1109/ADPRL.2007.368175
DO - 10.1109/ADPRL.2007.368175
M3 - Conference contribution
AN - SCOPUS:34548772562
SN - 1424407060
SN - 9781424407064
T3 - Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
SP - 96
EP - 102
BT - Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
T2 - 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
Y2 - 1 April 2007 through 5 April 2007
ER -