We consider Markov decision processes with finite states, finite actions, and discounted infinite-horizon cost, optimized over the space of deterministic policies. The state transition matrices are uncertain but admit a stationary parameterization. This uncertainty reflects the realistic situation in which an accurate system model is unavailable for controller design, owing to limitations of estimation methods and deficiencies of the model. Based on a quadratic total value function formulation, two approximate robust policy iterations are developed whose performance errors are guaranteed to lie within an arbitrarily small bound. The two approximations employ iterative aggregation and a multilayer perceptron, respectively. It is proved that the robust policy iteration based on iterative aggregation converges surely to a stationary optimal or near-optimal policy, and that, under certain conditions, the robust policy iteration based on the multilayer perceptron converges in probability to a stationary near-optimal policy. Furthermore, under some assumptions, the stationary solutions are guaranteed to be near-optimal within the deterministic policy space.
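To fix ideas, the following is a minimal illustrative sketch of robust policy iteration on a finite MDP with an uncertain transition matrix, not the algorithm of this paper: the finite uncertainty set, the random cost data, and the per-state-action (rectangular) worst case are assumptions introduced here for illustration only.

```python
import numpy as np

# Hypothetical toy problem: finite states/actions, discounted cost, and a
# finite uncertainty set {P_1, ..., P_K} of candidate transition tensors.
np.random.seed(0)
n_states, n_actions, gamma = 4, 2, 0.9

def random_stochastic(shape):
    # Random row-stochastic tensor: last axis sums to one.
    m = np.random.rand(*shape)
    return m / m.sum(axis=-1, keepdims=True)

# P[a, s, s'] = transition probability under action a; three candidates.
P_set = [random_stochastic((n_actions, n_states, n_states)) for _ in range(3)]
cost = np.random.rand(n_actions, n_states)  # immediate cost c(a, s)

def robust_q(V):
    # Worst-case Q-values: maximize the one-step cost-to-go over the
    # uncertainty set, independently for each state-action pair.
    return np.max([cost + gamma * np.einsum('ast,t->as', P, V) for P in P_set],
                  axis=0)

def evaluate(policy, tol=1e-10):
    # Robust policy evaluation: successive approximation of the worst-case
    # Bellman operator for the fixed policy (a gamma-contraction).
    V = np.zeros(n_states)
    while True:
        Vn = robust_q(V)[policy, np.arange(n_states)]
        if np.max(np.abs(Vn - V)) < tol:
            return Vn
        V = Vn

# Robust policy iteration: evaluate, then improve greedily w.r.t. worst case.
policy = np.zeros(n_states, dtype=int)
while True:
    V = evaluate(policy)
    new_policy = np.argmin(robust_q(V), axis=0)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
```

With this rectangular worst case, the iteration terminates at a policy that is greedy with respect to its own worst-case value, i.e., a fixed point of the robust Bellman equation on this toy instance.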