We consider Markov decision processes with finite states, finite actions, and discounted infinite-horizon cost, optimized over the space of deterministic policies. The state transition matrices are uncertain but admit a stationary parameterization. This uncertainty reflects the realistic situation in which an accurate system model is unavailable for controller design, owing to limitations of estimation methods and deficiencies of the model. Based on a quadratic total value function formulation, two approximate robust policy iterations are developed whose performance errors are guaranteed to lie within an arbitrarily small bound. The two approximations employ iterative aggregation and a multilayer perceptron, respectively. It is proved that the robust policy iteration based on iterative aggregation converges surely to a stationary optimal or near-optimal policy, and that, under certain conditions, the robust policy iteration based on the multilayer perceptron converges in probability to a stationary near-optimal policy. Furthermore, under some assumptions, the stationary solutions are guaranteed to be near-optimal within the deterministic policy space.
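To fix ideas, the following is a minimal illustrative sketch of robust policy iteration on a finite MDP with an uncertain transition matrix, not the algorithm of this paper: the finite uncertainty set, the random cost data, and the per-state-action (rectangular) worst case are assumptions introduced here for illustration only.

```python
import numpy as np

# Hypothetical toy problem: finite states/actions, discounted cost, and a
# finite uncertainty set {P_1, ..., P_K} of candidate transition tensors.
np.random.seed(0)
n_states, n_actions, gamma = 4, 2, 0.9

def random_stochastic(shape):
    # Random row-stochastic tensor: last axis sums to one.
    m = np.random.rand(*shape)
    return m / m.sum(axis=-1, keepdims=True)

# P[a, s, s'] = transition probability under action a; three candidates.
P_set = [random_stochastic((n_actions, n_states, n_states)) for _ in range(3)]
cost = np.random.rand(n_actions, n_states)  # immediate cost c(a, s)

def robust_q(V):
    # Worst-case Q-values: maximize the one-step cost-to-go over the
    # uncertainty set, independently for each state-action pair.
    return np.max([cost + gamma * np.einsum('ast,t->as', P, V) for P in P_set],
                  axis=0)

def evaluate(policy, tol=1e-10):
    # Robust policy evaluation: successive approximation of the worst-case
    # Bellman operator for the fixed policy (a gamma-contraction).
    V = np.zeros(n_states)
    while True:
        Vn = robust_q(V)[policy, np.arange(n_states)]
        if np.max(np.abs(Vn - V)) < tol:
            return Vn
        V = Vn

# Robust policy iteration: evaluate, then improve greedily w.r.t. worst case.
policy = np.zeros(n_states, dtype=int)
while True:
    V = evaluate(policy)
    new_policy = np.argmin(robust_q(V), axis=0)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
```

With this rectangular worst case, the iteration terminates at a policy that is greedy with respect to its own worst-case value, i.e., a fixed point of the robust Bellman equation on this toy instance.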