TY - JOUR
T1 - Approximate robust policy iteration using multilayer perceptron neural networks for discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices
AU - Li, Baohua
AU - Si, Jennie
N1 - Funding Information:
Manuscript received July 5, 2009; revised November 22, 2009, March 2, 2010, and May 6, 2010. Date of publication July 1, 2010; date of current version August 6, 2010. This work was supported in part by the National Science Foundation, under Grants ECS-0401405 and ECS-0702057, and by the National Science Foundation of China, under Grant 50828701.
PY - 2010/8
Y1 - 2010/8
N2 - We study finite-state, finite-action, discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices in deterministic policy spaces. Existing robust dynamic programming methods cannot be extended to solve this class of general problems. In this paper, based on a robust optimality criterion, an approximate robust policy iteration using a multilayer perceptron neural network is proposed. It is proven that the proposed algorithm converges in finitely many iterations, and that it converges to a stationary optimal or near-optimal policy in a probabilistic sense. In addition, we point out that sometimes even direct enumeration may not be applicable to this class of problems; however, a direct enumeration based on our proposed maximum value approximation over the parameter space is a feasible approach. We provide further analysis to show that our proposed algorithm is more efficient than such an enumeration method for various scenarios.
KW - Approximate dynamic programming
KW - Markov decision processes (MDP)
KW - multilayer perceptrons
KW - uncertain transition matrix
UR - http://www.scopus.com/inward/record.url?scp=77955513754&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955513754&partnerID=8YFLogxK
U2 - 10.1109/TNN.2010.2050334
DO - 10.1109/TNN.2010.2050334
M3 - Article
C2 - 20601311
AN - SCOPUS:77955513754
SN - 1045-9227
VL - 21
SP - 1270
EP - 1280
JO - IEEE Transactions on Neural Networks
JF - IEEE Transactions on Neural Networks
IS - 8
M1 - 5499042
ER -