TY - GEN
T1 - An analysis of gradient-based policy iteration
AU - Dankert, James
AU - Yang, Lei
AU - Si, Jennie
PY - 2005
Y1 - 2005
N2 - Recently, a system-theoretic framework for learning and optimization has been developed that shows how many approximate dynamic programming paradigms, such as perturbation analysis, Markov decision processes, and reinforcement learning, are very closely related. Using this system-theoretic framework, a new optimization technique called gradient-based policy iteration (GBPI) has been developed. In this paper we show how GBPI can be extended to partially observable Markov decision processes (POMDPs). We also develop the value iteration analogue of GBPI and show that this new version of value iteration, extended to POMDPs, behaves like value iteration not only theoretically but also numerically.
AB - Recently, a system-theoretic framework for learning and optimization has been developed that shows how many approximate dynamic programming paradigms, such as perturbation analysis, Markov decision processes, and reinforcement learning, are very closely related. Using this system-theoretic framework, a new optimization technique called gradient-based policy iteration (GBPI) has been developed. In this paper we show how GBPI can be extended to partially observable Markov decision processes (POMDPs). We also develop the value iteration analogue of GBPI and show that this new version of value iteration, extended to POMDPs, behaves like value iteration not only theoretically but also numerically.
UR - http://www.scopus.com/inward/record.url?scp=33750137533&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33750137533&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2005.1556399
DO - 10.1109/IJCNN.2005.1556399
M3 - Conference contribution
AN - SCOPUS:33750137533
SN - 0780390482
SN - 9780780390485
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 2977
EP - 2982
BT - Proceedings of the International Joint Conference on Neural Networks, IJCNN 2005
T2 - International Joint Conference on Neural Networks, IJCNN 2005
Y2 - 31 July 2005 through 4 August 2005
ER -