An analysis of gradient-based policy iteration

James Dankert, Lei Yang, Jennie Si

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recently, a system-theoretic framework for learning and optimization has been developed that shows how several approximate dynamic programming paradigms, such as perturbation analysis, Markov decision processes, and reinforcement learning, are closely related. Using this framework, a new optimization technique called gradient-based policy iteration (GBPI) has been developed. In this paper we show how GBPI can be extended to partially observable Markov decision processes (POMDPs). We also develop the value iteration analogue of GBPI and show that this new version of value iteration, extended to POMDPs, behaves like value iteration both theoretically and numerically.
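As background for the value-iteration analogue discussed in the abstract, the following is a minimal sketch of standard value iteration on a fully observable MDP. It is illustrative only and not the paper's GBPI algorithm: the transition matrix `P`, reward matrix `R`, and discount factor are invented toy values.

```python
import numpy as np

# Toy 2-state, 2-action MDP (all numbers are invented for illustration).
# P[a][s, t] = probability of moving from state s to state t under action a.
# R[s, a]    = expected immediate reward for taking action a in state s.
n_states, n_actions = 2, 2
gamma = 0.9  # discount factor

P = np.array([[[0.8, 0.2],
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(n_states)
for _ in range(500):
    # Bellman optimality backup:
    # Q[s, a] = R[s, a] + gamma * sum_t P[a][s, t] * V[t]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
```

Extending this backup to POMDPs, as the paper does for GBPI, requires operating on belief states (distributions over the hidden state) rather than on the states directly.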

Original language: English (US)
Title of host publication: Proceedings of the International Joint Conference on Neural Networks, IJCNN 2005
Pages: 2977-2982
Number of pages: 6
DOI: 10.1109/IJCNN.2005.1556399
State: Published - Dec 1 2005
Event: International Joint Conference on Neural Networks, IJCNN 2005 - Montreal, QC, Canada
Duration: Jul 31 2005 - Aug 4 2005

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
Volume: 5


ASJC Scopus subject areas

  • Software
  • Artificial Intelligence


Cite this

    Dankert, J., Yang, L., & Si, J. (2005). An analysis of gradient-based policy iteration. In Proceedings of the International Joint Conference on Neural Networks, IJCNN 2005 (pp. 2977-2982). [1556399] (Proceedings of the International Joint Conference on Neural Networks; Vol. 5). https://doi.org/10.1109/IJCNN.2005.1556399