A performance gradient perspective on approximate dynamic programming and its application to partially observable Markov decision processes

James Dankert, Lei Yang, Jennie Si

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Scopus citations

Abstract

This paper shows an approach to integrating common approximate dynamic programming (ADP) algorithms into a theoretical framework to address both analytical characteristics and algorithmic features. Several important insights are gained from this analysis, including new approaches to the creation of algorithms. Built on this paradigm, ADP learning algorithms are further developed to address a broader class of problems: optimization with partial observability. This framework is based on an average cost formulation which makes use of the concepts of differential costs and performance gradients to describe learning and optimization algorithms. Numerical simulations are conducted including a queueing problem and a maze problem to illustrate and verify features of the proposed algorithms. Pathways for applying this analysis to adaptive critics are also shown.

Original language: English (US)
Title of host publication: Proceedings of the 2006 IEEE International Symposium on Intelligent Control, ISIC
Pages: 458-463
Number of pages: 6
DOI: https://doi.org/10.1109/ISIC.2006.285595
State: Published - Dec 1 2006
Event: Joint 2006 IEEE Conference on Control Applications (CCA), Computer-Aided Control Systems Design Symposium (CACSD) and International Symposium on Intelligent Control (ISIC) - Munich, Germany
Duration: Oct 4 2006 to Oct 6 2006

Publication series

Name: IEEE International Symposium on Intelligent Control - Proceedings

Other

Other: Joint 2006 IEEE Conference on Control Applications (CCA), Computer-Aided Control Systems Design Symposium (CACSD) and International Symposium on Intelligent Control (ISIC)
Country: Germany
City: Munich
Period: 10/4/06 to 10/6/06

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Computer Science Applications
  • Electrical and Electronic Engineering

Cite this

Dankert, J., Yang, L., & Si, J. (2006). A performance gradient perspective on approximate dynamic programming and its application to partially observable Markov decision processes. In Proceedings of the 2006 IEEE International Symposium on Intelligent Control, ISIC (pp. 458-463). [4064920] (IEEE International Symposium on Intelligent Control - Proceedings). https://doi.org/10.1109/ISIC.2006.285595