A performance gradient perspective on approximate dynamic programming and its application to partially observable Markov decision processes

James Dankert, Lei Yang, Jennie Si

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

This paper presents an approach to integrating common approximate dynamic programming (ADP) algorithms into a theoretical framework that addresses both their analytical characteristics and their algorithmic features. Several important insights are gained from this analysis, including new approaches to the creation of algorithms. Building on this paradigm, ADP learning algorithms are further developed to address a broader class of problems: optimization under partial observability. The framework is based on an average cost formulation that uses the concepts of differential costs and performance gradients to describe learning and optimization algorithms. Numerical simulations, including a queueing problem and a maze problem, illustrate and verify features of the proposed algorithms. Pathways for applying this analysis to adaptive critics are also shown.
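As a rough illustration of the average-cost formulation the abstract refers to (a generic sketch, not the paper's actual algorithm), the snippet below runs tabular average-cost temporal-difference learning on a hypothetical two-state Markov chain, estimating the average cost eta and the differential costs h(s) that satisfy the Poisson equation h(s) = c(s) - eta + E[h(s')]:

```python
import random

# Toy two-state Markov chain under a fixed policy (hypothetical example).
# Per-stage costs c(0)=1.0, c(1)=3.0; symmetric transitions, so the
# stationary distribution is uniform and the true average cost is 2.0.
random.seed(0)

P = {0: [0.5, 0.5], 1: [0.5, 0.5]}   # transition probabilities
c = {0: 1.0, 1: 3.0}                  # per-stage costs

h = {0: 0.0, 1: 0.0}   # differential-cost estimates
eta = 0.0              # average-cost estimate
alpha, beta = 0.05, 0.01

s = 0
for _ in range(200_000):
    s_next = random.choices([0, 1], weights=P[s])[0]
    # Average-cost TD error: cost minus average cost, plus the change
    # in differential cost along the transition.
    delta = c[s] - eta + h[s_next] - h[s]
    h[s] += alpha * delta
    eta += beta * (c[s] - eta)   # running estimate of the average cost
    s = s_next

print(round(eta, 1))         # should approach 2.0, the true average cost
print(h[1] - h[0])           # should approach c(1) - c(0) = 2.0
```

Differential costs are determined only up to an additive constant, so the informative quantity is the gap h(1) - h(0); performance-gradient methods of the kind the abstract describes differentiate such average-cost estimates with respect to policy parameters.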

Original language: English (US)
Title of host publication: IEEE International Symposium on Intelligent Control - Proceedings
Pages: 458-463
Number of pages: 6
DOIs: 10.1109/ISIC.2006.285595
State: Published - 2006
Event: Joint 2006 IEEE Conference on Control Applications (CCA), Computer-Aided Control Systems Design Symposium (CACSD) and International Symposium on Intelligent Control (ISIC) - Munich, Germany
Duration: Oct 4, 2006 - Oct 6, 2006

Other

Other: Joint 2006 IEEE Conference on Control Applications (CCA), Computer-Aided Control Systems Design Symposium (CACSD) and International Symposium on Intelligent Control (ISIC)
Country: Germany
City: Munich
Period: 10/4/06 - 10/6/06

Fingerprint

  • Approximate Dynamic Programming
  • Partially Observable Markov Decision Process
  • Dynamic programming
  • Gradient
  • Learning Algorithm
  • Average Cost
  • Queueing
  • Observability
  • Pathway
  • Optimization Algorithm
  • Paradigm
  • Learning algorithms
  • Verify
  • Optimization Problem
  • Partial
  • Costs
  • Numerical Simulation
  • Formulation
  • Computer simulation
  • Framework

ASJC Scopus subject areas

  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Modeling and Simulation

Cite this

Dankert, J., Yang, L., & Si, J. (2006). A performance gradient perspective on approximate dynamic programming and its application to partially observable Markov decision processes. In IEEE International Symposium on Intelligent Control - Proceedings (pp. 458-463). [4064920] https://doi.org/10.1109/ISIC.2006.285595

@inproceedings{122ef57fe92940f3ba9375e57e670239,
title = "A performance gradient perspective on approximate dynamic programming and its application to partially observable markov decision processes",
abstract = "This paper shows an approach to integrating common approximate dynamic programming (ADP) algorithms into a theoretical framework to address both analytical characteristics and algorithmic features. Several important insights are gained from this analysis, including new approaches to the creation of algorithms. Built on this paradigm, ADP learning algorithms are further developed to address a broader class of problems: optimization with partial observability. This framework is based on an average cost formulation which makes use of the concepts of differential costs and performance gradients to describe learning and optimization algorithms. Numerical simulations are conducted including a queueing problem and a maze problem to illustrate and verify features of the proposed algorithms. Pathways for applying this analysis to adaptive critics are also shown.",
author = "James Dankert and Lei Yang and Jennie Si",
year = "2006",
doi = "10.1109/ISIC.2006.285595",
language = "English (US)",
isbn = "0780397983",
pages = "458--463",
booktitle = "IEEE International Symposium on Intelligent Control - Proceedings",

}

TY - GEN

T1 - A performance gradient perspective on approximate dynamic programming and its application to partially observable markov decision processes

AU - Dankert, James

AU - Yang, Lei

AU - Si, Jennie

PY - 2006

Y1 - 2006

N2 - This paper shows an approach to integrating common approximate dynamic programming (ADP) algorithms into a theoretical framework to address both analytical characteristics and algorithmic features. Several important insights are gained from this analysis, including new approaches to the creation of algorithms. Built on this paradigm, ADP learning algorithms are further developed to address a broader class of problems: optimization with partial observability. This framework is based on an average cost formulation which makes use of the concepts of differential costs and performance gradients to describe learning and optimization algorithms. Numerical simulations are conducted including a queueing problem and a maze problem to illustrate and verify features of the proposed algorithms. Pathways for applying this analysis to adaptive critics are also shown.

AB - This paper shows an approach to integrating common approximate dynamic programming (ADP) algorithms into a theoretical framework to address both analytical characteristics and algorithmic features. Several important insights are gained from this analysis, including new approaches to the creation of algorithms. Built on this paradigm, ADP learning algorithms are further developed to address a broader class of problems: optimization with partial observability. This framework is based on an average cost formulation which makes use of the concepts of differential costs and performance gradients to describe learning and optimization algorithms. Numerical simulations are conducted including a queueing problem and a maze problem to illustrate and verify features of the proposed algorithms. Pathways for applying this analysis to adaptive critics are also shown.

UR - http://www.scopus.com/inward/record.url?scp=61849156138&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=61849156138&partnerID=8YFLogxK

U2 - 10.1109/ISIC.2006.285595

DO - 10.1109/ISIC.2006.285595

M3 - Conference contribution

AN - SCOPUS:61849156138

SN - 0780397983

SN - 9780780397989

SP - 458

EP - 463

BT - IEEE International Symposium on Intelligent Control - Proceedings

ER -