Abstract
This paper presents an approach for integrating common approximate dynamic programming (ADP) algorithms into a theoretical framework that addresses both their analytical characteristics and their algorithmic features. The analysis yields several important insights, including new approaches to constructing algorithms. Building on this paradigm, ADP learning algorithms are further developed to address a broader class of problems: optimization under partial observability. The framework is based on an average cost formulation that uses the concepts of differential costs and performance gradients to describe learning and optimization algorithms. Numerical simulations, including a queueing problem and a maze problem, illustrate and verify features of the proposed algorithms. Pathways for applying this analysis to adaptive critics are also shown.
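To make the average cost vocabulary in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a small ergodic Markov chain under a fixed policy, with a hypothetical transition matrix `P` and per-state cost vector `c`. It computes the average cost (the stationary-distribution-weighted cost) and the differential costs `h`, which satisfy the Poisson equation `h = c - eta + P h`. All function names are illustrative.

```python
def stationary_distribution(P, iters=500):
    """Approximate the stationary distribution of an ergodic chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # One step of pi <- pi P (row vector times matrix).
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def average_cost(P, c):
    """Long-run average cost: eta = sum_i pi_i * c_i."""
    pi = stationary_distribution(P)
    return sum(p * ci for p, ci in zip(pi, c))


def differential_costs(P, c, iters=500):
    """Iterate h <- c - eta + P h from h = 0.

    For an ergodic, aperiodic chain this converges to the solution of the
    Poisson equation that is normalized so that sum_i pi_i * h_i = 0.
    """
    n = len(P)
    eta = average_cost(P, c)
    h = [0.0] * n
    for _ in range(iters):
        h = [c[i] - eta + sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    return h


# Hypothetical two-state example (values chosen only for illustration).
P = [[0.9, 0.1],
     [0.5, 0.5]]
c = [1.0, 3.0]

eta = average_cost(P, c)
h = differential_costs(P, c)
print(eta, h)
```

For this two-state chain the stationary distribution is (5/6, 1/6), so the average cost is 4/3 and the differential costs are (-5/9, 25/9). The second state is relatively more expensive to start from, which is the kind of information a differential cost carries in an average cost formulation.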
Original language | English (US) |
---|---|
Title of host publication | IEEE International Symposium on Intelligent Control - Proceedings |
Pages | 458-463 |
Number of pages | 6 |
DOIs | 10.1109/ISIC.2006.285595 |
State | Published - 2006 |
Event | Joint 2006 IEEE Conference on Control Applications (CCA), Computer-Aided Control Systems Design Symposium (CACSD) and International Symposium on Intelligent Control (ISIC) - Munich, Germany. Duration: Oct 4 2006 → Oct 6 2006 |
Other
Other | Joint 2006 IEEE Conference on Control Applications (CCA), Computer-Aided Control Systems Design Symposium (CACSD) and International Symposium on Intelligent Control (ISIC) |
---|---|
Country | Germany |
City | Munich |
Period | 10/4/06 → 10/6/06 |
ASJC Scopus subject areas
- Computer Science Applications
- Control and Systems Engineering
- Electrical and Electronic Engineering
- Modeling and Simulation
Cite this
A performance gradient perspective on approximate dynamic programming and its application to partially observable Markov decision processes. / Dankert, James; Yang, Lei; Si, Jennie.
IEEE International Symposium on Intelligent Control - Proceedings. 2006. p. 458-463 4064920.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - A performance gradient perspective on approximate dynamic programming and its application to partially observable Markov decision processes
AU - Dankert, James
AU - Yang, Lei
AU - Si, Jennie
PY - 2006
Y1 - 2006
UR - http://www.scopus.com/inward/record.url?scp=61849156138&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=61849156138&partnerID=8YFLogxK
U2 - 10.1109/ISIC.2006.285595
DO - 10.1109/ISIC.2006.285595
M3 - Conference contribution
AN - SCOPUS:61849156138
SN - 0780397983
SN - 9780780397989
SP - 458
EP - 463
BT - IEEE International Symposium on Intelligent Control - Proceedings
ER -