A performance gradient perspective on gradient-based policy iteration and a modified value iteration

Lei Yang, James Dankert, Jennie Si

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Purpose – The purpose of this paper is to develop a mathematical framework that addresses some algorithmic features of approximate dynamic programming (ADP), using an average cost formulation based on the concepts of differential costs and performance gradients. Under this framework, a modified value iteration algorithm is developed that is easy to implement and that can also address a class of partially observable Markov decision processes (POMDP).

Design/methodology/approach – Gradient-based policy iteration (GBPI) is a top-down, system-theoretic approach to dynamic optimization with performance guarantees. In this paper, a bottom-up, algorithmic view is provided to complement the original high-level development of GBPI. A modified value iteration is introduced, which can provide solutions to the same type of POMDP problems dealt with by GBPI. Numerical simulations, including a queuing problem and a maze problem, are conducted to illustrate and verify features of the proposed algorithms as compared to GBPI.

Findings – The direct connection between GBPI and policy iteration is shown under a Markov decision process formulation, which yields additional analytical insights into GBPI. Furthermore, motivated by this analytical framework, the authors propose a modified value iteration as an alternative for addressing the same POMDP problem handled by GBPI.

Originality/value – Several important insights are gained from the analytical framework, and these motivate the development of both algorithms. Built on this paradigm, new ADP learning algorithms (in this case, the modified value iteration) can be developed to address a broader class of problems, the POMDP. In addition, it is now possible to provide ADP algorithms with a gradient perspective. Inspired by the fundamental understanding of learning and optimization problems under the gradient-based framework, additional insight may be developed for bottom-up algorithms with performance guarantees.
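For readers unfamiliar with the average cost formulation and differential costs mentioned in the abstract, the following is a minimal sketch of standard textbook relative value iteration on a fully observable Markov decision process. It is not the authors' modified value iteration or GBPI, which this page does not specify. The function name, the array layout, and the two-state toy model are all assumptions made for illustration.

```python
import numpy as np


def relative_value_iteration(P, c, ref_state=0, tol=1e-10, max_iter=10_000):
    """Textbook relative value iteration for an average-cost MDP.

    P: array of shape (A, S, S); P[a, s, t] is the probability of moving
       from state s to state t under action a (hypothetical layout).
    c: array of shape (S, A); c[s, a] is the one-step cost.
    Returns (average cost estimate, differential costs h, greedy policy).
    """
    n_states = P.shape[1]
    h = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = c[s, a] + sum_t P[a, s, t] * h[t]
        Q = c + np.einsum("ast,t->sa", P, h)
        Th = Q.min(axis=1)
        # Subtract the value at a reference state so h stays bounded;
        # the subtracted amount estimates the optimal average cost.
        h_new = Th - Th[ref_state]
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    Q = c + np.einsum("ast,t->sa", P, h)
    eta = Q.min(axis=1)[ref_state]
    policy = Q.argmin(axis=1)
    return eta, h, policy


# Hypothetical two-state, two-action model (all numbers are illustrative).
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],   # transitions under action 0
    [[0.2, 0.8], [0.1, 0.9]],   # transitions under action 1
])
c = np.array([
    [1.0, 2.0],                 # costs in state 0 for actions 0 and 1
    [3.0, 0.5],                 # costs in state 1 for actions 0 and 1
])

eta, h, policy = relative_value_iteration(P, c)
print(eta, h, policy)
```

In this sketch, `eta` plays the role of the average cost and `h` the differential costs relative to the reference state. For this toy model, choosing action 1 in both states is optimal, with an average cost of 2/3.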

Original language: English (US)
Pages (from-to): 509-520
Number of pages: 12
Journal: International Journal of Intelligent Computing and Cybernetics
Volume: 1
Issue number: 4
State: Published - Oct 17 2008

Keywords

  • Gradient methods
  • Iterative methods
  • Markov processes
  • Programming and algorithm theory

ASJC Scopus subject areas

  • General Computer Science
