Policy iteration approximate dynamic programming using Volterra series based actor

Wentao Guo, Jennie Si, Feng Liu, Shengwei Mei

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

There is an extensive literature on value function approximation for approximate dynamic programming (ADP). Multilayer perceptrons (MLPs) and radial basis functions (RBFs), among others, are typical approximators for value functions in ADP. Similar approaches have been taken for policy approximation. In this paper, we propose a new Volterra series based structure for actor approximation in ADP. The Volterra approximator is linear in its parameters, so the global optimum is attainable. Given the proposed approximator structure, we further develop a policy iteration framework under which a gradient descent training algorithm for the optimal Volterra kernels is derived. Associated with this ADP design, we provide a sufficient condition based on the actor approximation error that guarantees convergence of the value function iterations. A finite bound on the final convergent value function is also given. Finally, using a simulation example, we illustrate the effectiveness of the proposed Volterra actor for optimal control of a nonlinear system.
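To make the actor structure concrete, below is a minimal sketch of a truncated second-order Volterra actor that is linear in its kernel coefficients, together with a plain gradient-descent parameter update. The feature map, the cost-gradient signal `dJ_du`, and the learning rate are illustrative assumptions for this sketch, not the exact construction or training rule used in the paper.

```python
# Hedged sketch: a second-order truncated Volterra control law
# u(x) = h0 + sum_i h1[i] x_i + sum_{i<=j} h2[i,j] x_i x_j,
# written as u(x) = w^T phi(x), i.e. linear in the kernel vector w.
import numpy as np

def volterra_features(x):
    """Stack the zeroth-, first-, and second-order Volterra terms of state x."""
    x = np.asarray(x, dtype=float)
    second = np.outer(x, x)[np.triu_indices(x.size)]  # products x_i * x_j, i <= j
    return np.concatenate(([1.0], x, second))

class VolterraActor:
    """Actor u(x) = w^T phi(x); linear in parameters, so the training
    objective over w has no spurious local optima for a fixed critic."""
    def __init__(self, state_dim):
        self.w = np.zeros(volterra_features(np.zeros(state_dim)).size)

    def act(self, x):
        return float(self.w @ volterra_features(x))

    def gradient_step(self, x, dJ_du, lr=1e-2):
        # Chain rule: dJ/dw = (dJ/du) * phi(x), since u is linear in w.
        self.w -= lr * dJ_du * volterra_features(x)

# Hypothetical usage: one actor update at a sampled state, given an
# estimate of dJ/du (e.g., supplied by the value-function/critic step
# of the policy iteration loop described in the abstract).
actor = VolterraActor(state_dim=2)
x = np.array([0.5, -0.3])
u = actor.act(x)
actor.gradient_step(x, dJ_du=0.8)
```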

Original language: English (US)
Title of host publication: Proceedings of the International Joint Conference on Neural Networks
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 249-255
Number of pages: 7
ISBN (Electronic): 9781479914845
DOIs
State: Published - Sep 3 2014
Event: 2014 International Joint Conference on Neural Networks, IJCNN 2014 - Beijing, China
Duration: Jul 6 2014 - Jul 11 2014

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Other

Other: 2014 International Joint Conference on Neural Networks, IJCNN 2014
Country/Territory: China
City: Beijing
Period: 7/6/14 - 7/11/14

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
