Policy iteration approximate dynamic programming using Volterra series based actor

Wentao Guo, Jennie Si, Feng Liu, Shengwei Mei

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

There is an extensive literature on value function approximation for approximate dynamic programming (ADP). Multilayer perceptrons (MLPs) and radial basis functions (RBFs), among others, are typical approximators for value functions in ADP. Similar approaches have been taken for policy approximation. In this paper, we propose a new Volterra series based structure for actor approximation in ADP. The Volterra approximator is linear in its parameters, so global optima are attainable. Given the proposed approximator structures, we further develop a policy iteration framework under which a gradient descent training algorithm for the optimal Volterra kernels is derived. Associated with this ADP design, we provide a sufficient condition, based on the actor approximation error, that guarantees convergence of the value function iterations. A finite bound on the final convergent value function is also given. Finally, a simulation example illustrates the effectiveness of the proposed Volterra actor for optimal control of a nonlinear system.
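To make the proposed actor structure concrete, the sketch below shows a second-order Volterra series actor that is linear in its kernel parameters, together with a simple batch gradient-descent update of those kernels. This is only a minimal illustrative rendering of the idea in the abstract, not the authors' implementation: the state-feedback form, scalar control, squared-error loss against target controls from a policy-improvement step, and all names, initializations, and constants are assumptions introduced here.

    # Minimal Python sketch (not from the paper) of a second-order Volterra
    # series actor that is linear in its kernel parameters, plus a batch
    # gradient-descent update of the kernels. The state-feedback form, scalar
    # control, squared-error loss, initialization, and learning rate are all
    # illustrative assumptions.
    import numpy as np

    class VolterraActor:
        """Actor u(x) = h0 + h1^T x + x^T H2 x, linear in its kernels."""

        def __init__(self, n_state, rng=None):
            rng = np.random.default_rng(rng)
            self.h0 = 0.0                                  # zeroth-order kernel
            self.h1 = rng.normal(scale=0.1, size=n_state)  # first-order kernel
            self.h2 = rng.normal(scale=0.1, size=(n_state, n_state))  # second-order kernel

        def features(self, x):
            """Volterra basis [1, x, vec(x x^T)], so that u(x) = theta^T phi(x)."""
            return np.concatenate(([1.0], x, np.outer(x, x).ravel()))

        def theta(self):
            """Stack all kernel coefficients into one parameter vector."""
            return np.concatenate(([self.h0], self.h1, self.h2.ravel()))

        def set_theta(self, theta):
            n = self.h1.size
            self.h0 = theta[0]
            self.h1 = theta[1:1 + n]
            self.h2 = theta[1 + n:].reshape(n, n)

        def __call__(self, x):
            return float(self.theta() @ self.features(x))

    def gradient_descent_step(actor, states, target_controls, lr=1e-2):
        """One batch gradient-descent update of the Volterra kernels toward
        target controls (e.g. produced by a policy-improvement step). Because
        the actor is linear in theta, the squared-error loss is convex in the
        kernels, which is why a global optimum is attainable."""
        theta = actor.theta()
        grad = np.zeros_like(theta)
        for x, u_star in zip(states, target_controls):
            phi = actor.features(x)
            grad += (theta @ phi - u_star) * phi
        actor.set_theta(theta - lr * grad / len(states))

    # Illustrative usage with hypothetical states and target controls.
    actor = VolterraActor(n_state=2, rng=0)
    xs = [np.array([0.5, -0.2]), np.array([1.0, 0.3])]
    us = [0.1, -0.4]
    for _ in range(200):
        gradient_descent_step(actor, xs, us)
    print(actor(xs[0]))

Since u(x) is linear in the stacked kernel vector, the squared-error training objective has no spurious local minima; this linearity is the property the abstract points to when it says global optima are attainable with the Volterra approximator.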

Original language: English (US)
Title of host publication: Proceedings of the International Joint Conference on Neural Networks
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 249-255
Number of pages: 7
ISBN (Print): 9781479914845
DOIs: https://doi.org/10.1109/IJCNN.2014.6889865
State: Published - Sep 3 2014
Event: 2014 International Joint Conference on Neural Networks, IJCNN 2014 - Beijing, China
Duration: Jul 6 2014 - Jul 11 2014

Other

Other: 2014 International Joint Conference on Neural Networks, IJCNN 2014
Country: China
City: Beijing
Period: 7/6/14 - 7/11/14

Fingerprint

  • Dynamic programming
  • Multilayer neural networks
  • Nonlinear systems

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Cite this

Guo, W., Si, J., Liu, F., & Mei, S. (2014). Policy iteration approximate dynamic programming using Volterra series based actor. In Proceedings of the International Joint Conference on Neural Networks (pp. 249-255). [6889865] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IJCNN.2014.6889865
