Policy iteration approximate dynamic programming using Volterra series based actor

Wentao Guo; Jennie Si; Feng Liu; Shengwei Mei

doi:10.1109/IJCNN.2014.6889865

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

There is an extensive literature on value function approximation for approximate dynamic programming (ADP). Multilayer perceptrons (MLPs) and radial basis functions (RBFs), among others, are typical approximators for value functions in ADP. Similar approaches have been taken for policy approximation. In this paper, we propose a new Volterra series based structure for actor approximation in ADP. The Volterra approximator is linear in parameters with global optima attainable. Given the proposed approximator structures, we further develop a policy iteration framework under which a gradient descent training algorithm for obtaining the optimal Volterra kernels is derived. Associated with this ADP design, we provide a sufficient condition based on actor approximation error to guarantee convergence of the value function iterations. A finite bound on the final convergent value function is also given. Finally, using a simulation example, we illustrate the effectiveness of the proposed Volterra actor for optimal control of a nonlinear system.
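The abstract's point that a Volterra actor is linear in its parameters, so that fitting its kernels is a convex problem, can be illustrated with a small sketch. It assumes a memoryless, second-order truncated Volterra map from state to control (which reduces to a quadratic polynomial in the state) and fits it by batch gradient descent to a hypothetical target policy. In the paper the training signal comes from the policy iteration framework instead, and the names `volterra_features`, `actor`, and `train_actor` are illustrative only, not taken from the paper.

```python
import itertools
import random


def volterra_features(x):
    """Memoryless, second-order Volterra feature vector for state x (assumed form).

    Returns [1, x_0..x_{n-1}, x_i*x_j for i <= j]. The actor output is a
    linear combination of these terms, so it is linear in the kernel weights.
    """
    feats = [1.0]                      # zeroth-order kernel (bias term)
    feats.extend(x)                    # first-order kernel terms
    for i, j in itertools.combinations_with_replacement(range(len(x)), 2):
        feats.append(x[i] * x[j])      # second-order kernel terms
    return feats


def actor(weights, x):
    """Actor output u(x) = sum_k w_k * phi_k(x)."""
    return sum(w * f for w, f in zip(weights, volterra_features(x)))


def train_actor(samples, targets, lr=0.5, epochs=1000):
    """Batch gradient descent on 0.5 * mean((u(x) - u_target)^2).

    Because u is linear in the weights, this loss is quadratic (convex) in
    the weights, so with a stable step size gradient descent approaches a
    global minimizer.
    """
    phis = [volterra_features(x) for x in samples]
    weights = [0.0] * len(phis[0])
    m = len(samples)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for phi, target in zip(phis, targets):
            err = sum(w * f for w, f in zip(weights, phi)) - target
            for k, f in enumerate(phi):
                grad[k] += err * f / m
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Usage: hypothetical 2-D states and a made-up target policy that a
# second-order Volterra map can represent exactly.
random.seed(0)
states = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(100)]
targets = [0.2 + 0.5 * x[0] - x[0] * x[1] for x in states]
w = train_actor(states, targets)
mse = sum((actor(w, x) - t) ** 2 for x, t in zip(states, targets)) / len(states)
print(f"mse = {mse:.2e}")
print("weights:", [round(v, 3) for v in w])
```

Because the loss is quadratic in `weights`, the step size only has to keep the iteration stable (roughly, `lr` below 2 divided by the largest eigenvalue of the features' second-moment matrix). The same feature map can be extended to higher orders or delayed inputs at the cost of more kernel terms.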

Original language	English (US)
Title of host publication	Proceedings of the International Joint Conference on Neural Networks
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	249-255
Number of pages	7
ISBN (Electronic)	9781479914845
DOIs	https://doi.org/10.1109/IJCNN.2014.6889865
State	Published - Sep 3 2014
Event	2014 International Joint Conference on Neural Networks, IJCNN 2014 - Beijing, China
Duration	Jul 6 2014 → Jul 11 2014

Publication series

Name	Proceedings of the International Joint Conference on Neural Networks

Other

Other	2014 International Joint Conference on Neural Networks, IJCNN 2014
Country/Territory	China
City	Beijing
Period	7/6/14 → 7/11/14

ASJC Scopus subject areas

Software
Artificial Intelligence

Access to Document

10.1109/IJCNN.2014.6889865

Cite this

Guo, W., Si, J., Liu, F., & Mei, S. (2014). Policy iteration approximate dynamic programming using Volterra series based actor. In Proceedings of the International Joint Conference on Neural Networks (pp. 249-255). Article 6889865 (Proceedings of the International Joint Conference on Neural Networks). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IJCNN.2014.6889865

Policy iteration approximate dynamic programming using Volterra series based actor. / Guo, Wentao; Si, Jennie; Liu, Feng et al.
Proceedings of the International Joint Conference on Neural Networks. Institute of Electrical and Electronics Engineers Inc., 2014. p. 249-255 6889865 (Proceedings of the International Joint Conference on Neural Networks).


Guo, W, Si, J, Liu, F & Mei, S 2014, Policy iteration approximate dynamic programming using Volterra series based actor. in Proceedings of the International Joint Conference on Neural Networks, 6889865, Proceedings of the International Joint Conference on Neural Networks, Institute of Electrical and Electronics Engineers Inc., pp. 249-255, 2014 International Joint Conference on Neural Networks, IJCNN 2014, Beijing, China, 7/6/14. https://doi.org/10.1109/IJCNN.2014.6889865

@inproceedings{71440c554110429f80ec9d91d102a3fb,

title = "Policy iteration approximate dynamic programming using {Volterra} series based actor",

abstract = "There is an extensive literature on value function approximation for approximate dynamic programming (ADP). Multilayer perceptrons (MLPs) and radial basis functions (RBFs), among others, are typical approximators for value functions in ADP. Similar approaches have been taken for policy approximation. In this paper, we propose a new Volterra series based structure for actor approximation in ADP. The Volterra approximator is linear in parameters with global optima attainable. Given the proposed approximator structures, we further develop a policy iteration framework under which a gradient descent training algorithm for obtaining the optimal Volterra kernels is derived. Associated with this ADP design, we provide a sufficient condition based on actor approximation error to guarantee convergence of the value function iterations. A finite bound on the final convergent value function is also given. Finally, using a simulation example, we illustrate the effectiveness of the proposed Volterra actor for optimal control of a nonlinear system.",

author = "Wentao Guo and Jennie Si and Feng Liu and Shengwei Mei",

note = "Publisher Copyright: {\textcopyright} 2014 IEEE.; 2014 International Joint Conference on Neural Networks, IJCNN 2014 ; Conference date: 06-07-2014 Through 11-07-2014",

year = "2014",

month = sep,

day = "3",

doi = "10.1109/IJCNN.2014.6889865",

language = "English (US)",

series = "Proceedings of the International Joint Conference on Neural Networks",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "249--255",

booktitle = "Proceedings of the International Joint Conference on Neural Networks",

}

TY  - GEN
T1  - Policy iteration approximate dynamic programming using Volterra series based actor
AU  - Guo, Wentao
AU  - Si, Jennie
AU  - Liu, Feng
AU  - Mei, Shengwei
PY  - 2014/9/3
Y1  - 2014/9/3
N2  - There is an extensive literature on value function approximation for approximate dynamic programming (ADP). Multilayer perceptrons (MLPs) and radial basis functions (RBFs), among others, are typical approximators for value functions in ADP. Similar approaches have been taken for policy approximation. In this paper, we propose a new Volterra series based structure for actor approximation in ADP. The Volterra approximator is linear in parameters with global optima attainable. Given the proposed approximator structures, we further develop a policy iteration framework under which a gradient descent training algorithm for obtaining the optimal Volterra kernels is derived. Associated with this ADP design, we provide a sufficient condition based on actor approximation error to guarantee convergence of the value function iterations. A finite bound on the final convergent value function is also given. Finally, using a simulation example, we illustrate the effectiveness of the proposed Volterra actor for optimal control of a nonlinear system.
AB  - There is an extensive literature on value function approximation for approximate dynamic programming (ADP). Multilayer perceptrons (MLPs) and radial basis functions (RBFs), among others, are typical approximators for value functions in ADP. Similar approaches have been taken for policy approximation. In this paper, we propose a new Volterra series based structure for actor approximation in ADP. The Volterra approximator is linear in parameters with global optima attainable. Given the proposed approximator structures, we further develop a policy iteration framework under which a gradient descent training algorithm for obtaining the optimal Volterra kernels is derived. Associated with this ADP design, we provide a sufficient condition based on actor approximation error to guarantee convergence of the value function iterations. A finite bound on the final convergent value function is also given. Finally, using a simulation example, we illustrate the effectiveness of the proposed Volterra actor for optimal control of a nonlinear system.
UR  - http://www.scopus.com/inward/record.url?scp=84908466723&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84908466723&partnerID=8YFLogxK
U2  - 10.1109/IJCNN.2014.6889865
DO  - 10.1109/IJCNN.2014.6889865
M3  - Conference contribution
AN  - SCOPUS:84908466723
T3  - Proceedings of the International Joint Conference on Neural Networks
SP  - 249
EP  - 255
BT  - Proceedings of the International Joint Conference on Neural Networks
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 2014 International Joint Conference on Neural Networks, IJCNN 2014
Y2  - 6 July 2014 through 11 July 2014
ER  - 
