Policy approximation in policy iteration approximate dynamic programming for discrete-time nonlinear systems

Wentao Guo; Jennie Si; Feng Liu; Shengwei Mei

doi:10.1109/TNNLS.2017.2702566

Research output: Contribution to journal › Article › peer-review

32 Scopus citations

Abstract

Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem associated with policy approximation in policy iteration approximate DP for discrete-time nonlinear systems using infinite-horizon undiscounted value functions. Taking policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming for practical implementation of an approximate policy, we consider using Volterra series, which has been extensively covered in controls literature for its good theoretical properties and for its success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper using several examples including a practical problem of excitation control of a hydrogenerator.

Original language	English (US)
Pages (from-to)	2794-2807
Number of pages	14
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	29
Issue number	7
DOIs	https://doi.org/10.1109/TNNLS.2017.2702566
State	Published - Jul 2018

Keywords

Approximate dynamic programming (DP)
Volterra series
convergence
error bound
policy approximation
policy iteration

ASJC Scopus subject areas

Software
Computer Science Applications
Computer Networks and Communications
Artificial Intelligence

Cite this

@article{02046de1faa34a59b29843b830963acb,
title = "Policy approximation in policy iteration approximate dynamic programming for discrete-time nonlinear systems",
abstract = "Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem associated with policy approximation in policy iteration approximate DP for discrete-time nonlinear systems using infinite-horizon undiscounted value functions. Taking policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming for practical implementation of an approximate policy, we consider using Volterra series, which has been extensively covered in controls literature for its good theoretical properties and for its success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper using several examples including a practical problem of excitation control of a hydrogenerator.",
keywords = "Approximate dynamic programming (DP), Volterra series, convergence, error bound, policy approximation, policy iteration",
author = "Wentao Guo and Jennie Si and Feng Liu and Shengwei Mei",
note = "Publisher Copyright: {\textcopyright} 2012 IEEE.",
year = "2018",
month = jul,
doi = "10.1109/TNNLS.2017.2702566",
language = "English (US)",
volume = "29",
pages = "2794--2807",
journal = "IEEE Transactions on Neural Networks and Learning Systems",
issn = "2162-237X",
publisher = "IEEE Computational Intelligence Society",
number = "7",
}

TY - JOUR
T1 - Policy approximation in policy iteration approximate dynamic programming for discrete-time nonlinear systems
AU - Guo, Wentao
AU - Si, Jennie
AU - Liu, Feng
AU - Mei, Shengwei
PY - 2018/7
Y1 - 2018/7
N2 - Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem associated with policy approximation in policy iteration approximate DP for discrete-time nonlinear systems using infinite-horizon undiscounted value functions. Taking policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming for practical implementation of an approximate policy, we consider using Volterra series, which has been extensively covered in controls literature for its good theoretical properties and for its success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper using several examples including a practical problem of excitation control of a hydrogenerator.
AB - Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem associated with policy approximation in policy iteration approximate DP for discrete-time nonlinear systems using infinite-horizon undiscounted value functions. Taking policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming for practical implementation of an approximate policy, we consider using Volterra series, which has been extensively covered in controls literature for its good theoretical properties and for its success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper using several examples including a practical problem of excitation control of a hydrogenerator.
KW - Approximate dynamic programming (DP)
KW - Volterra series
KW - convergence
KW - error bound
KW - policy approximation
KW - policy iteration
UR - http://www.scopus.com/inward/record.url?scp=85020479163&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020479163&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2017.2702566
DO - 10.1109/TNNLS.2017.2702566
M3 - Article
C2 - 28600262
AN - SCOPUS:85020479163
SN - 2162-237X
VL - 29
SP - 2794
EP - 2807
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 7
ER -
