Least squares policy evaluation algorithms with linear function approximation

Angelia Nedich, D. P. Bertsekas

Research output: Contribution to journal › Article

97 Citations (Scopus)

Abstract

We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ = 0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(λ), with probability 1, for every λ ∈ [0, 1].
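To make the LSTD(λ) recursion concrete, the following is a minimal NumPy sketch of the kind of simulation-based estimator the abstract describes: an eligibility trace is accumulated along one sample trajectory, and the weight vector of the linear cost approximation is obtained by solving the resulting linear system. The function name lstd_lambda, the array layout, and the single-trajectory setup are illustrative assumptions, not details taken from the paper.

import numpy as np

def lstd_lambda(features, costs, alpha, lam):
    """Sketch of LSTD(lambda) on one simulated trajectory (illustrative, not the paper's code).

    features : (T+1, k) array of feature vectors phi(s_0), ..., phi(s_T)
    costs    : (T,) array of one-stage costs g_0, ..., g_{T-1}
    alpha    : discount factor in (0, 1)
    lam      : trace-decay parameter lambda in [0, 1]
    """
    k = features.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    z = np.zeros(k)                          # eligibility trace
    for t in range(len(costs)):
        phi_t, phi_next = features[t], features[t + 1]
        z = lam * alpha * z + phi_t          # geometric trace update
        A += np.outer(z, phi_t - alpha * phi_next)
        b += z * costs[t]
    return np.linalg.solve(A, b)             # weights r of the linear approximation

# Hypothetical usage with random data, purely to exercise the routine.
rng = np.random.default_rng(0)
phi = rng.standard_normal((101, 3))          # phi(s_0), ..., phi(s_100)
g = rng.standard_normal(100)                 # simulated one-stage costs
r = lstd_lambda(phi, g, alpha=0.95, lam=0.5)

For λ = 0 the trace reduces to φ(s_t) alone, and the system matches the linear least-squares temporal-difference algorithm of Bradtke and Barto that the abstract identifies with LSTD(0).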

Original language: English (US)
Pages (from-to): 79-110
Number of pages: 32
Journal: Discrete Event Dynamic Systems: Theory and Applications
Volume: 13
Issue number: 1-2
DOIs: 10.1023/A:1022192903948
State: Published - Jan 2003
Externally published: Yes

Fingerprint

Function Approximation
Linear Approximation
Linear Function
Least Squares
Evaluation
Policy Iteration
Linear Least Squares
Diminishing
Least Square Algorithm
Infinite Horizon
Iteration Method
Discrete-time Systems
Convergence Results
Dynamic Systems
Dynamic Programming
Cost Function
Gradient

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Applied Mathematics
  • Management Science and Operations Research
  • Safety, Risk, Reliability and Quality

Cite this

Least squares policy evaluation algorithms with linear function approximation. / Nedich, Angelia; Bertsekas, D. P.

In: Discrete Event Dynamic Systems: Theory and Applications, Vol. 13, No. 1-2, 01.2003, p. 79-110.

Research output: Contribution to journal › Article

@article{6df16224ad874ff98ac7cfdd7441d817,
title = "Least squares policy evaluation algorithms with linear function approximation",
abstract = "We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ = 0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(λ), with probability 1, for every λ ∈ [0, 1].",
author = "Nedich, Angelia and Bertsekas, {D. P.}",
year = "2003",
month = jan,
doi = "10.1023/A:1022192903948",
language = "English (US)",
volume = "13",
pages = "79--110",
journal = "Discrete Event Dynamic Systems: Theory and Applications",
issn = "0924-6703",
publisher = "Springer Netherlands",
number = "1-2",

}

TY - JOUR

T1 - Least squares policy evaluation algorithms with linear function approximation

AU - Nedich, Angelia

AU - Bertsekas, D. P.

PY - 2003/1

Y1 - 2003/1

N2 - We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ = 0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(λ), with probability 1, for every λ ∈ [0, 1].

AB - We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ = 0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(λ), with probability 1, for every λ ∈ [0, 1].

UR - http://www.scopus.com/inward/record.url?scp=0037288398&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037288398&partnerID=8YFLogxK

U2 - 10.1023/A:1022192903948

DO - 10.1023/A:1022192903948

M3 - Article

AN - SCOPUS:0037288398

VL - 13

SP - 79

EP - 110

JO - Discrete Event Dynamic Systems: Theory and Applications

JF - Discrete Event Dynamic Systems: Theory and Applications

SN - 0924-6703

IS - 1-2

ER -