Least squares policy evaluation algorithms with linear function approximation

Research output: Contribution to journal › Article › peer-review

127 Scopus citations

Abstract

We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods that use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ = 0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, the only available convergence result is due to Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing that LSTD(λ) converges with probability 1 for every λ ∈ [0, 1].
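
For readers who want a concrete picture of the LSTD(λ) method named in the abstract, the following is a minimal illustrative sketch, in Python/NumPy, of the standard single-trajectory LSTD(λ) estimator (an eligibility trace feeding an accumulated linear system). It is not the paper's own implementation; the function name, argument layout, and the small ridge term `reg` are assumptions added purely for illustration.

```python
# Minimal sketch of LSTD(lambda) for policy evaluation with linear features.
# Assumed inputs: one simulated trajectory of feature vectors phi[0..T],
# one-step costs r[0..T-1], discount factor gamma, and trace parameter lam.
import numpy as np

def lstd_lambda(phi, rewards, gamma, lam, reg=1e-6):
    """Estimate weights w so that phi[t] @ w approximates the cost-to-go.

    phi     : array of shape (T + 1, k), feature vectors along the trajectory
    rewards : array of shape (T,), one-step costs
    gamma   : discount factor in (0, 1)
    lam     : trace parameter lambda in [0, 1]
    reg     : small ridge term (an added assumption) for numerical stability
    """
    T, k = len(rewards), phi.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    z = np.zeros(k)                      # eligibility trace
    for t in range(T):
        z = gamma * lam * z + phi[t]     # accumulate the discounted trace
        A += np.outer(z, phi[t] - gamma * phi[t + 1])
        b += z * rewards[t]
    # Solve the accumulated linear system A w = b once at the end.
    return np.linalg.solve(A + reg * np.eye(k), b)
```

Solving the accumulated system A w = b directly, rather than updating the weights incrementally with a stepsize, is what distinguishes the least-squares temporal-difference approach from ordinary TD(λ).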

Original language: English (US)
Pages (from-to): 79-110
Number of pages: 32
Journal: Discrete Event Dynamic Systems: Theory and Applications
Volume: 13
Issue number: 1-2
DOIs
State: Published - Jan 2003
Externally published: Yes

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Electrical and Electronic Engineering
