Convergence results for some temporal difference methods based on least squares

Huizhen Yu; Dimitri P. Bertsekas

doi:10.1109/TAC.2009.2022097

Research output: Contribution to journal › Article › peer-review

66 Scopus citations

Abstract

We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the known discounted cost method, and we prove its convergence for a range of constant stepsize choices. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(λ), as well as more reliable.
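To make the kind of iteration described in the abstract concrete, here is a minimal sketch of the λ = 0, discounted-cost special case, written in the form r ← r + γ B⁻¹(A r + b) with A, B and b accumulated along a simulated trajectory. It is an illustration under stated assumptions, not the authors' implementation: the function names `solve` and `lspe0`, the small regularizer `delta` added to B so it is invertible from the first sample, and the deterministic two-state example are all hypothetical choices made here.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            f = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def lspe0(trajectory, costs, phi, alpha, gamma=1.0, delta=1e-3):
    """Sketch of an LSPE(0)-style iteration for discounted cost.

    trajectory: states x_0..x_T; costs: g(x_t, x_{t+1}) for t < T;
    phi: state -> feature list; alpha: discount; gamma: constant stepsize.
    delta is an assumed regularizer, not taken from the paper.
    """
    n = len(phi(trajectory[0]))
    B = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    r = [0.0] * n
    for t in range(len(costs)):
        f, f_next = phi(trajectory[t]), phi(trajectory[t + 1])
        # Accumulate sums; dividing all three by (t + 1) would not change
        # the update direction B^{-1}(A r + b), apart from rescaling delta.
        for i in range(n):
            b[i] += f[i] * costs[t]
            for j in range(n):
                B[i][j] += f[i] * f[j]
                A[i][j] += f[i] * (alpha * f_next[j] - f[j])
        residual = [sum(A[i][j] * r[j] for j in range(n)) + b[i]
                    for i in range(n)]
        step = solve(B, residual)
        r = [r[i] + gamma * step[i] for i in range(n)]
    return r


def phi(state):
    """Tabular (one-hot) features for a two-state chain."""
    return [1.0, 0.0] if state == 0 else [0.0, 1.0]


# Deterministic cycle 0 -> 1 -> 0 -> ... with cost 1 in state 0, 2 in state 1.
states = [t % 2 for t in range(2001)]
costs = [1.0 if s == 0 else 2.0 for s in states[:-1]]
r = lspe0(states, costs, phi, alpha=0.9)
```

With one-hot features the approximation is exact, so `r` should approach the true discounted costs J(0) = 2.8/0.19 and J(1) = 2.9/0.19 for this toy chain. A nonzero λ would replace the feature vector multiplying each term with an eligibility trace, which this sketch omits.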

Original language	English (US)
Pages (from-to)	1515-1531
Number of pages	17
Journal	IEEE Transactions on Automatic Control
Volume	54
Issue number	7
DOIs	https://doi.org/10.1109/TAC.2009.2022097
State	Published - 2009
Externally published	Yes

Keywords

Approximation methods
Convergence of numerical methods
Dynamic programming
Markov processes

ASJC Scopus subject areas

Control and Systems Engineering
Computer Science Applications
Electrical and Electronic Engineering

Access to Document

10.1109/TAC.2009.2022097

Cite this

@article{0ad4146e43b04738af178bc698d95e47,
  title = "Convergence results for some temporal difference methods based on least squares",
  abstract = "We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the known discounted cost method, and we prove its convergence for a range of constant stepsize choices. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(λ), as well as more reliable.",
  keywords = "Approximation methods, Convergence of numerical methods, Dynamic programming, Markov processes",
  author = "Yu, Huizhen and Bertsekas, Dimitri P.",
  note = "Funding Information: Manuscript received July 17, 2006; revised August 15, 2007 and August 22, 2008. Current version published July 09, 2009. This work was supported by National Science Foundation (NSF) Grant ECS-0218328. Recommended by Associate Editor A. Lim.",
  year = "2009",
  doi = "10.1109/TAC.2009.2022097",
  language = "English (US)",
  volume = "54",
  pages = "1515--1531",
  journal = "IEEE Transactions on Automatic Control",
  issn = "0018-9286",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  number = "7",
}

TY  - JOUR
T1  - Convergence results for some temporal difference methods based on least squares
AU  - Yu, Huizhen
AU  - Bertsekas, Dimitri P.
N1  - Funding Information: Manuscript received July 17, 2006; revised August 15, 2007 and August 22, 2008. Current version published July 09, 2009. This work was supported by National Science Foundation (NSF) Grant ECS-0218328. Recommended by Associate Editor A. Lim.
PY  - 2009
Y1  - 2009
N2  - We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the known discounted cost method, and we prove its convergence for a range of constant stepsize choices. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(λ), as well as more reliable.
AB  - We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the known discounted cost method, and we prove its convergence for a range of constant stepsize choices. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(λ), as well as more reliable.
KW  - Approximation methods
KW  - Convergence of numerical methods
KW  - Dynamic programming
KW  - Markov processes
UR  - http://www.scopus.com/inward/record.url?scp=67949109470&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=67949109470&partnerID=8YFLogxK
U2  - 10.1109/TAC.2009.2022097
DO  - 10.1109/TAC.2009.2022097
M3  - Article
AN  - SCOPUS:67949109470
SN  - 0018-9286
VL  - 54
SP  - 1515
EP  - 1531
JO  - IEEE Transactions on Automatic Control
JF  - IEEE Transactions on Automatic Control
IS  - 7
ER  - 
