Improved temporal difference methods with linear function approximation

Dimitri P. Bertsekas, Angelia Nedich, Vivek S. Borkar

Research output: Chapter in Book/Report/Conference proceeding › Chapter

25 Citations (Scopus)

Abstract

This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional discussions of such problems can be found in Chapters 6 and 12. The method presented here is the first iterative temporal difference method that converges without requiring a diminishing step size. The chapter discusses the connections with Sutton's TD(λ) and with various versions of least-squares that are based on value iteration. It is shown using both analysis and experiments that the proposed method is substantially faster, simpler, and more reliable than TD(λ). Comparisons are also made with the LSTD method of Boyan, and Bradtke and Barto.
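For context, the abstract's baseline can be illustrated with a minimal sketch of the classical TD(λ) iteration with linear function approximation for policy evaluation (this is the standard method the chapter improves upon, not the chapter's own algorithm; the toy Markov chain, feature choice, and all parameter values below are assumptions made for illustration):

```python
import numpy as np

def td_lambda(P, g, gamma, phi, lam, alpha, num_steps, seed=0):
    """Classical TD(lambda) with linear approximation Phi r ~ J,
    where J is the discounted cost-to-go of a fixed policy.

    P         : (n, n) transition matrix of the policy's Markov chain
    g         : (n,) one-stage cost per state
    gamma     : discount factor in (0, 1)
    phi       : (n, k) feature matrix (row i = features of state i)
    lam       : the lambda in TD(lambda)
    alpha     : constant step size (the chapter's point is that its own
                method, unlike TD(lambda) in general, does not need a
                diminishing step size)
    num_steps : number of simulated transitions
    """
    rng = np.random.default_rng(seed)
    n, k = phi.shape
    r = np.zeros(k)          # weight vector
    z = np.zeros(k)          # eligibility trace
    i = rng.integers(n)      # start state
    for _ in range(num_steps):
        j = rng.choice(n, p=P[i])
        # temporal difference: d = g(i) + gamma * J~(j) - J~(i)
        d = g[i] + gamma * phi[j] @ r - phi[i] @ r
        z = gamma * lam * z + phi[i]   # eligibility-trace update
        r = r + alpha * d * z          # weight update along the trace
        i = j
    return r

# Toy two-state chain; the exact cost-to-go solves J = g + gamma * P * J.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
g = np.array([1.0, 2.0])
gamma = 0.9
phi = np.eye(2)              # lookup-table features: exact representation
J_exact = np.linalg.solve(np.eye(2) - gamma * P, g)
r = td_lambda(P, g, gamma, phi, lam=0.5, alpha=0.02, num_steps=100_000)
print("exact:", J_exact, "TD estimate:", r)
```

With lookup-table features the iterates settle near the exact cost-to-go, up to noise induced by the constant step size; with genuinely approximate features the method converges instead to a fixed point of a projected Bellman equation.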

Original language: English (US)
Title of host publication: Handbook of Learning and Approximate Dynamic Programming
Publisher: John Wiley and Sons Inc.
Pages: 235-259
Number of pages: 25
ISBN (Electronic): 9780470544785
ISBN (Print): 047166054X, 9780471660545
DOIs: 10.1109/9780470544785.ch9
State: Published - Jan 1 2004
Externally published: Yes

Keywords

  • Argon
  • Convergence
  • Eigenvalues and eigenfunctions
  • Function approximation
  • Markov processes
  • Trajectory
  • Vectors

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Bertsekas, D. P., Nedich, A., & Borkar, V. S. (2004). Improved temporal difference methods with linear function approximation. In Handbook of Learning and Approximate Dynamic Programming (pp. 235-259). John Wiley and Sons Inc. https://doi.org/10.1109/9780470544785.ch9

Improved temporal difference methods with linear function approximation. / Bertsekas, Dimitri P.; Nedich, Angelia; Borkar, Vivek S.

Handbook of Learning and Approximate Dynamic Programming. John Wiley and Sons Inc., 2004. p. 235-259.

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Bertsekas, DP, Nedich, A & Borkar, VS 2004, Improved temporal difference methods with linear function approximation. in Handbook of Learning and Approximate Dynamic Programming. John Wiley and Sons Inc., pp. 235-259. https://doi.org/10.1109/9780470544785.ch9
Bertsekas DP, Nedich A, Borkar VS. Improved temporal difference methods with linear function approximation. In Handbook of Learning and Approximate Dynamic Programming. John Wiley and Sons Inc. 2004. p. 235-259 https://doi.org/10.1109/9780470544785.ch9
Bertsekas, Dimitri P. ; Nedich, Angelia ; Borkar, Vivek S. / Improved temporal difference methods with linear function approximation. Handbook of Learning and Approximate Dynamic Programming. John Wiley and Sons Inc., 2004. pp. 235-259
@inbook{01ffe6e9224c44cf8223ba10086954ef,
title = "Improved temporal difference methods with linear function approximation",
abstract = "This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional discussions of such problems can be found in Chapters 6 and 12. The method presented here is the first iterative temporal difference method that converges without requiring a diminishing step size. The chapter discusses the connections with Sutton's TD(λ) and with various versions of least-squares that are based on value iteration. It is shown using both analysis and experiments that the proposed method is substantially faster, simpler, and more reliable than TD(λ). Comparisons are also made with the LSTD method of Boyan, and Bradtke and Barto.",
keywords = "Argon, Convergence, Eigenvalues and eigenfunctions, Function approximation, Markov processes, Trajectory, Vectors",
author = "Bertsekas, {Dimitri P.} and Angelia Nedich and Borkar, {Vivek S.}",
year = "2004",
month = "1",
day = "1",
doi = "10.1109/9780470544785.ch9",
language = "English (US)",
isbn = "047166054X",
pages = "235--259",
booktitle = "Handbook of Learning and Approximate Dynamic Programming",
publisher = "John Wiley and Sons Inc.",
address = "United States",

}

TY - CHAP

T1 - Improved temporal difference methods with linear function approximation

AU - Bertsekas, Dimitri P.

AU - Nedich, Angelia

AU - Borkar, Vivek S.

PY - 2004/1/1

Y1 - 2004/1/1

N2 - This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional discussions of such problems can be found in Chapters 6 and 12. The method presented here is the first iterative temporal difference method that converges without requiring a diminishing step size. The chapter discusses the connections with Sutton's TD(λ) and with various versions of least-squares that are based on value iteration. It is shown using both analysis and experiments that the proposed method is substantially faster, simpler, and more reliable than TD(λ). Comparisons are also made with the LSTD method of Boyan, and Bradtke and Barto.

AB - This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional discussions of such problems can be found in Chapters 6 and 12. The method presented here is the first iterative temporal difference method that converges without requiring a diminishing step size. The chapter discusses the connections with Sutton's TD(λ) and with various versions of least-squares that are based on value iteration. It is shown using both analysis and experiments that the proposed method is substantially faster, simpler, and more reliable than TD(λ). Comparisons are also made with the LSTD method of Boyan, and Bradtke and Barto.

KW - Argon

KW - Convergence

KW - Eigenvalues and eigenfunctions

KW - Function approximation

KW - Markov processes

KW - Trajectory

KW - Vectors

UR - http://www.scopus.com/inward/record.url?scp=85036496976&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85036496976&partnerID=8YFLogxK

U2 - 10.1109/9780470544785.ch9

DO - 10.1109/9780470544785.ch9

M3 - Chapter

AN - SCOPUS:85036496976

SN - 047166054X

SN - 9780471660545

SP - 235

EP - 259

BT - Handbook of Learning and Approximate Dynamic Programming

PB - John Wiley and Sons Inc.

ER -