Improved temporal difference methods with linear function approximation

Dimitri P. Bertsekas, Angelia Nedich, Vivek S. Borkar

Research output: Chapter in Book/Report/Conference proceeding › Chapter

25 Scopus citations

Abstract

This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional discussions of such problems can be found in Chapters 6 and 12. The method presented here is the first iterative temporal difference method that converges without requiring a diminishing step size. The chapter discusses the connections with Sutton’s TD(λ) and with various versions of least-squares that are based on value iteration. It is shown using both analysis and experiments that the proposed method is substantially faster, simpler, and more reliable than TD(λ). Comparisons are also made with the LSTD method of Boyan, and Bradtke and Barto.

Original language: English (US)
Title of host publication: Handbook of Learning and Approximate Dynamic Programming
Publisher: John Wiley and Sons Inc.
Pages: 235-259
Number of pages: 25
ISBN (Electronic): 9780470544785
ISBN (Print): 047166054X, 9780471660545
DOIs: https://doi.org/10.1109/9780470544785.ch9
State: Published - Jan 1 2004
Externally published: Yes

Keywords

  • Argon
  • Convergence
  • Eigenvalues and eigenfunctions
  • Function approximation
  • Markov processes
  • Trajectory
  • Vectors

ASJC Scopus subject areas

  • Computer Science (all)


Cite this

Bertsekas, D. P., Nedich, A., & Borkar, V. S. (2004). Improved temporal difference methods with linear function approximation. In Handbook of Learning and Approximate Dynamic Programming (pp. 235-259). John Wiley and Sons Inc. https://doi.org/10.1109/9780470544785.ch9