TY - JOUR
T1 - Convergence results for some temporal difference methods based on least squares
AU - Yu, Huizhen
AU - Bertsekas, Dimitri P.
N1 - Funding Information: Manuscript received July 17, 2006; revised August 15, 2007 and August 22, 2008. Current version published July 09, 2009. This work was supported by National Science Foundation (NSF) Grant ECS-0218328. Recommended by Associate Editor A. Lim.
PY - 2009
Y1 - 2009
N2 - We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the known discounted cost method, and we prove its convergence for a range of constant stepsize choices. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(λ), as well as more reliable.
KW - Approximation methods
KW - Convergence of numerical methods
KW - Dynamic programming
KW - Markov processes
UR - http://www.scopus.com/inward/record.url?scp=67949109470&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67949109470&partnerID=8YFLogxK
U2 - 10.1109/TAC.2009.2022097
DO - 10.1109/TAC.2009.2022097
M3 - Article
AN - SCOPUS:67949109470
SN - 0018-9286
VL - 54
SP - 1515
EP - 1531
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
IS - 7
ER -