Stochastic approximation for nonexpansive maps: Application to Q-learning algorithms

Jinane Abounadi; Dimitri P. Bertsekas; Vivek Borkar

doi:10.1137/S0363012998346621

Research output: Contribution to journal › Article › peer-review

34 Scopus citations

Abstract

We discuss synchronous and asynchronous iterations of the form x^{k+1} = x^k + γ(k)(h(x^k) + w^k), where h is a suitable map and {w^k} is a deterministic or stochastic sequence satisfying suitable conditions. In particular, in the stochastic case, these are stochastic approximation iterations that can be analyzed using the ODE approach based either on Kushner and Clark's lemma for the synchronous case or on Borkar's theorem for the asynchronous case. However, the analysis requires that the iterates {x^k} be bounded, a fact which is usually hard to prove. We develop a novel framework for proving boundedness in the deterministic framework, which is also applicable to the stochastic case when the deterministic hypotheses can be verified in the almost sure sense. This is based on scaling ideas and on the properties of Lyapunov functions. We then combine the boundedness property with Borkar's stability analysis of ODEs involving nonexpansive mappings to prove convergence (with probability 1 in the stochastic case). We also apply our convergence analysis to Q-learning algorithms for stochastic shortest path problems and are able to relax some of the assumptions of the currently available results.
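
As a rough illustration of the synchronous iteration named in the abstract, the Python sketch below runs x^{k+1} = x^k + γ(k)(h(x^k) + w^k) on a toy problem. It is not taken from the paper: the step size γ(k) = 1/(k+1), the Gaussian noise w^k, the map `example_h`, and all function names are assumptions chosen only for illustration. Here h(x) = F(x) - x, where F is a simple sup-norm contraction with fixed point (8/3, 10/3).

```python
import random


def stochastic_approximation(h, x0, num_steps, noise_std=1.0, seed=0):
    """Run x^{k+1} = x^k + gamma(k) * (h(x^k) + w^k) for num_steps steps.

    Assumptions (not from the paper): gamma(k) = 1 / (k + 1) and
    w^k is i.i.d. zero-mean Gaussian noise with standard deviation noise_std.
    """
    rng = random.Random(seed)
    x = list(x0)
    for k in range(num_steps):
        gamma = 1.0 / (k + 1)  # step sizes sum to infinity, squares are summable
        hx = h(x)
        # Synchronous update: every component moves at every step.
        x = [xi + gamma * (hi + rng.gauss(0.0, noise_std))
             for xi, hi in zip(x, hx)]
    return x


def example_h(x):
    """Hypothetical map h(x) = F(x) - x with F(x) = (0.5*x[1] + 1, 0.5*x[0] + 2)."""
    return [0.5 * x[1] + 1.0 - x[0], 0.5 * x[0] + 2.0 - x[1]]


# The fixed point of F (the zero of h) is (8/3, 10/3).
x_final = stochastic_approximation(example_h, [0.0, 0.0], num_steps=20000)
x_noise_free = stochastic_approximation(
    example_h, [0.0, 0.0], num_steps=20000, noise_std=0.0
)
print(x_final, x_noise_free)
```

In this toy setting the iterates stay bounded trivially; the point of the paper is to establish boundedness and convergence under much weaker conditions on h, including the asynchronous case, which this sketch does not cover.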

Original language	English (US)
Pages (from-to)	1-22
Number of pages	22
Journal	SIAM Journal on Control and Optimization
Volume	41
Issue number	1
DOIs	https://doi.org/10.1137/S0363012998346621
State	Published - 2003
Externally published	Yes

Keywords

Neuro-dynamic programming
Q-learning
Stochastic approximation

ASJC Scopus subject areas

Control and Optimization
Applied Mathematics

Access to Document

10.1137/S0363012998346621

Cite this

@article{8ac4e2af3bc64abfad9475e3249d713f,

title = "Stochastic approximation for nonexpansive maps: Application to Q-learning algorithms",

abstract = "We discuss synchronous and asynchronous iterations of the form $x^{k+1} = x^k + \gamma(k)(h(x^k) + w^k)$, where $h$ is a suitable map and $\{w^k\}$ is a deterministic or stochastic sequence satisfying suitable conditions. In particular, in the stochastic case, these are stochastic approximation iterations that can be analyzed using the ODE approach based either on Kushner and Clark's lemma for the synchronous case or on Borkar's theorem for the asynchronous case. However, the analysis requires that the iterates $\{x^k\}$ be bounded, a fact which is usually hard to prove. We develop a novel framework for proving boundedness in the deterministic framework, which is also applicable to the stochastic case when the deterministic hypotheses can be verified in the almost sure sense. This is based on scaling ideas and on the properties of Lyapunov functions. We then combine the boundedness property with Borkar's stability analysis of ODEs involving nonexpansive mappings to prove convergence (with probability 1 in the stochastic case). We also apply our convergence analysis to Q-learning algorithms for stochastic shortest path problems and are able to relax some of the assumptions of the currently available results.",

keywords = "Neuro-dynamic programming, Q-learning, Stochastic approximation",

author = "Abounadi, Jinane and Bertsekas, {Dimitri P.} and Borkar, Vivek",

year = "2003",

doi = "10.1137/S0363012998346621",

language = "English (US)",

volume = "41",

pages = "1--22",

journal = "SIAM Journal on Control and Optimization",

issn = "0363-0129",

publisher = "Society for Industrial and Applied Mathematics Publications",

number = "1",

}

TY - JOUR

T1 - Stochastic approximation for nonexpansive maps

T2 - Application to Q-learning algorithms

AU - Abounadi, Jinane

AU - Bertsekas, Dimitri P.

AU - Borkar, Vivek

PY - 2003

Y1 - 2003

N2 - We discuss synchronous and asynchronous iterations of the form x^{k+1} = x^k + γ(k)(h(x^k) + w^k), where h is a suitable map and {w^k} is a deterministic or stochastic sequence satisfying suitable conditions. In particular, in the stochastic case, these are stochastic approximation iterations that can be analyzed using the ODE approach based either on Kushner and Clark's lemma for the synchronous case or on Borkar's theorem for the asynchronous case. However, the analysis requires that the iterates {x^k} be bounded, a fact which is usually hard to prove. We develop a novel framework for proving boundedness in the deterministic framework, which is also applicable to the stochastic case when the deterministic hypotheses can be verified in the almost sure sense. This is based on scaling ideas and on the properties of Lyapunov functions. We then combine the boundedness property with Borkar's stability analysis of ODEs involving nonexpansive mappings to prove convergence (with probability 1 in the stochastic case). We also apply our convergence analysis to Q-learning algorithms for stochastic shortest path problems and are able to relax some of the assumptions of the currently available results.

KW - Neuro-dynamic programming

KW - Q-learning

KW - Stochastic approximation

UR - http://www.scopus.com/inward/record.url?scp=0037225359&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037225359&partnerID=8YFLogxK

U2 - 10.1137/S0363012998346621

DO - 10.1137/S0363012998346621

M3 - Article

AN - SCOPUS:0037225359

SN - 0363-0129

VL - 41

SP - 1

EP - 22

JO - SIAM Journal on Control and Optimization

JF - SIAM Journal on Control and Optimization

IS - 1

ER -
