On boundedness of Q-learning iterates for stochastic shortest path problems

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16:185-202] and establishing completely the convergence of Q-learning for these SSP models.
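As a rough illustration of the setting described in the abstract, the following minimal sketch runs tabular Q-learning on a tiny SSP problem and records the largest iterate magnitude seen along the way. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-state model in `TRANSITIONS`, the uniformly random choice of which state-action pair to update, the step size 1/(visit count), and the names `sample` and `q_learning`. Observing a bounded run on one toy model is of course not a substitute for the proof.

```python
import random

TERMINAL = 2  # absorbing, cost-free state

# Hypothetical toy model (not from the paper):
# TRANSITIONS[state][action] = list of (probability, next_state, cost)
TRANSITIONS = {
    0: {0: [(0.5, 1, 1.0), (0.5, 0, 1.0)],
        1: [(0.2, TERMINAL, 4.0), (0.8, 0, 4.0)]},
    1: {0: [(0.9, TERMINAL, 1.0), (0.1, 0, 1.0)],
        1: [(1.0, 0, 0.5)]},
}


def sample(rng, outcomes):
    """Draw (next_state, cost) from a list of (probability, next_state, cost)."""
    u = rng.random()
    acc = 0.0
    for prob, nxt, cost in outcomes:
        acc += prob
        if u < acc:
            return nxt, cost
    # Guard against floating-point round-off in the cumulative sum.
    return outcomes[-1][1], outcomes[-1][2]


def q_learning(num_updates, seed=0):
    """Undiscounted Q-learning with one state-action pair updated per step.

    Returns the final Q-table and the largest |Q| value observed at any step.
    """
    rng = random.Random(seed)
    pairs = [(s, a) for s in TRANSITIONS for a in TRANSITIONS[s]]
    q = {pair: 0.0 for pair in pairs}
    visits = {pair: 0 for pair in pairs}
    peak = 0.0

    for _ in range(num_updates):
        s, a = rng.choice(pairs)  # assumed update schedule: uniform over pairs
        nxt, cost = sample(rng, TRANSITIONS[s][a])

        # Cost-to-go estimate at the next state; zero at the terminal state.
        if nxt == TERMINAL:
            next_value = 0.0
        else:
            next_value = min(q[(nxt, b)] for b in TRANSITIONS[nxt])

        visits[(s, a)] += 1
        step = 1.0 / visits[(s, a)]  # assumed diminishing step size
        # No discount factor: the target is cost plus the minimal next Q-value.
        q[(s, a)] = (1.0 - step) * q[(s, a)] + step * (cost + next_value)
        peak = max(peak, abs(q[(s, a)]))

    return q, peak


if __name__ == "__main__":
    q_table, peak_value = q_learning(20000, seed=0)
    for pair in sorted(q_table):
        print(pair, round(q_table[pair], 3))
    print("largest |Q| observed:", round(peak_value, 3))
```

Note that the toy model contains one policy that never reaches the terminal state (action 0 in state 0 together with action 1 in state 1) and pays a positive cost at every step, so the absence of a discount factor is what makes boundedness a nontrivial question here.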

Original language: English (US)
Pages (from-to): 209-227
Number of pages: 19
Journal: Mathematics of Operations Research
Volume: 38
Issue number: 2
State: Published - May 2013
Externally published: Yes

Keywords

  • Dynamic programming
  • Markov decision processes
  • Q-learning
  • Reinforcement learning
  • Stochastic approximation

ASJC Scopus subject areas

  • Mathematics(all)
  • Computer Science Applications
  • Management Science and Operations Research
