New value iteration and Q-learning methods for the average cost dynamic programming problem

Research output: Contribution to journal › Conference article › peer-review

Abstract

We propose a new value iteration method for the classical average cost Markovian decision problem, under the assumption that all stationary policies are unichain and, furthermore, that there exists a state that is recurrent under all stationary policies. The method is motivated by a relation between the average cost problem and an associated stochastic shortest path problem. In contrast to the standard relative value iteration, our method involves a weighted sup-norm contraction, and for this reason it admits Gauss-Seidel and asynchronous implementations. Computational tests indicate that the Gauss-Seidel version of the new method substantially outperforms the standard method on difficult problems. The contraction property also makes the method a suitable basis for the development of asynchronous Q-learning methods.
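To make the idea in the abstract concrete, the sketch below illustrates the general flavor of an SSP-based value iteration for the average cost problem on a small randomly generated unichain MDP: transitions into a reference state t are dropped from the Bellman sum (t plays the role of the SSP termination state), and the average cost estimate is updated from the differential cost at t. The update form, the stepsize rule, and all names (contracting_vi, gamma, etc.) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hypothetical small unichain MDP: n states, m actions.
# P[u, i, j] = p_ij(u) is the transition probability under action u,
# g[i, u] is the expected one-stage cost. With fully supported random
# transitions, state t = 0 is recurrent under every stationary policy.
rng = np.random.default_rng(0)
n, m, t = 4, 2, 0
P = rng.dirichlet(np.ones(n), size=(m, n))   # shape (m, n, n), rows sum to 1
g = rng.uniform(0.0, 1.0, size=(n, m))       # stage costs g(i, u)


def contracting_vi(P, g, t=0, gamma=0.5, iters=500):
    """Sketch of an SSP-flavored value iteration for the average cost problem.

    h(i)   - differential cost estimates
    lam    - running estimate of the optimal average cost per stage
    gamma  - stepsize in (0, 1] for the lam update (assumed rule)
    """
    n, m = g.shape
    not_t = np.arange(n) != t
    h = np.zeros(n)
    lam = 0.0
    for _ in range(iters):
        # Q(i, u) = g(i, u) - lam + sum_{j != t} p_ij(u) h(j)
        Q = np.stack(
            [g[:, u] - lam + P[u][:, not_t] @ h[not_t] for u in range(m)],
            axis=1,
        )
        h = Q.min(axis=1)        # value iteration step on the SSP-like operator
        lam += gamma * h[t]      # adjust the average cost estimate toward h(t) = 0
    return lam, h


lam, h = contracting_vi(P, g)
print("estimated average cost per stage:", lam)
```

Because the sum excludes the reference state, each sweep can also be carried out state by state (Gauss-Seidel) or asynchronously, which is the implementation flexibility the abstract attributes to the contraction property.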

Original language: English (US)
Pages (from-to): 2692-2697
Number of pages: 6
Journal: Proceedings of the IEEE Conference on Decision and Control
Volume: 3
State: Published - 1998
Externally published: Yes
Event: Proceedings of the 1998 37th IEEE Conference on Decision and Control (CDC) - Tampa, FL, USA
Duration: Dec 16, 1998 - Dec 18, 1998

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
