TY - JOUR
T1 - Approximate robust policy iteration using multilayer perceptron neural networks for discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices
AU - Li, Baohua
AU - Si, Jennie
N1 - Funding Information:
Manuscript received July 5, 2009; revised November 22, 2009, March 2, 2010, and May 6, 2010. Date of publication July 1, 2010; date of current version August 6, 2010. This work was supported in part by the National Science Foundation, under Grants ECS-0401405 and ECS-0702057, and by the National Science Foundation of China, under Grant 50828701.
PY - 2010/8
Y1 - 2010/8
N2 - We study finite-state, finite-action, discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices in deterministic policy spaces. Existing robust dynamic programming methods cannot be extended to solve this class of general problems. In this paper, based on a robust optimality criterion, an approximate robust policy iteration using a multilayer perceptron neural network is proposed. It is proven that the proposed algorithm converges in finitely many iterations, and that it converges to a stationary optimal or near-optimal policy in a probabilistic sense. In addition, we point out that sometimes even direct enumeration may not be applicable to this class of problems; however, a direct enumeration based on our proposed maximum value approximation over the parameter space is a feasible approach. We provide further analysis to show that our proposed algorithm is more efficient than such an enumeration method for various scenarios.
KW - Approximate dynamic programming
KW - Markov decision processes (MDP)
KW - multilayer perceptrons
KW - uncertain transition matrix
UR - http://www.scopus.com/inward/record.url?scp=77955513754&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955513754&partnerID=8YFLogxK
U2 - 10.1109/TNN.2010.2050334
DO - 10.1109/TNN.2010.2050334
M3 - Article
C2 - 20601311
AN - SCOPUS:77955513754
SN - 1045-9227
VL - 21
SP - 1270
EP - 1280
JO - IEEE Transactions on Neural Networks
JF - IEEE Transactions on Neural Networks
IS - 8
M1 - 5499042
ER -