Dynamic programming results on the existence and characterization of optimal or nearly optimal policies, the convergence of algorithms, and the characterization of the optimal cost function have been available for some time, but rigorous proofs of these results have required quite restrictive hypotheses, such as countability of the state space, in order to circumvent the inherent measurability difficulties. The authors show that the use of universally measurable policies in the Borel space framework resolves the measurability issues, so that all the basic results of dynamic programming can be obtained in the strongest possible form. In particular, ε-optimal policies are shown to exist, the dynamic programming algorithm is defined, and conditions and bounds for its convergence to the optimal cost are given. The optimality equation is shown to hold and is used to characterize the optimal cost function and optimal policies.
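As a rough illustration of the objects named above (the dynamic programming algorithm, its convergence to the optimal cost, and a nearly optimal policy), the following is a minimal sketch for the simplest possible setting: a finite state and action space with a discounted cost. This setting is an assumption made purely for illustration. It sidesteps exactly the measurability questions the paper addresses, and the names `value_iteration`, `P`, `g`, and `alpha`, as well as the numerical data, are hypothetical and not taken from the paper.

```python
import numpy as np


def value_iteration(P, g, alpha, tol=1e-8, max_iter=10_000):
    """Discounted-cost value iteration on a finite model (illustrative only).

    P     : array of shape (A, n, n); P[a, s, t] is the probability of moving
            from state s to state t under action a (hypothetical data layout).
    g     : array of shape (A, n); g[a, s] is the stage cost of action a in s.
    alpha : discount factor, assumed to satisfy 0 <= alpha < 1.
    """
    n = P.shape[1]
    J = np.zeros(n)  # initial cost estimate
    for _ in range(max_iter):
        # One step of the dynamic programming algorithm:
        # J_new(s) = min over a of [ g(a, s) + alpha * sum_t P(a, s, t) J(t) ]
        J_new = (g + alpha * (P @ J)).min(axis=0)
        converged = np.max(np.abs(J_new - J)) < tol
        J = J_new
        if converged:
            break
    # A policy that is greedy with respect to the final estimate; in this
    # discounted finite setting it is nearly optimal when the residual is small.
    policy = (g + alpha * (P @ J)).argmin(axis=0)
    return J, policy


# Hypothetical two-state, two-action example.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions under action 0
    [[0.5, 0.5], [0.6, 0.4]],  # transitions under action 1
])
g = np.array([
    [1.0, 2.0],  # stage costs under action 0
    [1.5, 0.5],  # stage costs under action 1
])
alpha = 0.9

J, policy = value_iteration(P, g, alpha)
print("approximate optimal cost:", J)
print("greedy policy:", policy)
```

In this finite discounted case the iteration is a contraction, so the fixed point of the update is the unique solution of the optimality equation. The general Borel space setting treated by the authors requires considerably more care than this sketch suggests.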
ASJC Scopus subject areas
- Computer Science Applications
- Management Science and Operations Research