Abstract
Direct NDP belongs to the family of approximate dynamic programming designs that use learning and approximation methods to solve dynamic optimization problems formulated as dynamic programs and to overcome the curse of dimensionality. Because these approaches rest on statistical learning, researchers usually evaluate the learning system with statistical measures such as learning speed and the variation from one learning experience to another. However, no systematic study to date addresses closed-loop system performance from an input-output functional perspective. This paper analyzes direct NDP designs using classic control-theoretic sensitivity arguments. Using the benchmark cart-pole problem, it is shown that when direct NDP uses an LQR with desired closed-loop properties as a learning guide, it is more likely to generate better designs than direct NDP learning from scratch. Although the approach and results are illustrated on a simple nonlinear cart-pole system, it is clear that they extend readily to more complex dynamical systems.
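As background for the sensitivity argument, the standard textbook relations behind an LQR guide can be written out. This is only a sketch under assumptions: the linearized cart-pole model (A, B), the weights Q and R, and the choice to break the loop at the plant input are illustrative and are not taken from the paper.

```latex
\begin{align*}
% Assumed linearized cart-pole model; A, B, Q, R are illustrative, not from the paper
\dot{x} &= A x + B u, &
J &= \int_0^\infty \left( x^\top Q x + u^\top R u \right) dt,
  \quad Q \succeq 0,\; R \succ 0 \\
% LQR state feedback obtained from the algebraic Riccati equation
u &= -K x, \quad K = R^{-1} B^\top P, &
0 &= A^\top P + P A - P B R^{-1} B^\top P + Q \\
% Loop transfer function at the plant input, sensitivity S, complementary sensitivity T
L(s) &= K (sI - A)^{-1} B, &
S(s) &= \bigl(I + L(s)\bigr)^{-1}, \quad
T(s) = L(s)\bigl(I + L(s)\bigr)^{-1}, \quad S + T = I \\
% Return-difference (Kalman) inequality for a single-input LQR loop with scalar R
\lvert 1 + L(j\omega) \rvert &\ge 1 &
&\Longrightarrow \quad \lvert S(j\omega) \rvert \le 1 \quad \text{for all } \omega
\end{align*}
```

The last line is the classical reason an LQR loop is regarded as having desirable closed-loop properties: with the loop broken at the plant input, the sensitivity magnitude never exceeds one, so input disturbances are not amplified at any frequency.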
Original language | English (US) |
---|---|
Title of host publication | IEEE International Symposium on Intelligent Control - Proceedings |
Pages | 529-532 |
Number of pages | 4 |
State | Published - 2003 |
Event | PROCEEDINGS of the 2003 IEEE INTERNATIONAL SYMPOSIUM on INTELLIGENT CONTROL - Houston, TX, United States; Duration: Oct 5 2003 → Oct 8 2003 |
ASJC Scopus subject areas
- Hardware and Architecture
- Control and Systems Engineering