On-line learning control by association and reinforcement

Jennie Si, Yu Tsung Wang

Research output: Contribution to journal (Article)

582 Scopus citations

Abstract

This paper focuses on a systematic treatment for developing a generic on-line learning control system based on the fundamental principle of reinforcement learning or, more specifically, neural dynamic programming. This on-line learning system improves its performance over time in two respects. First, it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its actions to improve future performance. Second, system states associated with positive reinforcement are memorized through a network learning process so that, in the future, similar states will be more positively associated with a control action leading to positive reinforcement. A successful candidate on-line learning control design will be introduced. Real-time learning algorithms will be derived for the individual components of the learning system. Some analytical insight will be provided to give guidelines on the learning process that takes place in each module of the on-line learning control system. The performance of the on-line learning controller is measured by its learning speed, its success rate of learning, and the degree to which it meets the learning control objective. The overall learning control system performance will be tested on a single cart-pole balancing problem, a pendulum swing-up and balancing task, and a more complex problem of balancing a triple-link inverted pendulum.
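
The two learning aspects described in the abstract can be illustrated with a generic actor-critic loop. The sketch below is not the algorithm derived in the paper: it is a minimal temporal-difference actor-critic with linear approximators in place of neural networks, and the plant dynamics, feature vectors, learning rates, discount factor, and the name `run_trial` are all assumptions chosen for illustration. The reinforcement signal is assumed to be 0 while the state stays in bounds and -1 on failure.

```python
import math
import random


def run_trial(n_episodes=50, max_steps=200, seed=0):
    """Toy on-line actor-critic: keep a scalar state x inside [-1, 1].

    Returns the number of steps survived in each episode.
    All constants and the plant model are hypothetical.
    """
    rng = random.Random(seed)
    gamma = 0.95       # discount factor (assumed)
    lr_critic = 0.1    # critic learning rate (assumed)
    lr_actor = 0.05    # actor learning rate (assumed)
    w_critic = [0.0, 0.0, 0.0]  # value estimate V(x) = w . [1, x, x^2]
    w_actor = [0.0, 0.0]        # action preference z(x) = w . [1, x]
    steps_survived = []

    def features_v(x):
        return [1.0, x, x * x]

    def features_a(x):
        return [1.0, x]

    def dot(w, f):
        return sum(wi * fi for wi, fi in zip(w, f))

    for _ in range(n_episodes):
        x = rng.uniform(-0.5, 0.5)
        t = 0
        for t in range(1, max_steps + 1):
            # Actor: stochastic binary action, push right (+1) or left (-1).
            z = max(-30.0, min(30.0, dot(w_actor, features_a(x))))
            p_right = 1.0 / (1.0 + math.exp(-z))
            u = 1.0 if rng.random() < p_right else -1.0

            # Hypothetical unstable plant: the state drifts away from 0.
            x_next = 1.1 * x + 0.1 * u + rng.gauss(0.0, 0.01)
            failed = abs(x_next) > 1.0
            r = -1.0 if failed else 0.0  # reinforcement from the environment

            # Critic: temporal-difference error on the value estimate.
            v = dot(w_critic, features_v(x))
            v_next = 0.0 if failed else dot(w_critic, features_v(x_next))
            delta = r + gamma * v_next - v
            for i, f in enumerate(features_v(x)):
                w_critic[i] += lr_critic * delta * f

            # Actor: strengthen or weaken the chosen action for this state
            # in proportion to the critic's error signal.
            grad = (1.0 if u > 0 else 0.0) - p_right
            for i, f in enumerate(features_a(x)):
                w_actor[i] += lr_actor * delta * grad * f

            if failed:
                break
            x = x_next
        steps_survived.append(t)
    return steps_survived
```

Calling `run_trial()` returns one survival count per episode. Under these assumptions the critic update corresponds to learning from the reinforcement signal, and the actor update corresponds to associating states with the actions that led to better outcomes.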

Original language: English (US)
Pages (from-to): 264-276
Number of pages: 13
Journal: IEEE Transactions on Neural Networks
Volume: 12
Issue number: 2
State: Published - Mar 1 2001

Keywords

  • Neural dynamic programming (NDP)
  • On-line learning
  • Reinforcement learning

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
