On-line learning control by association and reinforcement

Jennie Si, Yu Tsung Wang

Research output: Contribution to journal (Article)

558 Citations (Scopus)

Abstract

This paper focuses on a systematic treatment for developing a generic on-line learning control system based on the fundamental principle of reinforcement learning or, more specifically, neural dynamic programming. This on-line learning system improves its performance over time in two aspects. First, it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its action to improve future performance. Second, system states associated with positive reinforcement are memorized through a network learning process, so that in the future similar states will be more positively associated with a control action leading to positive reinforcement. A successful candidate for on-line learning control design will be introduced. Real-time learning algorithms will be derived for individual components in the learning system. Some analytical insight will be provided to give guidelines on the learning process that takes place in each module of the on-line learning control system. The performance of the on-line learning controller is measured by its learning speed, its success rate of learning, and the degree to which it meets the learning control objective. The overall learning control system performance will be tested on a single cart-pole balancing problem, a pendulum swing-up and balancing task, and a more complex problem of balancing a triple-link inverted pendulum.

Original language: English (US)
Pages (from-to): 264-276
Number of pages: 13
Journal: IEEE Transactions on Neural Networks
Volume: 12
Issue number: 2
DOIs: 10.1109/72.914523
State: Published - Mar 2001

Fingerprint

Learning Control
Learning Systems
Reinforcement
Pendulums
Control systems
Learning systems
Balancing
Control System
Learning Process
Reinforcement learning
Dynamic programming
Learning algorithms
Poles
Inverted Pendulum
Pendulum
Controllers
Reinforcement Learning
Control Design
Dynamic Programming
Pole

Keywords

  • Neural dynamic programming (NDP)
  • On-line learning
  • Reinforcement learning

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Theoretical Computer Science
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Hardware and Architecture

Cite this

On-line learning control by association and reinforcement. / Si, Jennie; Wang, Yu Tsung.

In: IEEE Transactions on Neural Networks, Vol. 12, No. 2, 03.2001, p. 264-276.

@article{ad7c195525a541ab89feffddd1d78734,
title = "On-line learning control by association and reinforcement",
abstract = "This paper focuses on a systematic treatment for developing a generic on-line learning control system based on the fundamental principle of reinforcement learning or, more specifically, neural dynamic programming. This on-line learning system improves its performance over time in two aspects. First, it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its action to improve future performance. Second, system states associated with positive reinforcement are memorized through a network learning process, so that in the future similar states will be more positively associated with a control action leading to positive reinforcement. A successful candidate for on-line learning control design will be introduced. Real-time learning algorithms will be derived for individual components in the learning system. Some analytical insight will be provided to give guidelines on the learning process that takes place in each module of the on-line learning control system. The performance of the on-line learning controller is measured by its learning speed, its success rate of learning, and the degree to which it meets the learning control objective. The overall learning control system performance will be tested on a single cart-pole balancing problem, a pendulum swing-up and balancing task, and a more complex problem of balancing a triple-link inverted pendulum.",
keywords = "Neural dynamic programming (NDP), On-line learning, Reinforcement learning",
author = "Si, Jennie and Wang, {Yu Tsung}",
year = "2001",
month = "3",
doi = "10.1109/72.914523",
language = "English (US)",
volume = "12",
pages = "264--276",
journal = "IEEE Transactions on Neural Networks",
issn = "2162-237X",
publisher = "IEEE Computational Intelligence Society",
number = "2",
}

TY - JOUR

T1 - On-line learning control by association and reinforcement

AU - Si, Jennie

AU - Wang, Yu Tsung

PY - 2001/3

Y1 - 2001/3

AB - This paper focuses on a systematic treatment for developing a generic on-line learning control system based on the fundamental principle of reinforcement learning or, more specifically, neural dynamic programming. This on-line learning system improves its performance over time in two aspects. First, it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its action to improve future performance. Second, system states associated with positive reinforcement are memorized through a network learning process, so that in the future similar states will be more positively associated with a control action leading to positive reinforcement. A successful candidate for on-line learning control design will be introduced. Real-time learning algorithms will be derived for individual components in the learning system. Some analytical insight will be provided to give guidelines on the learning process that takes place in each module of the on-line learning control system. The performance of the on-line learning controller is measured by its learning speed, its success rate of learning, and the degree to which it meets the learning control objective. The overall learning control system performance will be tested on a single cart-pole balancing problem, a pendulum swing-up and balancing task, and a more complex problem of balancing a triple-link inverted pendulum.

KW - Neural dynamic programming (NDP)

KW - On-line learning

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=0035273403&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035273403&partnerID=8YFLogxK

U2 - 10.1109/72.914523

DO - 10.1109/72.914523

M3 - Article

C2 - 18244383

AN - SCOPUS:0035273403

VL - 12

SP - 264

EP - 276

JO - IEEE Transactions on Neural Networks

JF - IEEE Transactions on Neural Networks

SN - 2162-237X

IS - 2

ER -